5.4 Parameter Estimation
5.4.1 Least Squares Estimation
5.4.1.1 The One-Predictor Case
Given the sum of squared deviations \(Q(\beta_0,\beta_1)\), the least squares method looks for the values \(\hat \beta_0\) and \(\hat \beta_1\) that minimize it. The minimized value \(Q(\hat \beta_0,\hat \beta_1)\) is the residual sum of squares.
\[ \begin{align} Q(\beta_0,\beta_1)&=\sum^n_{i=1}(y_i-\beta_0-\beta_1x_i)^2=\sum^n_{i=1}\varepsilon_i^2\\ (\hat \beta_0,\hat \beta_1)&=\underset{\beta_0, \beta_1} {\arg\min}\ Q(\beta_0,\beta_1)\\ \Rightarrow Q(\hat \beta_0,\hat \beta_1)&=\underset{\beta_0, \beta_1} {\min} \sum^n_{i=1}(y_i-\beta_0-\beta_1x_i)^2\\ &=\sum^n_{i=1}(y_i-\hat \beta_0-\hat \beta_1x_i)^2\\ &=\sum^n_{i=1}(y_i-\hat y_i)^2\\ &=\sum^n_{i=1}e_i^2 \end{align} \tag{5.10} \]
Take the partial derivatives with respect to \(\beta_0\) and \(\beta_1\) and set them to zero:
\[ \left\{ \begin{array}{ll} \frac{\partial Q}{\partial \beta_0} \mid _{\beta_0=\hat \beta_0, \beta_1=\hat \beta_1} &= -2 \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1x_i)=0 \\ \frac{\partial Q}{\partial \beta_1} \mid _{\beta_0=\hat \beta_0, \beta_1=\hat \beta_1} &= -2 \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1x_i)x_i=0 \end{array} \right. \tag{5.11} \]
Equation (5.11) is equivalent to:
\[ \left\{ \begin{array}{ll} \sum\limits_{i=1}^n e_i=0 \\ \sum\limits_{i=1}^n x_ie_i=0 \end{array} \right. \tag{5.12} \]
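This relation is important: the residuals sum to zero and are orthogonal to the predictor values.
Expanding the sums in (5.11) and rearranging gives: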
\[ \left\{ \begin{array}{ll} \bar y=\hat \beta_0+\hat \beta_1 \bar x \\ \sum\limits_{i=1}^n y_ix_i=n\bar x \hat \beta_0+\hat \beta_1\sum\limits_{i=1}^n x_i^2 \end{array} \right. \tag{5.13} \]
The first equation gives \(\hat \beta_0=\bar y-\hat \beta_1 \bar x\). Substituting this into the second equation yields \(\hat \beta_1\):
\[ \begin{align} n\bar x \hat \beta_0+\hat \beta_1\sum_{i=1}^n x_i^2 &= \sum_{i=1}^n y_ix_i\\ n\bar x (\bar y-\hat \beta_1 \bar x) + \hat \beta_1\sum_{i=1}^n x_i^2 &= \sum_{i=1}^n y_ix_i\\ \hat \beta_1(\sum_{i=1}^n x_i^2 - n \bar x^2) &= \sum_{i=1}^n y_ix_i - n\bar x \bar y\\ \hat \beta_1 &= \frac{\sum\limits_{i=1}^n(x_i-\bar x)y_i}{\sum\limits_{i=1}^n(x_i-\bar x)^2}\\ \hat \beta_1 &= \frac{\sum\limits_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sum\limits_{i=1}^n(x_i-\bar x)^2} \end{align} \tag{5.14} \]
Note the following identities, which are used again later:
\(\sum\limits_{i=1}^n(x_i-\bar x)=0\)
\(\sum\limits_{i=1}^n(x_i-\bar x)x_i=\sum\limits_{i=1}^n(x_i-\bar x)(x_i-\bar x)=\sum\limits_{i=1}^n(x_i-\bar x)^2\)
\(\sum\limits_{i=1}^n(x_i-\bar x)(y_i-\bar y)=\sum\limits_{i=1}^n(x_i-\bar x)y_i\)
Write \(L_{xx}=\sum\limits_{i=1}^n(x_i-\bar x)^2\), \(L_{yy}=\sum\limits_{i=1}^n(y_i-\bar y)^2\), and \(L_{xy}=\sum\limits_{i=1}^n(x_i-\bar x)(y_i-\bar y)\). The least squares estimates are then:
\[ \left\{ \begin{array}{ll} \hat \beta_0=\bar y - \hat \beta_1 \bar x \\ \hat \beta_1=\frac{L_{xy}}{L_{xx}} \end{array} \right. \tag{5.15} \]
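The closed form in (5.15) can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the data values and the function name `ols_simple` are made up for illustration. It computes the two estimates and then confirms that the residuals satisfy both identities in (5.12) up to rounding error.

```python
# Illustrative data only; these values are invented for the sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) using beta1 = L_xy / L_xx as in (5.15)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = l_xy / l_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


beta0_hat, beta1_hat = ols_simple(x, y)
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]

# Both sums in (5.12) should vanish up to floating-point rounding.
sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))
print(beta0_hat, beta1_hat, sum_e, sum_xe)
```

For this data set \(\bar x=3\), \(\bar y=6.02\), \(L_{xx}=10\) and \(L_{xy}=19.9\), so the slope is \(1.99\) and the intercept is \(0.05\).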
As for \(\sigma^2\), it is natural to estimate it from the residuals \(e_i\). The usual choice is the unbiased estimator \(\hat \sigma^2\):
\[ \hat \sigma^2=\frac{\sum\limits_{i=1}^n e_i^2}{n-2} \tag{5.16} \]
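A more detailed proof of unbiasedness is given in the multiple-predictor case.
5.4.1.2 The Multiple-Predictor Case
The sum of squared deviations \(Q(\beta)\) is: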
\[ \begin{aligned} Q(\beta) &= (Y-X \beta)'(Y-X \beta)\\ &=Y'Y- \beta'X'Y-Y'X \beta+\beta'X'X \beta\\ &=Y'Y-2\beta'X'Y+\beta'X'X \beta \end{aligned} \tag{5.17} \]
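Note that \(\beta'X'Y\) and \(Y'X \beta\) are both scalars, and each is the transpose of the other, so they are equal.
Differentiating with respect to \(\beta\) gives: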
\[ {\partial Q \over \partial \beta} \mid_{\beta=\hat \beta}=-2X'Y+2X'X\hat \beta=0\\ \begin{aligned} \Rightarrow X'X\hat \beta &= X'Y\\ \hat \beta&=(X'X)^{-1}X'Y \end{aligned} \tag{5.18} \]
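The first-order condition is equivalent to \(X'(Y-X\hat \beta)=X'e=0\), where \(e\) is the residual vector. Keep \(X'e=0\) in mind.
For the solution to be unique, the predictors must not be perfectly collinear. That is, \(X\) must have full column rank, so that \(X'X\) is invertible.
The least squares estimate is therefore \(\hat \beta=(X'X)^{-1}X'Y\).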
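The normal equations \(X'X\hat \beta=X'Y\) can be solved directly, without forming the inverse explicitly. The sketch below uses only the Python standard library; the helper names (`transpose`, `matmul`, `solve`, `ols`) and the data are hypothetical and chosen for illustration. It solves the system by Gauss-Jordan elimination and then checks that \(X'e=0\) holds up to rounding.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the columns of X are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, Y):
    """Return beta_hat solving the normal equations X'X beta = X'Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(xi * yi for xi, yi in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)


# Made-up data: a column of ones for the intercept plus two predictors (p = 2).
X = [[1.0, 1.0, 2.0],
     [1.0, 2.0, 1.0],
     [1.0, 3.0, 4.0],
     [1.0, 4.0, 3.0],
     [1.0, 5.0, 6.0]]
Y = [6.0, 5.0, 13.0, 12.0, 20.0]

beta_hat = ols(X, Y)
fitted = [sum(xij * bj for xij, bj in zip(row, beta_hat)) for row in X]
e = [yi - fi for yi, fi in zip(Y, fitted)]

# X'e: one entry per column of X; each should be zero up to rounding.
Xt_e = [sum(xij * ei for xij, ei in zip(col, e)) for col in transpose(X)]
print(beta_hat, Xt_e)
```

If two columns of `X` were proportional, the pivot check would raise an error, which is the numerical counterpart of the full-column-rank requirement.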
For the fitted values \(\hat Y\):
\[ \hat Y = X \hat \beta = X(X'X)^{-1}X'Y=HY \tag{5.19} \]
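where \(H=X(X'X)^{-1}X'\) is a symmetric idempotent matrix of order \(n\), that is, \(H=H'\) and \(H=H^2\). \(H\) is also called the projection matrix.
The eigenvalues of a symmetric idempotent matrix are all 0 or 1, so \(tr(H)=rank(H)=p+1\).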
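These properties of \(H\) follow directly from its definition. The short verification below is added for illustration and assumes only that \(X'X\) is invertible. The last line, \(HX=X\), is the identity that makes the \(X\beta\) terms cancel in the residual derivation (5.20).

\[ \begin{aligned} H' &= [X(X'X)^{-1}X']' = X[(X'X)^{-1}]'X' = X(X'X)^{-1}X' = H\\ H^2 &= X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = H\\ HX &= X(X'X)^{-1}X'X = X \end{aligned} \]

For the eigenvalues: if \(Hv=\lambda v\) with \(v \neq 0\), then \(\lambda v=Hv=H^2v=\lambda^2 v\), so \(\lambda^2=\lambda\) and \(\lambda\) is 0 or 1. The rank of a symmetric matrix is the number of its nonzero eigenvalues, which here equals their sum, the trace.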
The residual vector \(e\) is then:
\[ \begin{aligned} e &= Y-\hat Y\\ &=Y-HY\\ &=(I-H)Y\\ &=(I-H)(X\beta+\varepsilon)\\ &=X\beta-HX\beta+(I-H)\varepsilon\\ &=(I-H)\varepsilon \end{aligned} \tag{5.20} \]
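Next, take the expectation of the residual sum of squares \(SSE=e'e=\varepsilon'(I-H)\varepsilon\), using the facts below:
\(tr(AB)=tr(BA)\), and \(E(\varepsilon\varepsilon')=\sigma^2I_n\) under the model assumptions.
\(I\) and \(H\) are \(n \times n\) matrices; \(X\) is \(n \times (p+1)\), so \(X'X\) is a square matrix of order \(p+1\).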
\[ \begin{aligned} E(SSE)&=E(\varepsilon'(I-H)\varepsilon)\\ &=E[tr(\varepsilon'(I-H)\varepsilon)]\\ &=E[tr((I-H)\varepsilon\varepsilon')]\\ &=tr((I-H)E(\varepsilon\varepsilon'))\\ &=\sigma^2 tr(I-H)\\ &=\sigma^2 [n-tr(H)]\\ &=\sigma^2 [n-tr(X(X'X)^{-1}X')]\\ &=\sigma^2 [n-tr((X'X)^{-1}X'X)]\\ &=\sigma^2 [n-p-1]\\ \end{aligned} \tag{5.21} \]
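Hence an unbiased estimator of \(\sigma^2\) is \(\hat \sigma^2={SSE \over n-p-1}\).
5.4.2 Maximum Likelihood Estimation
5.4.2.1 The One-Predictor Case
Under the assumption \(y \sim N(\beta_0+\beta_1x,\sigma^2)\), the log-likelihood is: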
\[ \ln (L)=-{n \over 2} \ln (2\pi \sigma^2) - {1 \over 2\sigma^2} \sum_{i=1}^n [y_i-(\beta_0+\beta_1x_i)]^2 \tag{5.22} \]
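Setting the partial derivatives with respect to \(\beta_0\), \(\beta_1\), and \(\sigma^2\) to zero gives the corresponding estimators. Those for \(\beta_0\) and \(\beta_1\) coincide with the least squares estimates, but the estimator of \(\sigma^2\) is \(\hat \sigma^2={\sum\limits_{i=1}^n e_i^2 \over n}\), which is biased.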
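The bias can be seen in a small simulation. The sketch below is illustrative only: the true parameters, the design points, the seed, and the function names `sse_simple` and `simulate` are all assumptions made for this example. It repeatedly draws normal errors, fits the line by least squares, and averages the two variance estimators, \(SSE/(n-2)\) from (5.16) and the maximum likelihood version \(SSE/n\).

```python
import random


def sse_simple(x, y):
    """Residual sum of squares of the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = l_xy / l_xx
    beta0 = y_bar - beta1 * x_bar
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))


def simulate(n_rep=20000, sigma=2.0, seed=0):
    """Average SSE/(n-2) and SSE/n over n_rep simulated data sets."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 11)]  # n = 10 fixed design points
    n = len(x)
    beta0, beta1 = 1.0, 0.5  # hypothetical true parameters
    total_unbiased = 0.0
    total_mle = 0.0
    for _ in range(n_rep):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        sse = sse_simple(x, y)
        total_unbiased += sse / (n - 2)
        total_mle += sse / n
    return total_unbiased / n_rep, total_mle / n_rep


mean_unbiased, mean_mle = simulate()
print(mean_unbiased, mean_mle)
```

With \(\sigma^2=4\) and \(n=10\), the first average should land close to 4, while the second should land close to \({n-2 \over n}\sigma^2=3.2\), so the maximum likelihood estimator underestimates the variance on average.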
5.4.2.2 The Multiple-Predictor Case
Here \(Y \sim N(X\beta, \sigma^2I_n)\), so the log-likelihood is:
\[ \ln L=-{n \over 2}\ln (2\pi)-{n \over 2}\ln (\sigma^2)-{1 \over 2\sigma^2}(Y-X\beta)'(Y-X\beta) \tag{5.23} \]
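For any fixed \(\sigma^2\), maximizing the log-likelihood over \(\beta\) means minimizing \((Y-X\beta)'(Y-X\beta)\), which is exactly \(Q(\beta)\) in (5.17). Hence \(\hat \beta_{MLE}\) coincides with the least squares estimate. The estimator of \(\sigma^2\) is \(\hat \sigma^2={(Y-X\hat \beta)'(Y-X\hat \beta) \over n}\), as in the one-predictor case.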
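The step for \(\sigma^2\) is worth writing out. As a worked sketch, differentiate (5.23) with respect to \(\sigma^2\), evaluate at the maximizing values, and set the result to zero:

\[ \begin{aligned} {\partial \ln L \over \partial \sigma^2} \mid_{\beta=\hat \beta,\ \sigma^2=\hat \sigma^2} &= -{n \over 2\hat \sigma^2}+{1 \over 2\hat \sigma^4}(Y-X\hat \beta)'(Y-X\hat \beta)=0\\ \Rightarrow \hat \sigma^2 &= {(Y-X\hat \beta)'(Y-X\hat \beta) \over n}={SSE \over n} \end{aligned} \]

Combined with \(E(SSE)=\sigma^2(n-p-1)\) from (5.21), this gives \(E(\hat \sigma^2)={n-p-1 \over n}\sigma^2\), so the maximum likelihood estimator of \(\sigma^2\) is biased downward, and the bias vanishes as \(n\) grows with \(p\) fixed.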
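5.4.3 Method of Moments Estimation
5.4.3.1 The One-Predictor Case
The model assumptions specify \(E(\varepsilon)=0\) and \(Cov(X_i,\varepsilon)=E(X_i\varepsilon)=0\). Since the residuals \(e\) estimate \(\varepsilon\), replacing the population moments with sample moments gives: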
\[ \left\{ \begin{array}{ll} {1 \over n} \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1 x_i)=0 \\ {1 \over n} \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1 x_i)x_i=0 \end{array} \right. \tag{5.24} \]
These conditions are the same as the normal equations (5.11), so the estimates are identical to the least squares estimates.
5.4.3.2 The Multiple-Predictor Case
In the multiple-predictor case, the assumptions \(E(\varepsilon)=0\) and \(Cov(X_i,\varepsilon)=E(X_i\varepsilon)=0\) give the sample moment condition:
\[ {1 \over n}X'(Y-X\hat \beta)=0\\ \Rightarrow \hat \beta =(X'X)^{-1}X'Y \tag{5.25} \]
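So the method of moments estimate is also the same as the least squares estimate.
In both the one-predictor and multiple-predictor cases, least squares, maximum likelihood, and the method of moments all rely on the assumptions of zero-mean errors, no endogeneity, and (in the multiple-predictor case) no multicollinearity. Maximum likelihood additionally uses the normality assumption. The core of all three is \(X'(Y-X\hat \beta)=0\), that is, \(X'e=0\).
Note that the first row of \(X'\) is all ones, which enforces the sample analogue of \(E(\varepsilon)=0\). The remaining rows hold the observations of each predictor, which enforce the sample analogue of \(Cov(X_i,\varepsilon)=E(X_i\varepsilon)=0\).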