5.5 Properties of the Least Squares Estimator
By the Gauss–Markov theorem, under the model assumptions the least squares estimator is the best linear unbiased estimator (BLUE). Before examining these properties, recall the least squares estimates for the simple and multiple regression settings, Eqs. (5.15) and (5.18).
5.5.1 Linearity
5.5.1.1 Simple regression
Eq. (5.15) gives the least squares estimates; a little rearrangement shows that \(\hat \beta_1\) is a linear combination of the \(y_i\).
\[ \begin{aligned} \hat{\beta_1} &= {L_{xy} \over L_{xx}} \\ &= {\sum\limits_{i=1}^n (x_i-\bar x)(y_i - \bar y) \over L_{xx}} \\ &= {\sum\limits_{i=1}^n (x_i-\bar x)y_i \over L_{xx}} \\ &= \sum_{i=1}^n {(x_i-\bar x) \over L_{xx}} y_i \\ &= \sum_{i=1}^n k_i y_i \end{aligned} \tag{5.26} \]
Here \(L_{xx}\) is a constant with respect to the \(y_i\), so it can be moved in and out of the summation; the step that drops \(\bar y\) uses \(\sum\limits_{i=1}^n (x_i-\bar x)=0\). Likewise, \(\hat \beta_0\) is linear in the \(y_i\):
\[ \begin{aligned} \hat \beta_0 &= \bar y - \hat \beta_1 \bar x \\ &= \sum\limits_{i=1}^n {1 \over n}y_i - \sum\limits_{i=1}^n {(x_i-\bar x)\bar x \over L_{xx}} y_i \\ &= \sum\limits_{i=1}^n [{1 \over n} - {(x_i-\bar x)\bar x \over L_{xx}}]y_i \end{aligned} \tag{5.27} \]
The fitted value \(\hat y_i\) satisfies
\[ \hat y_i=\hat \beta_0+\hat \beta_1x_i=\sum_{j=1}^n [\frac{1}{n}+\frac{(x_i-\bar x)(x_j-\bar x)}{L_{xx}}]y_j=\sum_{j=1}^n h_{ij}y_j \tag{5.28} \]
Note that the weights are symmetric: \(h_{ij}=h_{ji}\).
For a new observation \(x_0\), the corresponding predicted value \(\hat y_0\) satisfies
\[ \hat y_0=\hat \beta_0+\hat \beta_1x_0=\sum_{j=1}^n [\frac{1}{n}+\frac{(x_0-\bar x)(x_j-\bar x)}{L_{xx}}]y_j=\sum_{j=1}^n h_{0j}y_j \tag{5.29} \]
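The linear representations (5.26)–(5.29) can be checked numerically. The sketch below (simulated data; all names and values are illustrative assumptions, not from the text) builds the weights \(k_i\) and \(h_{ij}\) and confirms they reproduce the usual OLS estimates and fitted values:

```python
import numpy as np

# A small simulated dataset (illustrative assumption, not from the text).
rng = np.random.default_rng(0)
n = 20
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

xbar = x.mean()
Lxx = np.sum((x - xbar) ** 2)

# Eq. (5.26): slope as a linear combination of the y_i with weights k_i.
k = (x - xbar) / Lxx
beta1_hat = k @ y

# Eq. (5.27): intercept as a linear combination of the y_i.
w0 = 1.0 / n - (x - xbar) * xbar / Lxx
beta0_hat = w0 @ y

# Compare with the usual closed-form estimates.
beta1_ref = np.sum((x - xbar) * (y - y.mean())) / Lxx
beta0_ref = y.mean() - beta1_ref * xbar
assert np.isclose(beta1_hat, beta1_ref) and np.isclose(beta0_hat, beta0_ref)

# Eq. (5.28): fitted values via the weights h_ij; they are symmetric.
H = 1.0 / n + np.outer(x - xbar, x - xbar) / Lxx
assert np.allclose(H, H.T)                       # h_ij = h_ji
assert np.allclose(H @ y, beta0_hat + beta1_hat * x)
```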
5.5.1.2 Multiple regression
The least squares estimator \(\hat \beta\) satisfies
\[ \begin{aligned} \hat \beta &= (X'X)^{-1}X'Y \\ &= (X'X)^{-1}X'(X\beta+\varepsilon) \\ &= \beta+(X'X)^{-1}X'\varepsilon \end{aligned} \tag{5.30} \]
Note that \(\hat \beta\) is a linear combination not only of \(Y\) but also of \(\varepsilon\).
For the fitted vector \(\hat Y\), Eq. (5.19) gives \(\hat Y = HY\), where \(H = X(X'X)^{-1}X'\) is the hat matrix.
For the predicted value \(\hat y_0\),
\[ \hat y_0 = x_0'\hat \beta=x_0'(X'X)^{-1}X'Y \tag{5.31} \]
For the residual vector \(e\), Eq. (5.20) gives \(e=(I-H)Y=(I-H)\varepsilon\).
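A quick numerical sketch of the multivariate linear representations (simulated data; the design and coefficients are illustrative assumptions): \(\hat\beta = \beta + (X'X)^{-1}X'\varepsilon\), \(\hat Y = HY\), and \(e = (I-H)Y = (I-H)\varepsilon\):

```python
import numpy as np

# Simulated design with an intercept column; values are illustrative.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
eps = rng.normal(0, 1, n)
Y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T                  # hat matrix
beta_hat = XtX_inv @ X.T @ Y           # least squares estimator

# Eq. (5.30): beta_hat = beta + (X'X)^{-1} X' eps
assert np.allclose(beta_hat, beta + XtX_inv @ X.T @ eps)

# Fitted values and residuals: Y_hat = H Y, e = (I-H) Y = (I-H) eps.
I = np.eye(n)
assert np.allclose(H @ Y, X @ beta_hat)
assert np.allclose((I - H) @ Y, (I - H) @ eps)   # X beta lies in col(X)
assert np.allclose(H, H @ H)                     # H is idempotent
```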
5.5.2 Unbiasedness
5.5.2.1 Simple regression
\[ \begin{aligned} E(\hat \beta_1)&= \sum_{i=1}^n {(x_i-\bar x) \over L_{xx}} E(y_i) \\ &= \sum_{i=1}^n {(x_i-\bar x) \over L_{xx}}(\beta_0+\beta_1x_i) \\ &= \beta_0\sum_{i=1}^n {(x_i-\bar x) \over L_{xx}}+\beta_1\sum_{i=1}^n {(x_i-\bar x)x_i \over L_{xx}} \\ &= {L_{xx} \over L_{xx}}\beta_1 \\ &= \beta_1 \end{aligned} \tag{5.32} \]
\[ \begin{aligned} E(\hat \beta_0)&=E(\bar y)-E(\hat \beta_1)\bar x \\ &= {\sum\limits_{i=1}^n E(y_i) \over n}-\beta_1 \bar x \\ &= {\sum\limits_{i=1}^n (\beta_0 + \beta_1 x_i) \over n}-\beta_1 \bar x \\ &= \beta_0 + \beta_1 \bar x -\beta_1 \bar x \\ &= \beta_0 \end{aligned} \tag{5.33} \]
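Unbiasedness can also be seen by Monte Carlo (a sketch with simulated data; the design and the true parameter values are arbitrary assumptions): averaging the estimates over many replications should recover the true coefficients.

```python
import numpy as np

# Monte Carlo sketch of unbiasedness (Eqs. (5.32)-(5.33)).
rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 10, 25)             # fixed design
Lxx = np.sum((x - x.mean()) ** 2)

reps = 20000
# One simulated response vector per row.
Y = beta0 + beta1 * x + rng.normal(0, sigma, size=(reps, x.size))
b1s = Y @ (x - x.mean()) / Lxx         # slope estimate for each replication
b0s = Y.mean(axis=1) - b1s * x.mean()  # intercept estimate for each replication

# Averages over replications should be close to the true coefficients.
assert abs(b1s.mean() - beta1) < 0.01
assert abs(b0s.mean() - beta0) < 0.05
```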
5.5.3 Efficiency
5.5.3.1 Simple regression
Let \(\tilde \beta_1=\sum\limits_{i=1}^n c_iy_i\) be any other linear unbiased estimator of \(\beta_1\).
\[ \begin{aligned} E(\tilde \beta_1)&=E(\sum\limits_{i=1}^n c_iy_i) \\ &= \sum\limits_{i=1}^n c_iE(y_i) \\ &= \sum\limits_{i=1}^n c_i(\beta_0+\beta_1x_i) \\ &= \beta_0 \sum\limits_{i=1}^n c_i + \beta_1 \sum\limits_{i=1}^n c_ix_i \\ &= \beta_1 \end{aligned} \tag{5.35} \]
For the last equality to hold for arbitrary \(\beta_0\) and \(\beta_1\), unbiasedness requires \(\sum\limits_{i=1}^n c_i=0\) and \(\sum\limits_{i=1}^n c_ix_i=1\).
\[ \begin{aligned} Var(\tilde \beta_1)&=\sum\limits_{i=1}^n c_i^2Var(y_i) \\ &= \sum\limits_{i=1}^n c_i^2 \sigma^2 \\ &= \sum\limits_{i=1}^n (c_i-k_i+k_i)^2 \sigma^2 \\ &= \sigma^2[\sum\limits_{i=1}^n(c_i-k_i)^2+\sum\limits_{i=1}^nk_i^2+\sum\limits_{i=1}^n2(c_i-k_i)k_i] \\ &= \sum\limits_{i=1}^n(c_i-k_i)^2 \sigma^2 + Var(\hat \beta_1) \\ Var(\tilde \beta_1) &\geq Var(\hat \beta_1) \end{aligned} \tag{5.36} \]
Here the constraints \(\sum\limits_{i=1}^n c_i=0\) and \(\sum\limits_{i=1}^n c_ix_i=1\) make the cross term vanish:
\[ \begin{aligned} \sum\limits_{i=1}^n2(c_i-k_i)k_i &= 2\sum\limits_{i=1}^n c_ik_i-2\sum\limits_{i=1}^n k_i^2 \\ &=2\sum\limits_{i=1}^n c_i{(x_i-\bar x) \over \sum\limits_{i=1}^n (x_i-\bar x)^2}-2\sum\limits_{i=1}^n ({(x_i-\bar x) \over \sum\limits_{i=1}^n (x_i-\bar x)^2})^2 \\ &= 2{1 \over \sum\limits_{i=1}^n (x_i-\bar x)^2}-2{1 \over \sum\limits_{i=1}^n (x_i-\bar x)^2} \\ &= 0 \end{aligned} \tag{5.37} \]
The proof for \(\hat \beta_0\) is analogous and omitted.
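To make the comparison concrete, the sketch below pits the OLS slope against one particular alternative linear unbiased estimator, the two-point slope \((y_n-y_1)/(x_n-x_1)\) (my choice for illustration, not from the text). Its weights satisfy the unbiasedness constraints, and its variance is strictly larger, with the gap given exactly by Eq. (5.36):

```python
import numpy as np

# Fixed design; sigma is an arbitrary illustrative value.
x = np.linspace(0, 10, 25)
Lxx = np.sum((x - x.mean()) ** 2)
sigma = 1.0

k = (x - x.mean()) / Lxx                  # OLS weights k_i, Eq. (5.26)

# Alternative weights c_i for the two-point slope (y_n - y_1)/(x_n - x_1):
# c_1 = -1/(x_n - x_1), c_n = 1/(x_n - x_1), so sum c_i = 0, sum c_i x_i = 1.
c = np.zeros_like(x)
c[0], c[-1] = -1 / (x[-1] - x[0]), 1 / (x[-1] - x[0])
assert np.isclose(c.sum(), 0) and np.isclose(c @ x, 1)

var_ols = sigma**2 * np.sum(k**2)         # = sigma^2 / Lxx, Eq. (5.40)
var_alt = sigma**2 * np.sum(c**2)
assert np.isclose(var_ols, sigma**2 / Lxx)
assert var_alt > var_ols
# Eq. (5.36): the variance gap equals sigma^2 * sum (c_i - k_i)^2.
assert np.isclose(var_alt - var_ols, sigma**2 * np.sum((c - k) ** 2))
```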
5.5.3.2 Multiple regression
Let \(\tilde \beta=[(X'X)^{-1}X'+A]Y\) be any other linear unbiased estimator of \(\beta\).
\[ \begin{aligned} E(\tilde \beta)&=[(X'X)^{-1}X'+A]E(Y) \\ &= [(X'X)^{-1}X'+A]X\beta \\ &= \beta +AX\beta \\ &= \beta \end{aligned} \tag{5.38} \]
For this to hold for arbitrary \(\beta\), unbiasedness requires \(AX=0\).
\[ \begin{aligned} Cov(\hat \beta)&=[(X'X)^{-1}X']Cov(Y)[(X'X)^{-1}X']' \\ &= [(X'X)^{-1}X']\sigma^2I_n[(X'X)^{-1}X']' \\ &= \sigma^2 (X'X)^{-1} \\ Cov(\tilde \beta)&=[(X'X)^{-1}X'+A]Cov(Y)[(X'X)^{-1}X'+A]' \\ &= [(X'X)^{-1}X'+A]\sigma^2I_n[(X'X)^{-1}X'+A]' \\ &= \sigma^2 [(X'X)^{-1}+AA'] \end{aligned} \tag{5.39} \]
Note that the cross terms drop out because \(AX=0\).
Since \(AA'\) is positive semi-definite, \(Cov(\tilde \beta) \geq Cov(\hat \beta)\), in the sense that \(Cov(\tilde \beta)-Cov(\hat \beta)\) is positive semi-definite.
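A numerical illustration (simulated design; the construction of \(A\) is an assumption chosen for the example): any \(A\) of the form \(B(I-H)\) satisfies \(AX=0\), and the resulting covariance gap \(\sigma^2AA'\) is positive semi-definite:

```python
import numpy as np

# Build a perturbation A with AX = 0 by projecting a random matrix onto the
# orthogonal complement of col(X); then Cov(beta_tilde) - Cov(beta_hat)
# equals sigma^2 * A A', which is positive semi-definite.
rng = np.random.default_rng(3)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

B = rng.normal(size=(p, n))
A = B @ (np.eye(n) - H)                 # guarantees AX = 0
assert np.allclose(A @ X, 0)

gap = A @ A.T                           # covariance gap up to sigma^2
eigvals = np.linalg.eigvalsh(gap)
assert np.all(eigvals >= -1e-10)        # positive semi-definite
```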
5.5.4 Variance
5.5.4.1 Simple regression
\(Var(\hat{\beta_1})\)
\[ \begin{aligned} Var(\hat \beta_1)&=Var(\sum_{i=1}^n {(x_i-\bar x) \over L_{xx}} y_i) \\ &= \sum_{i=1}^n {(x_i-\bar x)^2 \over L_{xx}^2}\sigma^2 \\ &= {L_{xx} \over L_{xx}^2}\sigma^2 \\ &= {\sigma^2 \over L_{xx}} \end{aligned} \tag{5.40} \]
\(Var(\hat{\beta_0})\)
\[ \begin{aligned} Var(\hat \beta_0)&=Var(\bar y - \hat \beta_1 \bar x) \\ &= Var(\bar y)+\bar x^2Var(\hat \beta_1)-2Cov(\bar y,\hat \beta_1 \bar x) \\ &= {\sigma^2 \over n}+\bar x^2{\sigma^2 \over L_{xx}} \\ &= [{1 \over n}+{\bar x^2 \over L_{xx}}]\sigma^2 \end{aligned} \tag{5.41} \]
where
\[ \begin{aligned} Cov(\bar y,\hat \beta_1 \bar x)&={\bar x \over n}Cov(\sum_{i=1}^n y_i,\sum_{i=1}^n k_iy_i) \\ &= {\bar x \over n} \sum_{i=1}^n k_iCov(y_i,y_i) \\ &= {\bar x \sigma^2 \over n} \sum_{i=1}^n k_i \\ &= {\bar x \sigma^2 \over n} \sum_{i=1}^n {x_i-\bar x \over L_{xx}} \\ &= 0 \end{aligned} \tag{5.42} \]
Note that \(Cov(\varepsilon_i,\varepsilon_j)=0\) for \(i \neq j\), so \(Cov(y_i,y_j)=0\) for \(i \neq j\).
\(Cov(\hat \beta_0,\hat \beta_1)\)
\[ \begin{aligned} Cov(\hat \beta_0,\hat \beta_1) &= Cov(\bar y - \hat \beta_1 \bar x,\hat \beta_1) \\ &= Cov(\bar y,\hat \beta_1)-\bar xCov(\hat \beta_1,\hat \beta_1) \\ &= 0-\bar x {\sigma^2 \over L_{xx}} \\ &= -{\bar x \over L_{xx}}\sigma^2 \end{aligned} \tag{5.43} \]
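The variance and covariance formulas (5.40)–(5.43) can be checked by Monte Carlo (simulated data on a fixed design; all parameter values are illustrative assumptions):

```python
import numpy as np

# Monte Carlo check of Eqs. (5.40)-(5.43).
rng = np.random.default_rng(4)
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 10, 25)
xbar = x.mean()
Lxx = np.sum((x - xbar) ** 2)

reps = 50000
Y = beta0 + beta1 * x + rng.normal(0, sigma, size=(reps, x.size))
b1 = Y @ (x - xbar) / Lxx              # slope per replication, Eq. (5.26)
b0 = Y.mean(axis=1) - b1 * xbar        # intercept per replication, Eq. (5.27)

C = np.cov(np.vstack([b0, b1]))        # empirical 2x2 covariance matrix
assert abs(C[1, 1] - sigma**2 / Lxx) < 5e-4                           # Eq. (5.40)
assert abs(C[0, 0] - (1 / x.size + xbar**2 / Lxx) * sigma**2) < 8e-3  # Eq. (5.41)
assert abs(C[0, 1] + xbar * sigma**2 / Lxx) < 2e-3                    # Eq. (5.43)
```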
5.5.4.2 Multiple regression
\[ \begin{aligned} Cov(\hat \beta) &= E[(\hat \beta-E(\hat \beta))(\hat \beta-E(\hat \beta))'] \\ &= E[(\hat \beta-\beta)(\hat \beta-\beta)'] \\ &= E[(X'X)^{-1}X'\varepsilon \varepsilon ' X(X'X)^{-1}] \\ &= (X'X)^{-1}X'E(\varepsilon \varepsilon ') X(X'X)^{-1} \\ &= (X'X)^{-1}X'\sigma^2I_n X(X'X)^{-1} \\ &= \sigma^2(X'X)^{-1} \end{aligned} \tag{5.44} \]
5.5.5 Normality
5.5.5.1 Simple regression
By Eqs. (5.26) and (5.27), Eqs. (5.32) and (5.33), and Eqs. (5.40) and (5.41), \(\hat \beta_1\) and \(\hat \beta_0\) are normally distributed:
\[ \begin{gathered} \hat \beta_1 \sim N(\beta_1, \frac{\sigma^2}{L_{xx}}) \\ \hat \beta_0 \sim N(\beta_0, [\frac{1}{n}+\frac{\bar x^2}{L_{xx}}]\sigma^2) \end{gathered} \tag{5.45} \]
The normality of \(\hat \beta_1\) derives from the normality of \(y\), which in turn derives from the normality of \(\varepsilon\).
Consequently, any quantity that can be written as a linear combination of \(y\) is normally distributed; see the multiple regression case for details.
5.5.6 Residuals
5.5.6.1 Simple regression
Linear representation
By Eq. (5.28), the residual \(e_i\) satisfies
\[ e_i=y_i-\hat y_i=y_i-\sum_{j=1}^n h_{ij}y_j \tag{5.47} \]
\(E(e_i)\)
\[ E(e_i)=E(y_i-\hat y_i)=(\beta_0+\beta_1 x_i)-(\beta_0+\beta_1 x_i)=0 \tag{5.48} \]
\(Cov(e_i,e_j)\)
When \(i \neq j\):
\[ \begin{aligned} Cov(e_i,e_j)&=Cov(y_i-\sum\limits_{k=1}^nh_{ik}y_k \, , \, y_j-\sum\limits_{l=1}^nh_{jl}y_l) \\ &= -Cov(y_i \, , \, h_{ji}y_i)-Cov(y_j \, , \, h_{ij}y_j)+\sum\limits_{k=1}^n h_{ik}h_{jk}Cov(y_k \, , \, y_k) \\ &= -h_{ji}\sigma^2-h_{ij}\sigma^2+h_{ij}\sigma^2 \\ &= -h_{ij}\sigma^2 \end{aligned} \tag{5.49} \]
where
\[ \begin{aligned} \sum\limits_{k=1}^n h_{ik}h_{jk}&=\sum\limits_{k=1}^n [{1 \over n^2} + {(x_k-\bar{x})(x_j-\bar{x}+x_i-\bar{x}) \over nL_{xx}}+{(x_i-\bar{x})(x_j-\bar{x})(x_k-\bar{x})^2 \over L_{xx}^2}] \\ &= {1 \over n} + {(x_i-\bar{x})(x_j-\bar{x}) \over L_{xx}} \\ &= h_{ij} \end{aligned} \tag{5.50} \]
When \(i = j\):
\[ \begin{aligned} Cov(e_i \, , \, e_i)&=Var(e_i) \\ &= Var(y_i-\sum\limits_{j=1}^nh_{ij}y_j) \\ &= Var(y_i)+Var(\sum\limits_{j=1}^nh_{ij}y_j)-2Cov(y_i\, , \, \sum\limits_{j=1}^nh_{ij}y_j) \\ &= \sigma^2 + \sigma^2 \sum\limits_{j=1}^n h_{ij}^2-2h_{ii}\sigma^2 \\ &= \sigma^2+h_{ii}\sigma^2-2h_{ii}\sigma^2 \\ &= (1-h_{ii})\sigma^2 \end{aligned} \tag{5.51} \]
where
\[ \begin{aligned} \sum\limits_{j=1}^n h_{ij}^2 &= \sum\limits_{j=1}^n [{1 \over n^2}+{(x_i-\bar{x})^2(x_j-\bar{x})^2 \over L_{xx}^2}+{2(x_i-\bar{x})(x_j-\bar{x}) \over nL_{xx}}] \\ &= {1 \over n}+{(x_i-\bar{x})^2 \over L_{xx}} \\ &= h_{ii} \end{aligned} \tag{5.52} \]
Hence
\[ Cov(e_i\, , \, e_j)= \begin{cases} (1-h_{ii})\sigma^2, &i=j \\ -h_{ij}\sigma^2, &i \neq j \end{cases} \tag{5.53} \]
In particular, \(h_{ii}=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{L_{xx}}\) is called the leverage.
The leverage measures how far the \(i\)-th observation lies from the center of the sample in predictor space. The larger the leverage, the smaller the corresponding \(Var(e_i)\); geometrically, a remote observation pulls the regression line toward itself, shrinking its own residual. Such observations are called high-leverage points.
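A small sketch (simulated data; values are illustrative assumptions) showing leverage in action: one far-out \(x\) value gets the largest \(h_{ii}\), the leverages sum to 2 (the number of parameters in simple regression), and by Eq. (5.53) that point has the smallest residual variance:

```python
import numpy as np

# One x value far from the cluster becomes a high-leverage point.
rng = np.random.default_rng(5)
x = np.append(rng.uniform(0, 1, 19), 10.0)    # last point is far out
n = x.size

xbar = x.mean()
Lxx = np.sum((x - xbar) ** 2)
h = 1 / n + (x - xbar) ** 2 / Lxx             # leverages h_ii

assert np.isclose(h.sum(), 2.0)               # leverages sum to # parameters
assert h[-1] == h.max()                       # far point has highest leverage
# Var(e_i) = (1 - h_ii) sigma^2, Eq. (5.53): the high-leverage point has the
# smallest residual variance.
assert (1 - h[-1]) == (1 - h).min()
```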
\(Cov(e_i, \hat y_j)\)
For any \(i\) and \(j\), Eqs. (5.28) and (5.47) give
\[ \begin{aligned} Cov(e_i, \hat y_j)&=Cov(y_i-\sum_{k=1}^n h_{ik}y_k, \sum_{k=1}^n h_{jk}y_k) \\ &=Cov(y_i,\sum_{k=1}^n h_{jk}y_k)-Cov(\sum_{k=1}^n h_{ik}y_k,\sum_{k=1}^n h_{jk}y_k) \\ &= h_{ji}\sigma^2-\sigma^2\sum_{k=1}^n h_{ik}h_{jk} \\ &= h_{ji}\sigma^2-\sigma^2 h_{ij} \\ &= 0 \end{aligned} \tag{5.54} \]
\(Cov(e_i, \hat y_0)\)
By Eq. (5.29),
\[ \begin{aligned} Cov(e_i,\hat y_0)&= Cov(y_i-\sum_{j=1}^n h_{ij}y_j, \sum_{j=1}^n h_{0j}y_j) \\ &= Cov(y_i, \sum_{j=1}^n h_{0j}y_j)-Cov(\sum_{j=1}^n h_{ij}y_j,\sum_{j=1}^n h_{0j}y_j) \\ &= h_{0i}\sigma^2-\sigma^2 \sum_{j=1}^n h_{ij}h_{0j} \\ &= h_{0i}\sigma^2-\sigma^2 h_{i0} \\ &= 0 \end{aligned} \tag{5.55} \]
\(Cov(e_i, \hat \beta_0)\)
By Eq. (5.27),
\[ \begin{aligned} Cov(e_i, \hat \beta_0)&=Cov(y_i-\sum_{j=1}^n h_{ij}y_j, \sum_{j=1}^n [{1 \over n} - {(x_j-\bar x)\bar x \over L_{xx}}]y_j) \\ &= Cov(y_i, \sum_{j=1}^n [{1 \over n} - {(x_j-\bar x)\bar x \over L_{xx}}]y_j)-Cov(\sum_{j=1}^n h_{ij}y_j,\sum_{j=1}^n [{1 \over n} - {(x_j-\bar x)\bar x \over L_{xx}}]y_j) \\ &= [{1 \over n} - {(x_i-\bar x)\bar x \over L_{xx}}]\sigma^2 - \sigma^2 \sum_{j=1}^n h_{ij}({1 \over n} - {(x_j-\bar x)\bar x \over L_{xx}}) \\ &= [{1 \over n} - {(x_i-\bar x)\bar x \over L_{xx}}]\sigma^2-[{1 \over n} - {(x_i-\bar x)\bar x \over L_{xx}}]\sigma^2 \\ &= 0 \end{aligned} \tag{5.56} \]
\(Cov(e_i, \hat \beta_1)\)
By Eq. (5.26),
\[ \begin{aligned} Cov(e_i, \hat \beta_1) &= Cov(y_i-\sum_{j=1}^n h_{ij}y_j, \sum_{j=1}^n \frac{(x_j-\bar x)}{L_{xx}}y_j) \\ &= Cov(y_i, \sum_{j=1}^n \frac{(x_j-\bar x)}{L_{xx}}y_j) - Cov(\sum_{j=1}^n h_{ij}y_j,\sum_{j=1}^n \frac{(x_j-\bar x)}{L_{xx}}y_j) \\ &= \frac{(x_i-\bar x)}{L_{xx}} \sigma^2 - \frac{(x_i-\bar x)}{L_{xx}} \sigma^2 \\ &= 0 \end{aligned} \tag{5.57} \]
\(\sum_{i=1}^n e_i = \sum_{i=1}^n x_ie_i=0\)
See Eq. (5.12).
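These orthogonality relations are exact algebraic identities that hold for every sample, not just in expectation. A quick check on simulated data (values are illustrative assumptions):

```python
import numpy as np

# Residuals sum to zero and are orthogonal to x, hence also to the fitted values.
rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 40)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 40)

xbar = x.mean()
b1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
b0 = y.mean() - b1 * xbar
e = y - (b0 + b1 * x)                  # residuals

assert abs(e.sum()) < 1e-9             # sum e_i = 0
assert abs(x @ e) < 1e-8               # sum x_i e_i = 0
assert abs((b0 + b1 * x) @ e) < 1e-8   # hence e is orthogonal to y_hat
```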
5.5.6.2 Multiple regression
Linear representation
Eq. (5.20) gives the residual vector as \(e=Y-HY=(I-H)Y=(I-H)\varepsilon\).
\(E(e)\)
\[ E(e)=E[(I-H)\varepsilon]=0 \tag{5.58} \]
\(Cov(e)\)
\[ \begin{aligned} Cov(e) &= Cov((I-H)Y) \\ &= Cov((I-H)\varepsilon) \\ &= (I-H)E(\varepsilon\varepsilon')(I-H)' \\ &= \sigma^2 (I-H) \end{aligned} \tag{5.59} \]
\(Cov(e,\hat Y)\)
\[ \begin{aligned} Cov(e,\hat Y)&=Cov((I-H)Y,HY) \\ &=(I-H)Cov(Y)H' \\ &=(I-H)Cov(X\beta+\varepsilon)H \\ &=\sigma^2(I-H)H \\ &=\sigma^2(H-H^2) \\ &=0 \end{aligned} \tag{5.60} \]
\(Cov(e,\hat y_0)\)
\[ \begin{aligned} Cov(e,\hat y_0)&=Cov((I-H)Y,x_0'(X'X)^{-1}X'Y) \\ &=\sigma^2(I-H)X(X'X)^{-1}x_0 \\ &=\sigma^2(X(X'X)^{-1}-X(X'X)^{-1}X'X(X'X)^{-1})x_0 \\ &=0 \end{aligned} \tag{5.61} \]
\(Cov(e,\hat \beta)\)
\[ \begin{aligned} Cov(e,\hat \beta)&=Cov((I-H)Y,(X'X)^{-1}X'Y) \\ &= \sigma^2(I-H)X(X'X)^{-1} \\ &= \sigma^2[X(X'X)^{-1}-X(X'X)^{-1}X'X(X'X)^{-1}] \\ &= 0 \end{aligned} \tag{5.62} \]
Since \(e\) and \(\hat \beta\) are both linear combinations of \(Y\), both are normally distributed, so zero covariance implies that \(e\) and \(\hat \beta\) are independent. It follows that SSE, and hence \(\hat \sigma^2\), is also independent of \(\hat \beta\).
\(X'e=0\)
See Eq. (5.18).
The first-order condition of the normal equations is \(X'e=0\). Geometrically, the residual vector \(e\) is orthogonal to the column space spanned by \(X\), so every vector in that column space is uncorrelated with \(e\).
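A numerical confirmation (simulated data; design and coefficients are illustrative assumptions) that \(X'e=0\) and that the residuals are therefore orthogonal to the fitted values \(\hat Y = X\hat\beta\):

```python
import numpy as np

# X'e = 0: the residual vector is orthogonal to every column of X,
# and hence to anything in col(X), e.g. Y_hat = X beta_hat.
rng = np.random.default_rng(7)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(0, 1, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # normal equations
e = Y - X @ beta_hat                           # residual vector

assert np.allclose(X.T @ e, 0, atol=1e-8)      # X'e = 0
assert abs((X @ beta_hat) @ e) < 1e-7          # e orthogonal to Y_hat
```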