5.4 Parameter Estimation

5.4.1 Least Squares Estimation

5.4.1.1 The Univariate Case

For the sum of squared deviations \(Q(\beta_0,\beta_1)\), the least squares method seeks estimates \(\hat \beta_0\) and \(\hat \beta_1\) that minimize the residual sum of squares \(Q(\hat \beta_0,\hat \beta_1)\).

\[ \begin{align} Q(\beta_0,\beta_1)&=\sum^n_{i=1}(y_i-\beta_0-\beta_1x_i)^2=\sum^n_{i=1}\varepsilon_i^2\\ \Rightarrow (\hat \beta_0,\hat \beta_1)&=\underset{\beta_0, \beta_1} {\arg\min} \sum^n_{i=1}(y_i-\beta_0-\beta_1x_i)^2 \end{align} \tag{5.10} \]

The minimized value is then \(Q(\hat \beta_0,\hat \beta_1)=\sum\limits^n_{i=1}(y_i-\hat y_i)^2=\sum\limits^n_{i=1}e_i^2\), the residual sum of squares.

Take the partial derivatives with respect to \(\beta_0\) and \(\beta_1\) and set them to 0:

\[ \left\{ \begin{array}{ll} \frac{\partial Q}{\partial \beta_0} \mid _{\beta_0=\hat \beta_0, \beta_1=\hat \beta_1} &= -2 \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1x_i)=0 \\ \frac{\partial Q}{\partial \beta_1} \mid _{\beta_0=\hat \beta_0, \beta_1=\hat \beta_1} &= -2 \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1x_i)x_i=0 \end{array} \right. \tag{5.11} \]

Equation (5.11) is equivalent to:

\[ \left\{ \begin{array}{ll} \sum\limits_{i=1}^n e_i=0 \\ \sum\limits_{i=1}^n x_ie_i=0 \end{array} \right. \tag{5.12} \]

This pair of relations is very important and will be used repeatedly.

Expanding the sums and rearranging slightly gives:

\[ \left\{ \begin{array}{ll} \bar y=\hat \beta_0+\hat \beta_1 \bar x \\ \sum\limits_{i=1}^n y_ix_i=n\bar x \hat \beta_0+\hat \beta_1\sum\limits_{i=1}^n x_i^2 \end{array} \right. \tag{5.13} \]

It follows immediately that \(\hat \beta_0=\bar y-\hat \beta_1 \bar x\); substituting this into the second equation yields \(\hat \beta_1\):

\[ \begin{align} n\bar x \hat \beta_0+\hat \beta_1\sum_{i=1}^n x_i^2 &= \sum_{i=1}^n y_ix_i\\ n\bar x (\bar y-\hat \beta_1 \bar x) + \hat \beta_1\sum_{i=1}^n x_i^2 &= \sum_{i=1}^n y_ix_i\\ \hat \beta_1(\sum_{i=1}^n x_i^2 - n \bar x^2) &= \sum_{i=1}^n y_ix_i - n\bar x \bar y\\ \hat \beta_1 &= \frac{\sum\limits_{i=1}^n(x_i-\bar x)y_i}{\sum\limits_{i=1}^n(x_i-\bar x)^2}\\ \hat \beta_1 &= \frac{\sum\limits_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sum\limits_{i=1}^n(x_i-\bar x)^2} \end{align} \tag{5.14} \]

Note the following identities, which will be used again later:

\(\sum\limits_{i=1}^n(x_i-\bar x)=0\)

\(\sum\limits_{i=1}^n(x_i-\bar x)x_i=\sum\limits_{i=1}^n(x_i-\bar x)(x_i-\bar x)=\sum\limits_{i=1}^n(x_i-\bar x)^2\)

\(\sum\limits_{i=1}^n(x_i-\bar x)(y_i-\bar y)=\sum\limits_{i=1}^n(x_i-\bar x)y_i\)

Write \(L_{xx}=\sum\limits_{i=1}^n(x_i-\bar x)^2\), \(L_{yy}=\sum\limits_{i=1}^n(y_i-\bar y)^2\), and \(L_{xy}=\sum\limits_{i=1}^n(x_i-\bar x)(y_i-\bar y)\); then the least squares estimates are:

\[ \left\{ \begin{array}{ll} \hat \beta_0=\bar y - \hat \beta_1 \bar x \\ \hat \beta_1=\frac{L_{xy}}{L_{xx}} \end{array} \right. \tag{5.15} \]

As for \(\sigma^2\), it is natural to estimate it from the residuals \(e\); the unbiased estimator \(\hat \sigma^2\) is commonly used:

\[ \hat \sigma^2=\frac{\sum\limits_{i=1}^n e_i^2}{n-2} \tag{5.16} \]
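The closed-form estimates (5.15) and (5.16) are easy to verify numerically. A minimal numpy sketch, using hypothetical data invented for illustration, computing \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\sigma^2\) directly from the formulas and cross-checking against `numpy.polyfit`:

```python
import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# L_xx and L_xy as defined in the text
L_xx = np.sum((x - x_bar) ** 2)
L_xy = np.sum((x - x_bar) * (y - y_bar))

# Least squares estimates, Eq. (5.15)
beta1_hat = L_xy / L_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals and the unbiased variance estimate, Eq. (5.16)
e = y - (beta0_hat + beta1_hat * x)
sigma2_hat = np.sum(e ** 2) / (n - 2)

# Cross-check against numpy's degree-1 least squares fit
slope, intercept = np.polyfit(x, y, 1)
assert np.isclose(beta1_hat, slope) and np.isclose(beta0_hat, intercept)
```

The residuals also satisfy (5.12): `np.sum(e)` and `np.sum(x * e)` are both zero up to floating-point error.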

A more detailed proof is given in the multiple regression part below.

5.4.1.2 The Multivariate Case

For the sum of squared deviations \(Q(\beta)\):

\[ \begin{aligned} Q(\beta) &= (Y-X \beta)'(Y-X \beta)\\ &=Y'Y- \beta'X'Y-Y'X \beta+\beta'X'X \beta\\ &=Y'Y-2\beta'X'Y+\beta'X'X \beta \end{aligned} \tag{5.17} \]

Note that \(\beta'X'Y\) and \(Y'X\beta\) are both scalars, and each is the transpose of the other, so they are equal.

\(\beta\)求偏导可得:

\[ {\partial Q \over \partial \beta} \mid_{\beta=\hat \beta}=-2X'Y+2X'X\hat \beta=0\\ \begin{aligned} \Rightarrow X'X\hat \beta &= X'Y\\ \hat \beta&=(X'X)^{-1}X'Y \end{aligned} \tag{5.18} \]

The first-order condition is equivalent to \(X'(Y-X\hat \beta)=X'e=0\), where \(e\) is the residual vector; keep \(X'e=0\) firmly in mind.

To guarantee a solution, the regressors must be free of perfect multicollinearity, i.e. the matrix \(X\) must have full column rank, so that \(X'X\) is invertible.

Hence the least squares estimator is \(\hat \beta=(X'X)^{-1}X'Y\).
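The normal equations \(X'X\hat\beta=X'Y\) in (5.18) can be solved directly. A small numpy sketch on simulated data (the design, coefficients, and noise level are all hypothetical), cross-checked against `numpy.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n = 50 observations, p = 2 regressors plus an intercept
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X beta = X'Y, Eq. (5.18); solve() is
# numerically preferable to forming the explicit inverse (X'X)^{-1}
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with the library least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

Solving the linear system rather than computing \((X'X)^{-1}\) explicitly is the standard numerical practice, though both implement the same estimator.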

For the fitted values \(\hat Y\), we have:

\[ \hat Y = X \hat \beta = X(X'X)^{-1}X'Y=HY \tag{5.19} \]

where \(H=X(X'X)^{-1}X'\) is an \(n\times n\) symmetric idempotent matrix, i.e. \(H=H'\) and \(H=H^2\). \(H\) is also called the projection (hat) matrix.

The eigenvalues of an idempotent matrix are either 0 or 1, so \(tr(H)=rank(H)=p+1\).
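These three properties of \(H\) (symmetry, idempotency, and trace \(p+1\)) are easy to confirm numerically. A quick check on a hypothetical random design:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3  # hypothetical sizes: 30 observations, 3 regressors plus intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Hat (projection) matrix H = X(X'X)^{-1}X', Eq. (5.19)
H = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(H, H.T)              # symmetric: H = H'
assert np.allclose(H, H @ H)            # idempotent: H = H^2
assert np.isclose(np.trace(H), p + 1)   # tr(H) = rank(H) = p + 1
```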

The residual vector \(e\) is then:

\[ \begin{aligned} e &= Y-\hat Y\\ &=Y-HY\\ &=(I-H)Y\\ &=(I-H)(X\beta+\varepsilon)\\ &=X\beta-HX\beta+(I-H)\varepsilon\\ &=(I-H)\varepsilon \end{aligned} \tag{5.20} \]

Next, take the expectation of the residual sum of squares \(SSE=e'e=\varepsilon'(I-H)'(I-H)\varepsilon=\varepsilon'(I-H)\varepsilon\), using the symmetry and idempotency of \(I-H\):

Two facts used below: \(tr(AB)=tr(BA)\); and \(I\) and \(H\) are \(n\times n\) matrices, while \(X\) is \(n\times(p+1)\).

\[ \begin{aligned} E(SSE)&=E(\varepsilon'(I-H)\varepsilon)\\ &=E[tr(\varepsilon'(I-H)\varepsilon)]\\ &=E[tr((I-H)\varepsilon\varepsilon')]\\ &=tr((I-H)E(\varepsilon\varepsilon'))\\ &=\sigma^2 tr(I-H)\\ &=\sigma^2 [n-tr(H)]\\ &=\sigma^2 [n-tr(X(X'X)^{-1}X')]\\ &=\sigma^2 [n-tr((X'X)^{-1}X'X)]\\ &=\sigma^2 [n-p-1]\\ \end{aligned} \tag{5.21} \]

\(\sigma^2\)的无偏估计为\(\hat \sigma^2={SSE \over n-p-1}\)

5.4.2 Maximum Likelihood Estimation

5.4.2.1 The Univariate Case

\(y\sim~N(\beta_0+\beta_1x,\sigma^2)\)的假定下,写出对数似然函数:

\[ \ln (L)=-{n \over 2} \ln (2\pi \sigma^2) - {1 \over 2\sigma^2} \sum_{i=1}^n [y_i-(\beta_0+\beta_1x_i)]^2 \tag{5.22} \]

Taking partial derivatives with respect to \(\beta_0\), \(\beta_1\), and \(\sigma^2\) yields the corresponding estimators. The estimators of \(\beta_0\) and \(\beta_1\) coincide with the least squares results, but the estimator of \(\sigma^2\) is \(\hat \sigma^2={\sum\limits_{i=1}^n e_i^2 \over n}\), which is biased.

5.4.2.2 The Multivariate Case

Note that \(Y \sim N(X\beta, \sigma^2I_n)\). The log-likelihood function is therefore:

\[ \ln L=-{n \over 2}\ln (2\pi)-{n \over 2}\ln (\sigma^2)-{1 \over 2\sigma^2}(Y-X\beta)'(Y-X\beta) \tag{5.23} \]

To maximize the log-likelihood, we must minimize \((Y-X\beta)'(Y-X\beta)\), the same objective as in (5.17), so \(\hat \beta_{MLE}\) coincides with the least squares estimator. The estimator of \(\sigma^2\) is \(\hat \sigma^2={(Y-X\hat\beta)'(Y-X\hat\beta) \over n}={e'e \over n}\), biased just as in the univariate case.

5.4.3 Method of Moments Estimation

5.4.3.1 The Univariate Case

The model assumptions specify \(E(\varepsilon)=0\) and \(Cov(X_i,\varepsilon)=E(X_i\varepsilon)=0\). Noting that the residual \(e\) is an estimate of \(\varepsilon\), replacing population moments with sample moments gives:

\[ \left\{ \begin{array}{ll} {1 \over n} \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1 x_i)=0 \\ {1 \over n} \sum\limits_{i=1}^n (y_i-\hat \beta_0-\hat \beta_1 x_i)x_i=0 \end{array} \right. \tag{5.24} \]

These conditions are identical to Eq. (5.11), so the estimates coincide with the least squares results.

5.4.3.2 The Multivariate Case

In the multivariate case, the assumptions \(E(\varepsilon)=0\) and \(Cov(X_i,\varepsilon)=E(X_i\varepsilon)=0\) give the corresponding sample moment conditions:

\[ {1 \over n}X'(Y-X\hat \beta)=0\\ \Rightarrow \hat \beta =(X'X)^{-1}X'Y \tag{5.25} \]

Thus the method of moments estimate is the same as the least squares estimate.

Whether in the univariate or the multivariate case, least squares, maximum likelihood, and the method of moments all rely on the assumptions of zero mean, no endogeneity, and (in the multivariate case) no perfect multicollinearity; maximum likelihood additionally invokes the normality assumption. In every case, the core of the estimation is \(X'(Y-X\hat \beta)=0\), that is, \(X'e=0\).
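This common core \(X'e=0\) is worth verifying once directly. A short sketch on hypothetical simulated data: after fitting by least squares, every column of \(X\) is orthogonal to the residual vector, regardless of the true error distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 2  # hypothetical sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

# X'e = 0: the first row of X' (all ones) forces sum(e) = 0,
# and each remaining row forces orthogonality of that regressor to e
assert np.allclose(X.T @ e, 0.0, atol=1e-8)
```

Note this holds as an algebraic identity of the fit, whether or not the model assumptions are true; the assumptions are what make the resulting estimator meaningful for the population quantities.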

Note that the first row of \(X'\) is all ones, which enforces the condition \(E(\varepsilon)=0\); the remaining rows contain the observations of the individual regressors, which enforce \(Cov(X_i,\varepsilon)=E(X_i\varepsilon)=0\).