Advanced Econometrics Part I: Basic Econometric Models
Chapter 1: Classical Linear Regression
Nam T. Hoang
University of New England - Australia; University of Economics - HCMC - Vietnam
Chapter 1:
CLASSICAL LINEAR REGRESSION
I. MODEL:
Population model: $Y = f(X_1, X_2, \dots, X_k) + \varepsilon$
- f may be any kind (linear, non-linear, parametric, non-parametric, ...)
- We'll focus on: parametric and linear in the parameters.
Sample information:
- We have a sample: $\{Y_i, X_{i2}, \dots, X_{ik}\}_{i=1}^{n}$
- Assume that these observed values are generated by the population model:
$$Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + \dots + \beta_k X_{ik} + \varepsilon_i$$
- Objectives:
i. Estimate unknown parameters.
ii. Test hypotheses about parameters.
iii. Predict values of Y outside the sample.
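- Note that:
$$\beta_k = \frac{\partial Y_i}{\partial X_{ik}},$$
so the parameters are the marginal effects of the X's on Y, with other factors held constant.
EX: $C_i = \beta_1 + \beta_2 Y_i + \varepsilon_i$
where $C_i$ is the dependent variable, $Y_i$ is the explanatory variable (or regressor), and $\varepsilon_i$ is the disturbance (error).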
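$$\beta_2 = \frac{\partial C_i}{\partial Y_i} = \text{M.P.C. (marginal propensity to consume)} \;\Rightarrow\; \text{require } 0 \le \beta_2 \le 1$$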
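Denote:
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}; \quad
X = \begin{bmatrix} 1 & X_{12} & X_{13} & \dots & X_{1k} \\ 1 & X_{22} & X_{23} & \dots & X_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n2} & X_{n3} & \dots & X_{nk} \end{bmatrix}; \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} \quad \text{and} \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
⇒ We have:
$$\underset{(n \times 1)}{Y} = \underset{(n \times k)}{X}\,\underset{(k \times 1)}{\beta} + \underset{(n \times 1)}{\varepsilon}$$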
II. ASSUMPTIONS OF THE CLASSICAL REGRESSION MODEL:
Models are simplifications of reality.
We'll make a set of simplifying assumptions for the model.
The assumptions relate to:
- Functional form.
- Regressors.
- Disturbances.
Assumption 1: Linearity. The model is linear in the parameters.
$$Y = X\beta + \varepsilon$$
Assumption 2: Full rank. The $X_{ij}$ are not random variables, or they are random variables that are uncorrelated with $\varepsilon$.
There are no exact linear dependencies among the columns of X.
This assumption is necessary for estimation of the parameters (only EXACT linear dependence is ruled out; high but imperfect correlation among the regressors is allowed).
For $\underset{(n \times k)}{X}$: Rank(X) = k implies n ≥ k, since Rank(A) ≤ min(Rows, Columns).
The boundary case n = k, where Rank(X) = n, is also OK.
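The full-rank condition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the small design matrix is made up purely for illustration and is not data from these notes.

```python
import numpy as np

# Hypothetical design matrix: n = 5 observations, an intercept and two regressors.
X = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 0.0],
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 1.0],
    [1.0, 8.0, 3.0],
])
n, k = X.shape
rank_ok = np.linalg.matrix_rank(X)

# Append a column that is an exact linear combination of existing columns
# (second column plus twice the third): this violates the full-rank assumption.
X_bad = np.column_stack([X, X[:, 1] + 2.0 * X[:, 2]])
rank_bad = np.linalg.matrix_rank(X_bad)

print("full column rank:", rank_ok == k)
print("rank with redundant column:", rank_bad, "of", X_bad.shape[1], "columns")
```

When the rank falls below the number of columns, X'X is singular and the parameters cannot be estimated separately.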
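Assumption 3: Exogeneity of the independent variables.
$$E[\varepsilon_i \mid X_{j1}, X_{j2}, X_{j3}, \dots, X_{jk}] = 0 \quad \text{for } i = j \text{ and also for } i \neq j$$
This means that the independent variables carry no useful information for predicting $\varepsilon_i$:
$$E[\varepsilon_i \mid X] = 0 \;\; \forall\, i = 1, \dots, n \;\;\rightarrow\;\; E(\varepsilon_i) = 0$$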
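Assumption 4:
$$\begin{cases} \mathrm{Var}(\varepsilon_i) = \sigma^2, & i = 1, \dots, n \\ \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, & \forall\, i \neq j \end{cases}$$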
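For any random vector $Z = [z_1, z_2, \dots, z_m]'$, we can express its variance-covariance matrix as:
$$\mathrm{VarCov}(Z) = E[(Z - E(Z))(Z - E(Z))'] = E\left\{ \begin{bmatrix} z_1 - E(z_1) \\ z_2 - E(z_2) \\ \vdots \\ z_m - E(z_m) \end{bmatrix} \begin{bmatrix} z_1 - E(z_1) & z_2 - E(z_2) & \dots & z_m - E(z_m) \end{bmatrix} \right\}$$
$$= \Big[ E[(z_i - E(z_i))(z_j - E(z_j))] \Big]_{m \times m} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1m} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} & \sigma_{m2} & \dots & \sigma_m^2 \end{bmatrix}$$
The jth diagonal element is $\mathrm{var}(z_j) = \sigma_{jj} = \sigma_j^2$.
The ijth element (i ≠ j) is $\mathrm{cov}(z_i, z_j) = \sigma_{ij}$.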
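So we have the "covariance matrix" for the vector $\varepsilon$:
$$\mathrm{VarCov}(\varepsilon) = E[(\varepsilon - E(\varepsilon))(\varepsilon - E(\varepsilon))'] = E(\varepsilon\varepsilon'), \quad \text{since } E(\varepsilon) = 0$$
Then Assumption 4 is equivalent to: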
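$$E(\varepsilon\varepsilon') = \begin{bmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{bmatrix} = \sigma^2 I$$
$$\Leftrightarrow \begin{cases} \mathrm{Var}(\varepsilon_i) = \sigma^2, \; i = 1, \dots, n & \text{(homoscedasticity)} \\ \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, \; \forall\, i \neq j & \text{(no autocorrelation)} \end{cases}$$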
Assumption 5: Data generating process for the regressors (X is non-stochastic).
+ The $X_{ij}$ are not random variables.
Note: This assumption is different from Assumption 3.
$E[\varepsilon_i \mid X] = 0$ restricts only the conditional mean of the disturbance (it has to be 0).
Assumption 6: Normality of Errors.
$$\varepsilon \sim N[0, \sigma^2 I]$$
+ Normality is not necessary to obtain many results in the regression model.
+ It will be possible to relax this assumption and retain most of the statistical results.
SUMMARY: The classical linear regression model is:
$Y = X\beta + \varepsilon$
$\varepsilon \sim N[0, \sigma^2 I]$
Rank(X) = k
X is non-stochastic
III. LEAST SQUARES ESTIMATION:
(Ordinary Least Squares Estimation - OLS)
Our first task is to estimate the parameters of the model:
$Y = X\beta + \varepsilon$ with $\varepsilon \sim N[0, \sigma^2 I]$
There are many possible procedures for doing this. The choice should be based on the "sampling properties" of the estimates.
Let's consider one possible estimation strategy: Least Squares.
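Denote by $\hat\beta$ the estimator of $\beta$:
True relation: $Y = X\beta + \varepsilon$
Estimated relation: $Y = X\hat\beta + e$, where
$$e = [e_1, e_2, \dots, e_n]'$$
is the vector of residuals (the estimate of $\varepsilon$), i.e. $e_i$ is the estimate of $\varepsilon_i$.
For the ith observation:
$$Y_i = \underbrace{X_i'\beta + \varepsilon_i}_{\text{population (unobserved)}} = \underbrace{X_i'\hat\beta + e_i}_{\text{sample (observed)}}$$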
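Sum of squared residuals: $e'e = \sum_{i=1}^{n} e_i^2$
$$\sum_{i=1}^{n} e_i^2 = e'e = (Y - X\hat\beta)'(Y - X\hat\beta)$$
$$= Y'Y - \hat\beta'X'Y - Y'X\hat\beta + \hat\beta'X'X\hat\beta$$
$$= Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta \qquad (\text{since } \hat\beta'X'Y = Y'X\hat\beta \text{, both being the same scalar})$$
We need to choose the $\hat\beta$ that solves:
$$\min_{\hat\beta}\; \left( Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta \right)$$
The necessary condition for a minimum:
$$\frac{\partial [e'e]}{\partial \hat\beta} = 0 \;\Leftrightarrow\; \frac{\partial [Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta]}{\partial \hat\beta} = 0$$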
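$$\hat\beta'X'Y = [\hat\beta_1 \;\; \hat\beta_2 \;\; \dots \;\; \hat\beta_k] \begin{bmatrix} 1 & 1 & \dots & 1 \\ X_{12} & X_{22} & \dots & X_{n2} \\ \vdots & \vdots & & \vdots \\ X_{1k} & X_{2k} & \dots & X_{nk} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = [\hat\beta_1 \;\; \hat\beta_2 \;\; \dots \;\; \hat\beta_k] \begin{bmatrix} \sum Y_i \\ \sum X_{i2} Y_i \\ \vdots \\ \sum X_{ik} Y_i \end{bmatrix}$$
Take the derivative w.r.t. each element of $\hat\beta$:
$$\frac{\partial [\hat\beta'X'Y]}{\partial \hat\beta} = \begin{bmatrix} \sum Y_i \\ \sum X_{i2} Y_i \\ \vdots \\ \sum X_{ik} Y_i \end{bmatrix} = X'Y$$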
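$$X'X = \begin{bmatrix} 1 & 1 & \dots & 1 \\ X_{12} & X_{22} & \dots & X_{n2} \\ \vdots & \vdots & & \vdots \\ X_{1k} & X_{2k} & \dots & X_{nk} \end{bmatrix} \begin{bmatrix} 1 & X_{12} & X_{13} & \dots & X_{1k} \\ 1 & X_{22} & X_{23} & \dots & X_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n2} & X_{n3} & \dots & X_{nk} \end{bmatrix}$$
$$= \begin{bmatrix} n & \sum X_{i2} & \sum X_{i3} & \dots & \sum X_{ik} \\ \sum X_{i2} & \sum X_{i2}^2 & \sum X_{i2}X_{i3} & \dots & \sum X_{i2}X_{ik} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum X_{ik} & \sum X_{ik}X_{i2} & \sum X_{ik}X_{i3} & \dots & \sum X_{ik}^2 \end{bmatrix}$$
This is a symmetric matrix of sums of squares and cross products.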
$\hat\beta'X'X\hat\beta$ is a quadratic form:
$$\hat\beta'X'X\hat\beta = \sum_{i=1}^{k} \sum_{j=1}^{k} (X'X)_{ij}\, \hat\beta_i \hat\beta_j$$
Take the derivative w.r.t. each $\hat\beta_i$:
- For j = i, the term $(X'X)_{ii}\hat\beta_i^2$ gives $\dfrac{\partial [(X'X)_{ii}\hat\beta_i^2]}{\partial \hat\beta_i} = 2(X'X)_{ii}\hat\beta_i$.
- For j ≠ i, the two terms $(X'X)_{ij}\hat\beta_i\hat\beta_j$ and $(X'X)_{ji}\hat\beta_j\hat\beta_i$ together give $2(X'X)_{ij}\hat\beta_j$, because $X'X$ is symmetric.
Collecting terms over $j = 1, \dots, k$:
$$\frac{\partial [\hat\beta'X'X\hat\beta]}{\partial \hat\beta_i} = 2 \sum_{j=1}^{k} (X'X)_{ij}\hat\beta_j, \quad i = 1, \dots, k$$
Then
$$\frac{\partial [\hat\beta'(X'X)\hat\beta]}{\partial \hat\beta} = 2(X'X)\hat\beta$$
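So
$$\frac{\partial [e'e]}{\partial \hat\beta} = 0 \;\Leftrightarrow\; -2X'Y + 2(X'X)\hat\beta = 0 \quad \text{(called the "normal equations")}$$
$$\rightarrow (X'X)\hat\beta = X'Y \;\rightarrow\; \hat\beta = (X'X)^{-1}X'Y$$
Note: for the existence of $(X'X)^{-1}$ we need the assumption that Rank(X) = k.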
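The closed-form estimator can be illustrated numerically. The sketch below assumes NumPy and uses simulated data; `beta_true`, the sample size and the noise scale are arbitrary choices for illustration. It solves the normal equations directly and cross-checks the answer against NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative assumption): intercept plus two regressors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations: (X'X) beta_hat = X'Y.
# Solving the linear system avoids forming the inverse explicitly.
XtX = X.T @ X
XtY = X.T @ Y
beta_hat = np.linalg.solve(XtX, XtY)

# Cross-check with NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("beta_hat from normal equations:", beta_hat)
print("agrees with lstsq:", np.allclose(beta_hat, beta_lstsq))
```

Solving the system with `np.linalg.solve` is algebraically the same as $(X'X)^{-1}X'Y$ but is usually better conditioned than computing the inverse and multiplying.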
IV. ALGEBRAIC PROPERTIES OF LEAST SQUARES:
1. "Orthogonality condition":
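$$-2X'Y + 2(X'X)\hat\beta = 0 \quad \text{(normal equations)}$$
$$\Leftrightarrow X'\underbrace{(Y - X\hat\beta)}_{e} = 0$$
$$\Leftrightarrow X'e = 0$$
$$\Leftrightarrow \begin{bmatrix} 1 & 1 & \dots & 1 \\ X_{12} & X_{22} & \dots & X_{n2} \\ \vdots & \vdots & & \vdots \\ X_{1k} & X_{2k} & \dots & X_{nk} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
$$\Leftrightarrow \begin{cases} \sum_{i=1}^{n} e_i = 0 \\ \sum_{i=1}^{n} X_{ij} e_i = 0, & j = 2, \dots, k \end{cases}$$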
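The orthogonality condition is purely algebraic, so it holds for any Y, not only for data generated by the model. A minimal check, assuming NumPy and arbitrary simulated inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative data: the regressors include an intercept column.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
Y = rng.normal(size=n)  # Y need not follow the model for X'e = 0 to hold

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat  # residual vector

Xte = X.T @ e  # should be a vector of (numerical) zeros
print("X'e =", Xte)
print("sum of residuals =", e.sum())
```

The first element of `Xte` is the sum of the residuals, which is zero only because the first column of X is the intercept.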
2. Deviation from mean model (the fitted regression passes through $(\bar X, \bar Y)$):
$$Y_i = \hat\beta_1 + \hat\beta_2 X_{i2} + \hat\beta_3 X_{i3} + \dots + \hat\beta_k X_{ik} + e_i, \quad i = 1, \dots, n$$
Sum over all n observations and divide by n:
$$\bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3 + \dots + \hat\beta_k \bar X_k + \underbrace{\frac{1}{n}\sum_{i=1}^{n} e_i}_{0}$$
Then, subtracting:
$$Y_i - \bar Y = \hat\beta_2 (X_{i2} - \bar X_2) + \hat\beta_3 (X_{i3} - \bar X_3) + \dots + \hat\beta_k (X_{ik} - \bar X_k) + e_i, \quad i = 1, \dots, n$$
In the model in deviation form, the intercept drops out and can be recovered afterwards as $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \dots - \hat\beta_k \bar X_k$.
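3. The mean of the fitted values $\hat Y_i$ is equal to the mean of the actual $Y_i$ values in the sample:
$$Y_i = X_i'\beta + \varepsilon_i = \underbrace{X_i'\hat\beta}_{\hat Y_i} + e_i, \quad i = 1, \dots, n$$
$$\rightarrow \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat Y_i + \underbrace{\sum_{i=1}^{n} e_i}_{0}$$
$$\rightarrow \sum Y_i = \sum \hat Y_i$$
$$\rightarrow \bar Y = \bar{\hat Y}$$
Note that these results use the fact that the regression model includes an intercept term.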
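Properties 2 and 3 can be verified together. The sketch below, assuming NumPy and simulated data with made-up coefficients, fits the model once with an intercept and once in deviation-from-mean form, then recovers the intercept from the sample means.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data: two regressors with non-zero means, plus an intercept.
n = 80
X2 = rng.normal(loc=3.0, size=(n, 2))
X = np.column_stack([np.ones(n), X2])
Y = 1.5 + X2 @ np.array([0.7, -1.2]) + rng.normal(size=n)

# Regression with an intercept.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_fit = X @ beta_hat

# Regression in deviation form: demean Y and the regressors, drop the intercept.
Xd = X2 - X2.mean(axis=0)
Yd = Y - Y.mean()
slopes = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)

# Recover the intercept from the sample means.
intercept = Y.mean() - X2.mean(axis=0) @ slopes

print("slopes match:", np.allclose(slopes, beta_hat[1:]))
print("intercept matches:", np.isclose(intercept, beta_hat[0]))
print("mean of fitted equals mean of Y:", np.isclose(Y_fit.mean(), Y.mean()))
```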
V. PARTITIONED REGRESSION: FRISCH-WAUGH THEOREM:
1. Note: Fundamental idempotent matrix (M):
$$e = Y - X\hat\beta$$
$$= Y - X(X'X)^{-1}X'Y$$
$$= [\underset{n \times n}{I} - X(X'X)^{-1}X']\,Y$$
$$= (X\beta + \varepsilon) - X(X'X)^{-1}X'(X\beta + \varepsilon)$$
$$= X\beta + \varepsilon - X\beta - X(X'X)^{-1}X'\varepsilon$$
$$= \underbrace{[I - X(X'X)^{-1}X']}_{M\;(n \times n)}\,\varepsilon$$
So the residual vector e has two alternative representations:
$$\begin{cases} e = MY \\ e = M\varepsilon \end{cases}$$
M is the "residual maker" in the regression of Y on X.
M is symmetric and idempotent, that is:
$$\begin{cases} M' = M \\ M \cdot M = M \end{cases}$$
$$M' = [I - X(X'X)^{-1}X']' = I - X[(X'X)^{-1}]'X' = I - X(X'X)^{-1}X' = M$$
Note: $(AB)' = B'A'$
$$M \cdot M = [I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X']$$
$$= I - X(X'X)^{-1}X' - X(X'X)^{-1}X' + X(X'X)^{-1}\underbrace{X'X(X'X)^{-1}}_{I}X'$$
$$= I - X(X'X)^{-1}X' = M$$
Also we have:
$$MX = [I - X(X'X)^{-1}X']X = X - X(X'X)^{-1}X'X = X - X = \underset{n \times k}{0}$$
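These properties of the residual maker are easy to confirm numerically. A minimal sketch, assuming NumPy and an arbitrary simulated X and Y:

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary illustrative data.
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = rng.normal(size=n)

# Residual maker M = I - X (X'X)^{-1} X', built with a linear solve.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# OLS residuals computed the usual way, for comparison with M @ Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

print("symmetric:", np.allclose(M, M.T))
print("idempotent:", np.allclose(M @ M, M))
print("MX = 0:", np.allclose(M @ X, 0.0, atol=1e-10))
print("MY = e:", np.allclose(M @ Y, e))
print("trace(M) = n - k:", np.isclose(np.trace(M), n - k))
```

For an idempotent matrix the trace equals the rank, so the last line reflects rank(M) = n − k.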
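2. Partitioned Regression:
Suppose that our matrix of regressors is partitioned into two blocks:
$$\underset{n \times k}{X} = [\underset{n \times k_1}{X_1} \;\; \underset{n \times k_2}{X_2}] \quad (k = k_1 + k_2)$$
$$\underset{n \times 1}{Y} = \underset{n \times k_1}{X_1}\,\underset{k_1 \times 1}{\beta_1} + \underset{n \times k_2}{X_2}\,\underset{k_2 \times 1}{\beta_2} + \underset{n \times 1}{\varepsilon}$$
$$Y = [X_1 \;\; X_2] \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} + e$$
The normal equations:
$$(X'X)\hat\beta = X'Y$$
$$\Leftrightarrow [X_1 \;\; X_2]'[X_1 \;\; X_2]\,\hat\beta = [X_1 \;\; X_2]'Y$$
$$\Leftrightarrow \begin{bmatrix} X_1' \\ X_2' \end{bmatrix} [X_1 \;\; X_2] \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix} Y$$
$$\Leftrightarrow \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} X_1'Y \\ X_2'Y \end{bmatrix}$$
$$\Leftrightarrow \begin{cases} (X_1'X_1)\hat\beta_1 + (X_1'X_2)\hat\beta_2 = X_1'Y & (a) \\ (X_2'X_1)\hat\beta_1 + (X_2'X_2)\hat\beta_2 = X_2'Y & (b) \end{cases}$$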
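From (a) → $(X_1'X_1)\hat\beta_1 = X_1'(Y - X_2\hat\beta_2)$
or
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(Y - X_2\hat\beta_2) \quad (c)$$
Put (c) into (b):
$$(X_2'X_1)(X_1'X_1)^{-1}X_1'(Y - X_2\hat\beta_2) + (X_2'X_2)\hat\beta_2 = X_2'Y$$
$$\Leftrightarrow -X_2'X_1(X_1'X_1)^{-1}X_1'X_2\hat\beta_2 + X_2'X_2\hat\beta_2 = X_2'Y - X_2'X_1(X_1'X_1)^{-1}X_1'Y$$
$$\Leftrightarrow X_2'\underbrace{[I - X_1(X_1'X_1)^{-1}X_1']}_{M_1\;(n \times n)}X_2\,\hat\beta_2 = X_2'\underbrace{[I - X_1(X_1'X_1)^{-1}X_1']}_{M_1\;(n \times n)}Y$$
We have: $(X_2'M_1X_2)\hat\beta_2 = X_2'M_1Y$ → $\hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y$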
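Because $M_1$ is symmetric and idempotent,
$$\begin{cases} M_1' = M_1 \\ M_1 \cdot M_1 = M_1 \end{cases}$$
Then:
$$(\underbrace{X_2'M_1'}_{X_2^{*\prime}}\,\underbrace{M_1X_2}_{X_2^*})\hat\beta_2 = \underbrace{X_2'M_1'}_{X_2^{*\prime}}\,\underbrace{M_1Y}_{Y^*}$$
$$(X_2^{*\prime}X_2^*)\hat\beta_2 = X_2^{*\prime}Y^* \;\rightarrow\; \hat\beta_2 = (X_2^{*\prime}X_2^*)^{-1}X_2^{*\prime}Y^*$$
Where:
$$\begin{cases} X_2^* = M_1X_2 \;\rightarrow\; X_2^{*\prime} = X_2'M_1' \\ Y^* = M_1Y \end{cases}$$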
Interpretation:
• $Y^* = M_1Y$ = residuals from the regression of $\underset{n \times 1}{Y}$ on $\underset{n \times k_1}{X_1}$
• $X_2^* = M_1X_2$ = matrix of residuals from the regressions of the $X_2$ variables on $\underset{n \times k_1}{X_1}$
Suppose we regress Y on $X_1$ and get the residuals, and also regress $X_2$ (each column of $X_2$) on $X_1$ and get the matrix of the residuals.
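• Regressing Y on $X_1$, the residuals are:
$$\underset{n \times 1}{e_1} = Y - \hat Y = Y - X_1(X_1'X_1)^{-1}X_1'Y = M_1Y = Y^*$$
• Regressing $X_2$ (each column of $X_2$) on $\underset{n \times k_1}{X_1}$, with $B$ denoting the $k_1 \times k_2$ matrix of coefficients and $\hat B = (X_1'X_1)^{-1}X_1'X_2$ its OLS estimate, the $n \times k_2$ matrix of residuals is:
$$\underset{n \times k_2}{E} = \underset{n \times k_2}{X_2} - \underset{n \times k_2}{\hat X_2} = X_2 - \underset{n \times k_1}{X_1}\,\underset{k_1 \times k_2}{\hat B}$$
$$= [I - X_1(X_1'X_1)^{-1}X_1']X_2 = M_1X_2 = X_2^*$$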
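• If we now take these residuals $\underset{n \times 1}{e_1}$ and fit a regression of $e_1$ on E:
$$\underset{n \times 1}{e_1} = \underset{n \times k_2}{E}\,\underset{k_2 \times 1}{\tilde\beta_2} + \underset{n \times 1}{u}$$
then we will have:
$$\tilde\beta_2 = \hat\beta_2$$
We get the same result as if we had simply run the regression on the whole model.
This result is called the "Frisch-Waugh" theorem.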
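The theorem can be confirmed numerically. The sketch below assumes NumPy and simulated data with made-up coefficients; the helper `ols` is a name introduced here only for illustration. It compares the $X_2$ coefficients from the full regression with those from the two-step residual regression.

```python
import numpy as np

def ols(X, Y):
    """OLS coefficients from the normal equations (X'X) b = X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(4)

# Illustrative data: X1 holds an intercept and one control,
# X2 holds two regressors that are correlated with the control.
n = 120
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = 0.5 * X1[:, [1]] + rng.normal(size=(n, 2))
Y = X1 @ np.array([1.0, 0.3]) + X2 @ np.array([0.8, -0.4]) + rng.normal(size=n)

# Full regression of Y on [X1 X2]; keep the coefficients on X2.
beta_full = ols(np.column_stack([X1, X2]), Y)
beta2_full = beta_full[X1.shape[1]:]

# Step 1: residuals of Y on X1 (Y*) and of each column of X2 on X1 (X2*).
Y_star = Y - X1 @ ols(X1, Y)
X2_star = X2 - X1 @ ols(X1, X2)

# Step 2: regress Y* on X2*.
beta2_fw = ols(X2_star, Y_star)

print("full regression:", beta2_full)
print("Frisch-Waugh two-step:", beta2_fw)
```

The two coefficient vectors agree up to floating-point error, even though the second regression never sees $X_1$ directly.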
Example: Y = Wages
X2 = Education (years of schooling).
X1 = Ability (test scores)
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
$\beta_2$ = effect of one extra year of schooling on wages, controlling for ability.
$Y^*$ = residuals from the regression of Y on $X_1$ (= variation in wages when controlling for ability).
$X_2^*$ = residuals from the regression of $X_2$ on $X_1$.
Then regress $Y^*$ on $X_2^*$ → get $\beta_2$: $Y^* = X_2^*\beta_2 + u$
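Example: De-trending and de-seasonalizing data:
$$\underset{n \times 1}{Y} = \underset{n \times 1}{t}\,\underset{1 \times 1}{\beta_1} + \underset{n \times k_2}{X_2}\,\underset{k_2 \times 1}{\beta_2} + \underset{n \times 1}{\varepsilon}, \qquad t = [1, 2, \dots, n]'$$
Either include "t" in the model, or "de-trend" the $X_2$ and Y variables by regressing them on "t" and taking the residuals.
Note: Including a trend in the regression is an effective way of de-trending the data.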
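As a rough illustration of the de-trending example, the sketch below (assuming NumPy; the trend slopes and noise are invented for the simulation) follows the model as written, with the trend t as the only variable in $X_1$ and a single $X_2$ variable. The helper `resid_on_t` is a hypothetical name used only here.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated trending series (illustrative assumption).
n = 100
t = np.arange(1, n + 1, dtype=float)
x2 = 0.05 * t + rng.normal(size=n)
Y = 0.2 * t + 1.5 * x2 + rng.normal(size=n)

def resid_on_t(v):
    """Residuals from regressing v on the trend t alone (no constant)."""
    return v - t * (t @ v) / (t @ t)

# Option 1: include t in the regression.
X = np.column_stack([t, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Option 2: de-trend Y and x2 first, then regress the residuals.
Y_dt = resid_on_t(Y)
x2_dt = resid_on_t(x2)
beta2_dt = (x2_dt @ Y_dt) / (x2_dt @ x2_dt)

print("coefficient on x2 with t included:", beta_hat[1])
print("coefficient on de-trended data:", beta2_dt)
```

Both routes give the same coefficient on $X_2$, which is the Frisch-Waugh result applied with $X_1 = t$.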
VI. GOODNESS OF FIT:
One way of measuring the "quality of the fitted regression line" is to measure the extent to which the sample variation of the Y variable is explained by the model.
- The sample variability of Y is:
$$\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar Y)^2$$
or we could just use:
$$\sum_{i=1}^{n} (Y_i - \bar Y)^2$$
- Our fitted regression:
$$Y = X\hat\beta + e = \hat Y + e$$
$$\hat Y = X\hat\beta = X(X'X)^{-1}X'Y$$
Note that if the model includes an intercept, then $\bar Y = \bar{\hat Y}$.
Now consider the following matrix:
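$$\underset{n \times n}{M^0} = I - \frac{1}{n}\,\mathbf{1}\mathbf{1}', \quad \text{where } \underset{n \times 1}{\mathbf{1}} = [1, 1, \dots, 1]'$$
Note that:
$$M^0 Y = \left( \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix} - \begin{bmatrix} \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\ \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\ \vdots & \vdots & & \vdots \\ \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \end{bmatrix} \right) \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} Y_1 - \frac{1}{n}\sum Y_i \\ Y_2 - \frac{1}{n}\sum Y_i \\ \vdots \\ Y_n - \frac{1}{n}\sum Y_i \end{bmatrix} = \begin{bmatrix} Y_1 - \bar Y \\ Y_2 - \bar Y \\ \vdots \\ Y_n - \bar Y \end{bmatrix}$$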
We have:
• $M^0$ is idempotent.
• $M^0 \mathbf{1} = \mathbf{0}$
• $Y'M^0Y = Y'M^{0\prime}M^0Y = (M^0Y)'(M^0Y) = \sum_{i=1}^{n} (Y_i - \bar Y)^2$
So: $Y = X\hat\beta + e = \hat Y + e$
$$M^0Y = M^0X\hat\beta + M^0e = M^0\hat Y + M^0e$$
Recall that:
$$e'X = (X'e)' = 0 \qquad \left( \sum e_i = 0 \;\rightarrow\; M^0e = e \right)$$
$$e'M^0X = (X'M^0e)' = 0$$
$$\rightarrow Y'M^0Y = (X\hat\beta + e)'(M^0X\hat\beta + M^0e)$$
$$= (\hat\beta'X' + e')(M^0X\hat\beta + M^0e)$$
$$= \hat\beta'X'M^0X\hat\beta + \hat\beta'X'M^0e + e'M^0X\hat\beta + e'M^0e$$
$$= \hat\beta'X'M^0X\hat\beta + e'M^0e$$
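So:
$$\underbrace{\sum_{i=1}^{n} (Y_i - \bar Y)^2}_{SST} = \underbrace{\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2}_{SSR} + \underbrace{\sum_{i=1}^{n} e_i^2}_{SSE}$$
(Here $\bar{\hat Y} = \bar Y$ and $\hat Y = X\hat\beta$, so $\hat\beta'X'M^0X\hat\beta = \hat Y'M^0\hat Y = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$.)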
SST: Total sum of squares.
SSR: Regression sum of squares
SSE: Error sum of squares
Coefficient of Determination:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
(only if an intercept is included in the model).
Note: $R^2 = \dfrac{SSR}{SST} \ge 0$ and $R^2 = 1 - \dfrac{SSE}{SST} \le 1$
⇒ $0 \le R^2 \le 1$
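The decomposition and its dependence on the intercept can be checked with a short simulation. This is a sketch assuming NumPy and invented coefficients; it fits the same data with and without the intercept column.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative data generated with a non-zero intercept.
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

# Fit with an intercept.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_fit = X @ beta_hat
e = Y - Y_fit

SST = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
SSR = np.sum((Y_fit - Y.mean()) ** 2)  # regression sum of squares
SSE = np.sum(e ** 2)                   # error sum of squares
R2 = 1.0 - SSE / SST

# Fit without the intercept column: the decomposition breaks down.
X0 = X[:, 1:]
b0 = np.linalg.solve(X0.T @ X0, X0.T @ Y)
Y_fit0 = X0 @ b0
SSR0 = np.sum((Y_fit0 - Y.mean()) ** 2)
SSE0 = np.sum((Y - Y_fit0) ** 2)
gap0 = SST - (SSR0 + SSE0)

print("with intercept: SST - (SSR + SSE) =", SST - (SSR + SSE))
print("R2 =", R2)
print("without intercept: SST - (SSR + SSE) =", gap0)
```

Without the intercept the residuals no longer sum to zero, so the cross-product term does not vanish and the gap is far from zero.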
What happens if we add any regressor(s) to the model?
$$Y = X_1\beta_1 + \varepsilon \quad (1)$$
$$Y = X_1\beta_1 + X_2\beta_2 + u = X\beta + u \quad (2)$$
(A) Applying OLS to (2): $\min_{(\hat\beta_1, \hat\beta_2)} \hat u'\hat u$
(B) Applying OLS to (1): $\min_{\hat\beta_1} e'e$
Problem (B) is just problem (A) subject to the restriction that $\beta_2 = 0$. The minimized value in (A) must be ≤ that in (B), so $\hat u'\hat u \le e'e$.
→ Adding any regressor(s) to the model cannot increase (and typically decreases) the sum of squared residuals, so $R^2$ must increase (or at worst stay the same). $R^2$ is therefore not a very informative measure of the quality of a regression.
For this reason, we often use the "adjusted" $R^2$, adjusted for "degrees of freedom":
$$R^2 = 1 - \frac{e'e}{Y'M^0Y}$$
$$\bar R^2 = 1 - \frac{e'e/(n-k)}{Y'M^0Y/(n-1)}$$
Note: $e'e = Y'MY$ and rank(M) = n − k, so $e'e$ has n − k degrees of freedom.
$Y'M^0Y = \sum_{i=1}^{n} (Y_i - \bar Y)^2$, with degrees of freedom = n − 1.
$\bar R^2$ may rise or fall when variables are added. It may even be negative.
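The effect of adding a regressor can be seen in a small simulation. The sketch below assumes NumPy; the data are invented and the added variable is pure noise by construction. The helper `r2_stats` is a name introduced here only for illustration.

```python
import numpy as np

def r2_stats(X, Y):
    """Return (SSE, R2, adjusted R2) for an OLS fit of Y on X (X includes the intercept)."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b
    sse = e @ e
    sst = np.sum((Y - Y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return sse, r2, r2_adj

rng = np.random.default_rng(7)

# Illustrative data: Y depends on X1 only.
n = 40
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X1 @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Add a regressor that is unrelated to Y by construction.
X = np.column_stack([X1, rng.normal(size=n)])

sse1, r2_1, adj1 = r2_stats(X1, Y)
sse2, r2_2, adj2 = r2_stats(X, Y)

print("SSE:", sse1, "->", sse2)
print("R2:", r2_1, "->", r2_2)
print("adjusted R2:", adj1, "->", adj2)
```

SSE cannot rise and $R^2$ cannot fall when the column is added, whereas the adjusted measure is free to move in either direction because of the degrees-of-freedom penalty.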
Note that: If the model does not include an intercept, then the equation SST = SSR + SSE does not hold, and we no longer have $0 \le R^2 \le 1$. We must also be careful in comparing $R^2$ across different models. For example:
(1) $\hat C_i = 0.5 + 0.8\,Y_i$, $R^2 = 0.85$
(2) $\log C_i = 0.2 + 0.7 \log Y_i + u$, $R^2 = 0.7$
In (1), $R^2$ relates to the sample variation of the variable C. In (2), $R^2$ relates to the sample variation of the variable log(C), so the two values are not directly comparable.
Home reading: Greene, chapters 3 and 4.