Advanced Econometrics - Part I - Chapter 1: Classical Linear Regression

Advanced Econometrics
Part I: Basic Econometric Models
Chapter 1: Classical Linear Regression
Nam T. Hoang
University of New England - Australia
University of Economics - HCMC - Vietnam

Chapter 1: CLASSICAL LINEAR REGRESSION

I. MODEL:

Population model:

$$Y = f(X_1, X_2, \dots, X_k) + \varepsilon$$

- f may be of any kind (linear, non-linear, parametric, non-parametric, ...).
- We'll focus on models that are parametric and linear in the parameters.

Sample information:

- We have a sample: $\{Y_i, X_{i2}, \dots, X_{ik}\}_{i=1}^{n}$
- Assume that these observed values are generated by the population model:

$$Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + \dots + \beta_k X_{ik} + \varepsilon_i$$

- Objectives:
  i. Estimate unknown parameters.
  ii. Test hypotheses about parameters.
  iii. Predict values of Y outside the sample.
- Note that $\beta_k = \partial Y_i / \partial X_{ik}$, so the parameters are the marginal effects of the X's on Y, with other factors held constant.

EX:

$$C_i = \beta_1 + \beta_2 Y_i + \varepsilon_i$$

Here $C_i$ is the dependent variable, $Y_i$ is the explanatory variable (regressor) and $\varepsilon_i$ is the disturbance (error).

$$\beta_2 = \frac{\partial C_i}{\partial Y_i} = \text{M.P.C.} \;\Rightarrow\; \text{require } 0 \le \beta_2 \le 1$$

(M.P.C. is the marginal propensity to consume.)

Denote:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}; \quad
X = \begin{pmatrix} 1 & X_{12} & X_{13} & \dots & X_{1k} \\ 1 & X_{22} & X_{23} & \dots & X_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n2} & X_{n3} & \dots & X_{nk} \end{pmatrix}; \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}; \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

⇒ We have:

$$\underset{(n \times 1)}{Y} = \underset{(n \times k)}{X}\;\underset{(k \times 1)}{\beta} + \underset{(n \times 1)}{\varepsilon}$$

II. ASSUMPTIONS OF THE CLASSICAL REGRESSION MODEL:

Models are simplifications of reality. We'll make a set of simplifying assumptions for the model. The assumptions relate to:
- Functional form.
- Regressors.
- Disturbances.

Assumption 1: Linearity.
The model is linear in the parameters:

$$Y = X\beta + \varepsilon$$

Assumption 2: Full rank.
The $X_{ij}$ are not random variables, or they are random variables that are uncorrelated with ε. There are no exact linear dependencies among the columns of X. This assumption is necessary for estimation of the parameters (only EXACT dependence is ruled out).

X is $(n \times k)$; Rank(X) = k implies n ≥ k, since Rank(A) ≤ min(rows, columns). The boundary case Rank(X) = n = k is also admissible.

Assumption 3: Exogeneity of the independent variables.

$$E[\varepsilon_i \mid X_{j1}, X_{j2}, X_{j3}, \dots, X_{jk}] = 0 \quad \text{for } i = j \text{ and also for } i \ne j$$

This means that the independent variables carry no useful information for predicting $\varepsilon_i$:

$$E[\varepsilon_i \mid X] = 0 \;\; \forall i = 1, \dots, n \;\;\rightarrow\;\; E(\varepsilon_i) = 0$$

Assumption 4:

$$\begin{cases} \mathrm{Var}(\varepsilon_i) = \sigma^2, & i = 1, \dots, n \\ \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, & \forall i \ne j \end{cases}$$

For any random vector $Z = (z_1, z_2, \dots, z_m)'$, we can express its variance-covariance matrix as:

$$\mathrm{VarCov}(Z) = E[(Z - E(Z))(Z - E(Z))'] = E\left[ \underset{(m \times 1)}{\begin{pmatrix} z_1 - E(z_1) \\ z_2 - E(z_2) \\ \vdots \\ z_m - E(z_m) \end{pmatrix}} \underset{(1 \times m)}{\begin{pmatrix} z_1 - E(z_1) & z_2 - E(z_2) & \dots & z_m - E(z_m) \end{pmatrix}} \right]$$

The jth diagonal element is $\mathrm{var}(z_j) = \sigma_{jj} = \sigma_j^2$; the ijth element (i ≠ j) is $\mathrm{cov}(z_i, z_j) = \sigma_{ij}$:

$$\mathrm{VarCov}(Z) = \big[ E[(z_i - E(z_i))(z_j - E(z_j))] \big] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2m} \\ \vdots & \vdots & & \vdots \\ \sigma_{m1} & \sigma_{m2} & \dots & \sigma_{mm} \end{pmatrix}$$

So we have the "covariance matrix" for the vector ε. Since $E(\varepsilon) = 0$:

$$\mathrm{VarCov}(\varepsilon) = E[(\varepsilon - E(\varepsilon))(\varepsilon - E(\varepsilon))'] = E(\varepsilon\varepsilon')$$

Assumption 4 is then equivalent to:

$$E(\varepsilon\varepsilon') = \sigma^2 I = \begin{pmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{pmatrix}
\;\Leftrightarrow\;
\begin{cases} \mathrm{Var}(\varepsilon_i) = \sigma^2, \; i = 1, \dots, n & \text{(homoscedasticity)} \\ \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, \; \forall i \ne j & \text{(no autocorrelation)} \end{cases}$$

Assumption 5: Data generating process for the regressors (non-stochastic X).
+ The $X_{ij}$ are not random variables.
Note: This assumption is different from Assumption 3. $E[\varepsilon_i \mid X] = 0$ is a statement about the conditional mean only (it has to be 0).

Assumption 6: Normality of errors.

$$\varepsilon \sim N[0, \sigma^2 I]$$

+ Normality is not necessary to obtain many of the results in the regression model.
+ It will be possible to relax this assumption and retain most of the statistical results.
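As a rough illustration of the assumptions above, the following sketch simulates one sample from a model that satisfies them. The sample size, parameter values and error standard deviation (`n`, `k`, `beta`, `sigma`) are arbitrary choices made for this illustration and do not come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, k = 200, 3                      # illustrative sample size and number of parameters
beta = np.array([1.0, 0.5, -2.0])  # hypothetical "true" parameters (beta_1 is the intercept)
sigma = 1.5                        # hypothetical error standard deviation

# Regressor matrix X (n x k): a column of ones plus k-1 regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Assumption 2: X has full column rank, Rank(X) = k.
rank_X = np.linalg.matrix_rank(X)

# Assumptions 3, 4 and 6: errors drawn independently of X, eps ~ N(0, sigma^2 I).
eps = rng.normal(loc=0.0, scale=sigma, size=n)

# Assumption 1: the model is linear in the parameters.
Y = X @ beta + eps

# Counter-example for Assumption 2: adding an exact multiple of an existing
# column creates an exact linear dependency, so the rank stays at k.
X_bad = np.column_stack([X, 2.0 * X[:, 1]])
rank_bad = np.linalg.matrix_rank(X_bad)

print(rank_X, rank_bad, X_bad.shape[1])
```

In `X_bad` the number of columns is 4 but the rank is still 3, which is exactly the situation Assumption 2 rules out.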
SUMMARY: The classical linear regression model is:

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N[0, \sigma^2 I], \qquad \mathrm{Rank}(X) = k, \qquad X \text{ is non-stochastic}$$

III. LEAST SQUARES ESTIMATION:
(Ordinary Least Squares Estimation - OLS)

Our first task is to estimate the parameters of the model:

$$Y = X\beta + \varepsilon \quad \text{with} \quad \varepsilon \sim N[0, \sigma^2 I]$$

There are many possible procedures for doing this. The choice should be based on the "sampling properties" of the estimates. Let's consider one possible estimation strategy: least squares.

Let $\hat\beta$ denote an estimator of β:

True relation: $Y = X\beta + \varepsilon$
Estimated relation: $Y = X\hat\beta + e$

where $e = (e_1, e_2, \dots, e_n)'$ is the vector of estimated residuals, that is, $e_i$ is the estimate of $\varepsilon_i$.

For the ith observation:

$$Y_i = \underbrace{X_i'\beta + \varepsilon_i}_{\text{population (unobserved)}} = \underbrace{X_i'\hat\beta + e_i}_{\text{sample (observed)}}$$

Sum of squared residuals:

$$e'e = \sum_{i=1}^{n} e_i^2 = (Y - X\hat\beta)'(Y - X\hat\beta) = Y'Y - \hat\beta'X'Y - Y'X\hat\beta + \hat\beta'X'X\hat\beta = Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta$$

(using $\hat\beta'X'Y = Y'X\hat\beta$, since both are scalars).

We need to choose $\hat\beta$ to solve:

$$\min_{\hat\beta} \; \left( Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta \right)$$

The necessary condition for a minimum:

$$\frac{\partial [e'e]}{\partial \hat\beta} = 0 \;\Leftrightarrow\; \frac{\partial [Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta]}{\partial \hat\beta} = 0$$

First term in $\hat\beta$:

$$\hat\beta'X'Y = \begin{pmatrix} \hat\beta_1 & \hat\beta_2 & \dots & \hat\beta_k \end{pmatrix} \begin{pmatrix} 1 & 1 & \dots & 1 \\ X_{12} & X_{22} & \dots & X_{n2} \\ \vdots & \vdots & & \vdots \\ X_{1k} & X_{2k} & \dots & X_{nk} \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \hat\beta_1 & \hat\beta_2 & \dots & \hat\beta_k \end{pmatrix} \begin{pmatrix} \sum Y_i \\ \sum X_{i2}Y_i \\ \vdots \\ \sum X_{ik}Y_i \end{pmatrix}$$

Taking the derivative with respect to each $\hat\beta_j$:

$$\frac{\partial [\hat\beta'X'Y]}{\partial \hat\beta} = \begin{pmatrix} \sum Y_i \\ \sum X_{i2}Y_i \\ \vdots \\ \sum X_{ik}Y_i \end{pmatrix} = X'Y$$

Second term in $\hat\beta$:

$$X'X = \begin{pmatrix} 1 & 1 & \dots & 1 \\ X_{12} & X_{22} & \dots & X_{n2} \\ \vdots & \vdots & & \vdots \\ X_{1k} & X_{2k} & \dots & X_{nk} \end{pmatrix} \begin{pmatrix} 1 & X_{12} & \dots & X_{1k} \\ 1 & X_{22} & \dots & X_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n2} & \dots & X_{nk} \end{pmatrix} = \begin{pmatrix} n & \sum X_{i2} & \sum X_{i3} & \dots & \sum X_{ik} \\ \sum X_{i2} & \sum X_{i2}^2 & \sum X_{i2}X_{i3} & \dots & \sum X_{i2}X_{ik} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum X_{ik} & \sum X_{ik}X_{i2} & \sum X_{ik}X_{i3} & \dots & \sum X_{ik}^2 \end{pmatrix}$$

This is a symmetric matrix of sums of squares and cross products.

$\hat\beta'X'X\hat\beta$ is a quadratic form:

$$\hat\beta'X'X\hat\beta = \sum_{i=1}^{k} \sum_{j=1}^{k} (X'X)_{ij}\, \hat\beta_i \hat\beta_j$$

Take the derivative with respect to each $\hat\beta_i$:
- For i = j: $\dfrac{\partial [(X'X)_{ii}\hat\beta_i^2]}{\partial \hat\beta_i} = 2 (X'X)_{ii} \hat\beta_i$
- For i ≠ j: the two terms $(X'X)_{ij}\hat\beta_i\hat\beta_j$ and $(X'X)_{ji}\hat\beta_j\hat\beta_i$ together contribute $2 (X'X)_{ij} \hat\beta_j$, because $X'X$ is symmetric.

Collecting terms, $\partial[\hat\beta'X'X\hat\beta]/\partial\hat\beta_i = 2\sum_{j=1}^{k} (X'X)_{ij}\hat\beta_j$ for $i = 1, \dots, k$, so in vector form:

$$\frac{\partial [\hat\beta'(X'X)\hat\beta]}{\partial \hat\beta} = 2 (X'X) \hat\beta$$

So:

$$\frac{\partial [e'e]}{\partial \hat\beta} = 0 \;\Leftrightarrow\; -2X'Y + 2(X'X)\hat\beta = 0 \quad \text{(the "normal equations")}$$

$$\rightarrow\; (X'X)\hat\beta = X'Y \;\rightarrow\; \hat\beta = (X'X)^{-1}X'Y$$

Note: for the existence of $(X'X)^{-1}$ we need the assumption that Rank(X) = k.

IV. ALGEBRAIC PROPERTIES OF LEAST SQUARES:

1. "Orthogonality condition":

$$-2X'Y + 2(X'X)\hat\beta = 0 \;\text{(normal equations)} \;\Leftrightarrow\; X'\underbrace{(Y - X\hat\beta)}_{e} = 0 \;\Leftrightarrow\; X'e = 0$$

$$\Leftrightarrow\; \begin{pmatrix} 1 & 1 & \dots & 1 \\ X_{12} & X_{22} & \dots & X_{n2} \\ \vdots & \vdots & & \vdots \\ X_{1k} & X_{2k} & \dots & X_{nk} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\;\Leftrightarrow\; \begin{cases} \sum_{i=1}^{n} e_i = 0 \\ \sum_{i=1}^{n} X_{ij} e_i = 0, & j = 2, \dots, k \end{cases}$$

2. Deviation from mean model (the fitted regression passes through $(\bar X, \bar Y)$):

$$Y_i = \hat\beta_1 + \hat\beta_2 X_{i2} + \hat\beta_3 X_{i3} + \dots + \hat\beta_k X_{ik} + e_i, \quad i = 1, \dots, n$$

Sum over all n observations and divide by n:

$$\bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3 + \dots + \hat\beta_k \bar X_k + \underbrace{\tfrac{1}{n}\sum_{i=1}^{n} e_i}_{=\,0}$$

Subtracting the second equation from the first:

$$Y_i - \bar Y = \hat\beta_2 (X_{i2} - \bar X_2) + \hat\beta_3 (X_{i3} - \bar X_3) + \dots + \hat\beta_k (X_{ik} - \bar X_k) + e_i, \quad i = 1, \dots, n$$

In the model in deviation form, the intercept is put aside and can be found later from $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \dots - \hat\beta_k \bar X_k$.

3. The mean of the fitted values $\hat Y_i$ is equal to the mean of the actual $Y_i$ values in the sample:

$$Y_i = X_i'\beta + \varepsilon_i = \underbrace{X_i'\hat\beta}_{\hat Y_i} + e_i, \quad i = 1, \dots, n$$

$$\rightarrow\; \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat Y_i + \underbrace{\sum_{i=1}^{n} e_i}_{=\,0} \;\rightarrow\; \tfrac{1}{n}\sum Y_i = \tfrac{1}{n}\sum \hat Y_i \;\rightarrow\; \bar Y = \bar{\hat Y}$$

Note that these results use the fact that the regression model includes an intercept term.

V. PARTITIONED REGRESSION: FRISCH-WAUGH THEOREM:

1. Note: Fundamental idempotent matrix (M):

$$e = Y - X\hat\beta = Y - X(X'X)^{-1}X'Y = [I_{n \times n} - X(X'X)^{-1}X']\,Y$$

$$= (X\beta + \varepsilon) - X(X'X)^{-1}X'(X\beta + \varepsilon) = X\beta + \varepsilon - X\beta - X(X'X)^{-1}X'\varepsilon = \underbrace{[I - X(X'X)^{-1}X']}_{M_{(n \times n)}}\,\varepsilon$$

So the residual vector e has two alternative representations:

$$\begin{cases} e = MY \\ e = M\varepsilon \end{cases}$$

M is the "residual maker" in the regression of Y on X. M is symmetric and idempotent, that is:

$$\begin{cases} M = M' \\ M \cdot M = M \end{cases}$$

Symmetry (using $(AB)' = B'A'$):

$$M' = [I - X(X'X)^{-1}X']' = I - X[(X'X)^{-1}]'X' = I - X(X'X)^{-1}X' = M$$

Idempotency:

$$M \cdot M = [I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X'] = I - X(X'X)^{-1}X' - X(X'X)^{-1}X' + X(X'X)^{-1}\underbrace{X'X(X'X)^{-1}}_{I}X' = I - X(X'X)^{-1}X' = M$$

We also have:

$$MX = [I - X(X'X)^{-1}X']X = X - X(X'X)^{-1}X'X = X - X = \underset{(n \times k)}{0}$$

2. Partitioned Regression:

Suppose that our matrix of regressors is partitioned into two blocks:

$$\underset{(n \times k)}{X} = [\underset{(n \times k_1)}{X_1} \;\; \underset{(n \times k_2)}{X_2}], \qquad k_1 + k_2 = k$$

$$\underset{(n \times 1)}{Y} = \underset{(n \times k_1)}{X_1}\,\underset{(k_1 \times 1)}{\beta_1} + \underset{(n \times k_2)}{X_2}\,\underset{(k_2 \times 1)}{\beta_2} + \varepsilon, \qquad Y = [X_1 \;\; X_2] \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} + e$$

The normal equations:

$$(X'X)\hat\beta = X'Y \;\Leftrightarrow\; [X_1 \;\; X_2]'[X_1 \;\; X_2]\,\hat\beta = [X_1 \;\; X_2]'\,Y$$

$$\Leftrightarrow\; \begin{pmatrix} X_1' \\ X_2' \end{pmatrix} [X_1 \;\; X_2] \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} X_1' \\ X_2' \end{pmatrix} Y
\;\Leftrightarrow\; \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} X_1'Y \\ X_2'Y \end{pmatrix}$$

$$\Leftrightarrow\; \begin{cases} (X_1'X_1)\hat\beta_1 + (X_1'X_2)\hat\beta_2 = X_1'Y & (a) \\ (X_2'X_1)\hat\beta_1 + (X_2'X_2)\hat\beta_2 = X_2'Y & (b) \end{cases}$$

From (a):

$$(X_1'X_1)\hat\beta_1 = X_1'(Y - X_2\hat\beta_2) \quad \text{or} \quad \hat\beta_1 = (X_1'X_1)^{-1}X_1'(Y - X_2\hat\beta_2) \quad (c)$$

Put (c) into (b):

$$(X_2'X_1)(X_1'X_1)^{-1}X_1'(Y - X_2\hat\beta_2) + (X_2'X_2)\hat\beta_2 = X_2'Y$$

$$\Leftrightarrow\; -X_2'X_1(X_1'X_1)^{-1}X_1'X_2\hat\beta_2 + (X_2'X_2)\hat\beta_2 = X_2'Y - X_2'X_1(X_1'X_1)^{-1}X_1'Y$$

$$\Leftrightarrow\; X_2'\underbrace{[I - X_1(X_1'X_1)^{-1}X_1']}_{M_1\,(n \times n)}X_2\,\hat\beta_2 = X_2'\underbrace{[I - X_1(X_1'X_1)^{-1}X_1']}_{M_1\,(n \times n)}Y$$

We have:

$$(X_2'M_1X_2)\hat\beta_2 = X_2'M_1Y \;\rightarrow\; \hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y$$

Because $M_1 = M_1'$ and $M_1 \cdot M_1 = M_1$:

$$(\underbrace{X_2'M_1'}_{X_2^{*\prime}}\,\underbrace{M_1X_2}_{X_2^*})\,\hat\beta_2 = \underbrace{X_2'M_1'}_{X_2^{*\prime}}\,\underbrace{M_1Y}_{Y^*}
\;\Leftrightarrow\; (X_2^{*\prime}X_2^*)\hat\beta_2 = X_2^{*\prime}Y^* \;\rightarrow\; \hat\beta_2 = (X_2^{*\prime}X_2^*)^{-1}X_2^{*\prime}Y^*$$

where:

$$\begin{cases} X_2^* = M_1X_2 \;\;(\text{so } X_2^{*\prime} = X_2'M_1') \\ Y^* = M_1Y \end{cases}$$

Interpretation:
• $Y^* = M_1Y$ = residuals from the regression of Y $(n \times 1)$ on $X_1$ $(n \times k_1)$.
• $X_2^* = M_1X_2$ = matrix of residuals from the regressions of the $X_2$ variables on $X_1$ $(n \times k_1)$.

Suppose we regress Y on $X_1$ and get the residuals, and also regress $X_2$ (each column of $X_2$) on $X_1$ and get the matrix of residuals.

• Regressing Y on $X_1$, the residuals are:

$$\underset{(n \times 1)}{e_1} = Y - \hat Y = Y - X_1(X_1'X_1)^{-1}X_1'Y = M_1Y = Y^*$$

• Regressing $X_2$ (each column of $X_2$) on $X_1$ $(n \times k_1)$, with $\hat B$ the $(k_1 \times k_2)$ matrix of estimated coefficients:

$$\underset{(n \times k_2)}{X_2} = \underset{(n \times k_1)}{X_1}\,\underset{(k_1 \times k_2)}{\hat B} + \underset{(n \times k_2)}{E}$$

$$E = X_2 - \hat X_2 = X_2 - X_1\hat B = [I - X_1(X_1'X_1)^{-1}X_1']X_2 = M_1X_2 = X_2^*$$

• If we now take these residuals $e_1$ $(n \times 1)$ and regress $e_1$ on E:

$$\underset{(n \times 1)}{e_1} = \underset{(n \times k_2)}{E}\,\underset{(k_2 \times 1)}{\tilde\beta_2} + u$$

then we will have $\tilde\beta_2 = \hat\beta_2$.

We get the same result as if we had just run the regression on the whole model. This result is called the "Frisch-Waugh" theorem.
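The results of Sections III to V can be checked numerically. The sketch below uses simulated data with arbitrary coefficients (an assumption of this illustration, not taken from the notes). It solves the normal equations, verifies the orthogonality condition and the properties of the residual maker, and then confirms that the partialled-out regression reproduces the coefficient on X2 from the full regression. The helper names `ols` and `residual_maker` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

n = 500
# Simulated regressors: X1 = [constant, z], X2 = [w], with w correlated with z.
z = rng.normal(size=n)
w = 0.6 * z + rng.normal(size=n)
X1 = np.column_stack([np.ones(n), z])   # n x k1, with k1 = 2
X2 = w.reshape(-1, 1)                   # n x k2, with k2 = 1
X = np.hstack([X1, X2])                 # n x k
Y = 1.0 + 0.8 * z + 0.5 * w + rng.normal(size=n)

def ols(X, Y):
    """Solve the normal equations (X'X) b = X'Y for b."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def residual_maker(X):
    """Return M = I - X (X'X)^{-1} X'."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

# Full regression of Y on X = [X1 X2].
beta_hat = ols(X, Y)
e = Y - X @ beta_hat

# Residual maker for the full regression: e = M Y.
M = residual_maker(X)

# Frisch-Waugh: partial X1 out of both Y and X2, then regress Y* on X2*.
M1 = residual_maker(X1)
Y_star = M1 @ Y
X2_star = M1 @ X2
beta2_tilde = ols(X2_star, Y_star)

print("beta_2 from full regression:       ", beta_hat[-1])
print("beta_2 from partialled regression: ", beta2_tilde[0])
```

Forming the n x n matrix M explicitly is only sensible for small n. It is done here to mirror the algebra, not as a recommended way to compute residuals.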
Example:
Y = Wages
X2 = Education (years of schooling)
X1 = Ability (test scores)

$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$

β2 = effect of one extra year of schooling on wages, controlling for ability.

Y* = residuals from the regression of Y on X1 (= variation in wages when controlling for ability).
X2* = residuals from the regression of X2 on X1.

Then regress Y* on X2* to get β2:

$$Y^* = X_2^*\beta_2 + u$$

Example: De-trending and de-seasonalizing data:

$$\underset{(n \times 1)}{Y} = \underset{(n \times 1)}{t}\,\underset{(1 \times 1)}{\beta_1} + \underset{(n \times k_2)}{X_2}\,\underset{(k_2 \times 1)}{\beta_2} + \underset{(n \times 1)}{\varepsilon}, \qquad t = \begin{pmatrix} 1 \\ 2 \\ \vdots \\ n \end{pmatrix}$$

Either include "t" in the model, or "de-trend" the X2 and Y variables by regressing them on "t" and taking the residuals.

Note: Including a trend in the regression is an effective way of de-trending the data.

VI. GOODNESS OF FIT:

One way of measuring the "quality of the fitted regression line" is to measure the extent to which the sample variability of the Y variable is explained by the model.

- The sample variability of Y is:

$$\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar Y)^2 \quad \text{or we could just use:} \quad \sum_{i=1}^{n} (Y_i - \bar Y)^2$$

- Our fitted regression:

$$Y = X\hat\beta + e = \hat Y + e, \qquad \hat Y = X\hat\beta = X(X'X)^{-1}X'Y$$

Note that if the model includes an intercept, then $\bar{\hat Y} = \bar Y$.

Now consider the following matrix:

$$\underset{(n \times n)}{M^0} = I - \frac{1}{n}\mathbf{1}\mathbf{1}' \quad \text{where} \quad \underset{(n \times 1)}{\mathbf{1}} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$$

Note that:

$$M^0Y = \left[ \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} - \begin{pmatrix} \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\ \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\ \vdots & \vdots & & \vdots \\ \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \end{pmatrix} \right] \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} Y_1 - \bar Y \\ Y_2 - \bar Y \\ \vdots \\ Y_n - \bar Y \end{pmatrix}$$

We have:
• $M^0$ is idempotent.
• $M^0\mathbf{1} = 0$
• $Y'M^0Y = Y'M^{0\prime}M^0Y = (M^0Y)'(M^0Y) = \sum_{i=1}^{n} (Y_i - \bar Y)^2$

So:

$$Y = X\hat\beta + e = \hat Y + e$$

$$M^0Y = M^0X\hat\beta + M^0e = M^0\hat Y + M^0e$$

Recall that $X'e = 0$ and $e'X = 0$, and that $\sum e_i = 0 \rightarrow M^0e = e$. Hence:

$$X'M^0e = X'e = 0, \qquad e'M^0X = e'X = 0$$

$$\rightarrow\; Y'M^0Y = (X\hat\beta + e)'(M^0X\hat\beta + M^0e) = (\hat\beta'X' + e')(M^0X\hat\beta + M^0e)$$

$$= \hat\beta'X'M^0X\hat\beta + \hat\beta'X'M^0e + e'M^0X\hat\beta + e'M^0e = \hat\beta'X'M^0X\hat\beta + e'M^0e$$

Since $e'M^0e = e'e$, this gives:

$$\underbrace{\sum_{i=1}^{n} (Y_i - \bar Y)^2}_{SST} = \underbrace{\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2}_{SSR} + \underbrace{\sum_{i=1}^{n} e_i^2}_{SSE}$$

(Here $\hat Y = X\hat\beta$ and $\bar{\hat Y} = \bar Y$, so $\hat\beta'X'M^0X\hat\beta = \hat Y'M^0\hat Y = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$.)

SST: Total sum of squares.
SSR: Regression sum of squares.
SSE: Error sum of squares.

Coefficient of determination:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \quad \text{(only if an intercept is included in the model)}$$

Note:

$$R^2 = \frac{SSR}{SST} \ge 0, \qquad R^2 = 1 - \frac{SSE}{SST} \le 1 \;\Rightarrow\; 0 \le R^2 \le 1$$

What happens if we add any regressor(s) to the model?

$$Y = X_1\beta_1 + \varepsilon \quad (1)$$

$$Y = X_1\beta_1 + X_2\beta_2 + u = X\beta + u \quad (2)$$

(A) Applying OLS to (2): $\min_{(\hat\beta_1, \hat\beta_2)} \hat u'\hat u$
(B) Applying OLS to (1): $\min_{\hat\beta_1} e'e$

Problem (B) is just problem (A) subject to the restriction that β2 = 0. The minimized value in (A) must be ≤ that in (B), so $\hat u'\hat u \le e'e$.

→ Adding any regressor(s) to the model cannot increase (and typically decreases) the sum of squared residuals, so R² must increase (or at worst stay the same). R² is therefore not a very informative measure of the quality of a regression when models with different numbers of regressors are compared.

For this reason, we often use the "adjusted" R², which is adjusted for degrees of freedom:

$$R^2 = 1 - \frac{e'e}{Y'M^0Y}, \qquad \bar R^2 = 1 - \frac{e'e/(n-k)}{Y'M^0Y/(n-1)}$$

Note: $e'e = Y'MY$ and rank(M) = n - k, so e'e has n - k degrees of freedom; $Y'M^0Y = \sum_{i=1}^{n} (Y_i - \bar Y)^2$ has n - 1 degrees of freedom.

$\bar R^2$ may ↑ or ↓ when variables are added. It may even be negative.
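A small numerical sketch of the decomposition and of the two R² measures follows. The data are simulated and the helper `fit_stats` is a hypothetical name introduced only for this illustration. The regressor `x3` is irrelevant by construction, so it shows what happens when a variable with no explanatory power is added.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for reproducibility

n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                   # irrelevant regressor (not in the true model)
Y = 2.0 + 1.5 * x2 + rng.normal(size=n)   # hypothetical data-generating process

def fit_stats(X, Y):
    """OLS fit with an intercept column in X; returns SST, SSR, SSE, R^2, adjusted R^2."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    Y_hat = X @ beta_hat
    e = Y - Y_hat
    sst = float(np.sum((Y - Y.mean()) ** 2))
    ssr = float(np.sum((Y_hat - Y.mean()) ** 2))
    sse = float(e @ e)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return {"sst": sst, "ssr": ssr, "sse": sse, "r2": r2, "adj_r2": adj_r2}

ones = np.ones(n)
small = fit_stats(np.column_stack([ones, x2]), Y)      # model (1)
big = fit_stats(np.column_stack([ones, x2, x3]), Y)    # model (2): one extra regressor

# SST via the centering matrix M0 = I - (1/n) 1 1'.
M0 = np.eye(n) - np.ones((n, n)) / n
sst_via_M0 = float(Y @ M0 @ Y)

print("R^2:         ", small["r2"], "->", big["r2"])
print("adjusted R^2:", small["adj_r2"], "->", big["adj_r2"])
```

R² cannot fall when `x3` is added, whereas the adjusted R² may move in either direction, depending on whether the fall in SSE outweighs the lost degree of freedom.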
Note that: If the model does not include an intercept, then the equation SST = SSR + SSE does not hold, and we no longer have 0 ≤ R² ≤ 1.

We must also be careful in comparing R² across different models. For example:

(1) $\hat C_i = 0.5 + 0.8\,Y_i$, R² = 0.85
(2) $\log C_i = 0.2 + 0.7 \log Y_i + u$, R² = 0.7

In (1), R² relates to the sample variation of the variable C. In (2), R² relates to the sample variation of the variable log(C). The two values are therefore not directly comparable.

Home reading: Greene, chapters 3 & 4.
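The failure of the decomposition without an intercept can be seen in a three-observation example. The numbers below are made up for this illustration. Y has a large mean that a regression through the origin on a zero-mean regressor cannot capture.

```python
import numpy as np

# Made-up data: x has mean zero, Y has mean 11.
x = np.array([-1.0, 0.0, 1.0])
Y = np.array([10.0, 11.0, 12.0])

# Regression through the origin (no intercept): beta_hat = x'Y / x'x.
beta_hat = (x @ Y) / (x @ x)
Y_hat = beta_hat * x
e = Y - Y_hat

sst = float(np.sum((Y - Y.mean()) ** 2))      # total sum of squares
ssr = float(np.sum((Y_hat - Y.mean()) ** 2))  # regression sum of squares
sse = float(e @ e)                            # error sum of squares

r2_from_sse = 1.0 - sse / sst
r2_from_ssr = ssr / sst

print("sum of residuals:", e.sum())   # not zero without an intercept
print("SST, SSR + SSE:  ", sst, ssr + sse)
print("1 - SSE/SST:     ", r2_from_sse)
print("SSR/SST:         ", r2_from_ssr)
```

Here the residuals sum to 33 rather than 0, SST = 2 while SSR + SSE = 728, and the two usual expressions for R² give -180.5 and 182.5 respectively, so neither lies in [0, 1].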
