Advanced Econometrics - Part II, Chapter 5: Limited-Dependent Variable Models
Nam T. Hoang
UNE Business School, University of New England
Chapter 5
LIMITED-DEPENDENT VARIABLE MODELS:
TRUNCATION, CENSORING (TOBIT)
AND SAMPLE SELECTION.
I. TRUNCATION:
Truncation occurs when sample data are drawn from a subset of a larger population of interest.
1. Truncated distributions:
A truncated distribution is the part of an untruncated distribution that lies above or below some specified value.
• Density of a truncated random variable:
If a continuous random variable $x$ has pdf $f(x)$ and $a$ is a constant, then:
$$ f(x \mid x > a) = \frac{f(x)}{\text{Prob}(x > a)} $$
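If $x \sim N(\mu, \sigma^2)$, then
$$ P(x > a) = 1 - \Phi\left(\frac{a - \mu}{\sigma}\right) = 1 - \Phi(\alpha), \qquad \alpha = \frac{a - \mu}{\sigma} $$
$$ f(x \mid x > a) = \frac{f(x)}{1 - \Phi(\alpha)} = \frac{(2\pi\sigma^2)^{-1/2}\, e^{-(x - \mu)^2/(2\sigma^2)}}{1 - \Phi(\alpha)} = \frac{\frac{1}{\sigma}\,\phi\left(\frac{x - \mu}{\sigma}\right)}{1 - \Phi(\alpha)} $$
where $\phi = \Phi'$ is the standard normal density.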
o Truncated standard normal distribution ($\mu = 0$, $\sigma = 1$): $f(x \mid x > a) = \phi(x)/[1 - \Phi(a)]$ for $x > a$.
2. Moments of truncated distributions:
$$ E[x \mid x > a] = \int_a^{\infty} x\, f(x \mid x > a)\, dx = \mu^* $$
$$ V(x \mid x > a) = \int_a^{\infty} (x - \mu^*)^2\, f(x \mid x > a)\, dx $$
o Truncated mean and truncated variance
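If $x \sim N(\mu, \sigma^2)$ and $a$ is a constant, then for truncation from below ($x > a$) or from above ($x < a$):
$$ E[x \mid \text{truncation}] = \mu + \sigma\lambda(\alpha) $$
$$ Var[x \mid \text{truncation}] = \sigma^2[1 - \delta(\alpha)] $$
where $\alpha = (a - \mu)/\sigma$, $\phi(\cdot)$ is the standard normal density, and
$$ \lambda(\alpha) = \frac{\phi(\alpha)}{1 - \Phi(\alpha)} \text{ if } x > a, \qquad \lambda(\alpha) = \frac{-\phi(\alpha)}{\Phi(\alpha)} \text{ if } x < a $$
$$ \delta(\alpha) = \lambda(\alpha)[\lambda(\alpha) - \alpha], \qquad 0 < \delta(\alpha) < 1 \text{ for all values of } \alpha $$
Hence truncation always reduces the variance: $\sigma^2_{\text{truncated}} < \sigma^2$.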
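The truncated-moment formulas are easy to check numerically. The following is a minimal sketch that uses only Python's standard library, with `statistics.NormalDist` supplying $\phi$ and $\Phi$. The function names are illustrative and are not part of the original notes. `truncated_normal_moments` implements $\mu + \sigma\lambda(\alpha)$ and $\sigma^2[1 - \delta(\alpha)]$. `numeric_truncated_moments` integrates $f(x \mid x > a)$ directly by the midpoint rule, as an independent check.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def truncated_normal_moments(mu, sigma, a, below=False):
    """Mean and variance of x ~ N(mu, sigma^2) after truncation at a.

    below=False keeps the part x > a; below=True keeps the part x < a.
    """
    alpha = (a - mu) / sigma
    if below:
        lam = -STD.pdf(alpha) / STD.cdf(alpha)
    else:
        lam = STD.pdf(alpha) / (1.0 - STD.cdf(alpha))
    delta = lam * (lam - alpha)
    return mu + sigma * lam, sigma ** 2 * (1.0 - delta)


def numeric_truncated_moments(mu, sigma, a, upper, steps=100_000):
    """Brute-force check for x > a: midpoint-rule integration of f(x | x > a) on [a, upper]."""
    dist = NormalDist(mu, sigma)
    h = (upper - a) / steps
    xs = [a + (k + 0.5) * h for k in range(steps)]
    w = [dist.pdf(x) * h for x in xs]
    mass = sum(w)  # approximates Prob(x > a) = 1 - Phi(alpha)
    mean = sum(x * wi for x, wi in zip(xs, w)) / mass
    var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / mass
    return mean, var


mean, var = truncated_normal_moments(0.0, 1.0, 0.0)
print(round(mean, 4), round(var, 4))  # prints 0.7979 0.3634
```

With $\mu = a = 0$ and $\sigma = 1$ this is the half-normal case: the mean is $\sqrt{2/\pi}$ and the variance is $1 - 2/\pi$.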
3. The truncated regression model:
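Assume now that $\mu_i = X_i\beta$:
$$ Y_i = X_i\beta + \varepsilon_i, \qquad \varepsilon_i \mid X_i \sim N(0, \sigma^2) $$
so that $Y_i \mid X_i \sim N(X_i\beta, \sigma^2)$.
We are interested in the distribution of $Y_i$ given that $Y_i$ is greater than the truncation point $a$:
$$ E[Y_i \mid Y_i > a] = X_i\beta + \sigma\frac{\phi[(a - X_i\beta)/\sigma]}{1 - \Phi[(a - X_i\beta)/\sigma]} $$
$$ \frac{\partial E[Y_i \mid Y_i > a]}{\partial X_i} = \beta + \sigma\frac{d\lambda_i}{d\alpha_i}\frac{\partial \alpha_i}{\partial X_i} = \beta + \sigma(\lambda_i^2 - \alpha_i\lambda_i)\left(\frac{-\beta}{\sigma}\right) = \beta(1 - \lambda_i^2 + \alpha_i\lambda_i) = \beta(1 - \delta_i) $$
where $\alpha_i = (a - X_i\beta)/\sigma$, $\lambda_i = \lambda(\alpha_i)$ and $\delta_i = \delta(\alpha_i)$.
Since $1 - \delta_i$ lies between zero and one, the marginal effect of every element of $X_i$ is smaller in absolute value than the corresponding coefficient.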
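A quick way to confirm the derivative $\partial E[Y_i \mid Y_i > a]/\partial X_i = \beta(1 - \delta_i)$ is to compare it with a finite difference of the conditional mean. This sketch assumes a single scalar regressor and hypothetical values for $\beta$, $\sigma$ and $a$. The helper names are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()


def truncated_reg_mean(x, beta, sigma, a):
    """E[Y | Y > a, x] for Y = x*beta + eps, eps ~ N(0, sigma^2); scalar x for clarity."""
    alpha = (a - x * beta) / sigma
    lam = STD.pdf(alpha) / (1.0 - STD.cdf(alpha))
    return x * beta + sigma * lam


def truncated_reg_marginal_effect(x, beta, sigma, a):
    """Closed form dE[Y | Y > a, x]/dx = beta * (1 - delta_i)."""
    alpha = (a - x * beta) / sigma
    lam = STD.pdf(alpha) / (1.0 - STD.cdf(alpha))
    delta = lam * (lam - alpha)
    return beta * (1.0 - delta)


def finite_difference(x, beta, sigma, a, h=1e-5):
    """Central difference of the conditional mean, used only to check the closed form."""
    up = truncated_reg_mean(x + h, beta, sigma, a)
    down = truncated_reg_mean(x - h, beta, sigma, a)
    return (up - down) / (2 * h)


print(truncated_reg_marginal_effect(0.3, 1.5, 1.0, 0.8), finite_difference(0.3, 1.5, 1.0, 0.8))
```

Because $0 < \delta_i < 1$, the closed form always lies strictly between 0 and $\beta$, whatever the sign of $\beta$.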
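$$ Var[Y_i \mid Y_i > a] = \sigma^2(1 - \delta_i) $$
o Estimation:
$$ Y_i \mid Y_i > a = E[Y_i \mid Y_i > a] + u_i = X_i\beta + \sigma\lambda_i + u_i, \qquad Var(u_i) = \sigma^2(1 - \delta_i) $$
If we use OLS on $(Y_i, X_i)$, we omit $\lambda_i$, so all the biases that arise from an omitted variable can be expected. In addition, $u_i$ is heteroskedastic.
o If $E(X \mid Y)$ in the full population is a linear function of $Y$, then $b = \tau\beta$ for some $\tau$.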
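The attenuation of OLS under truncation shows up clearly in a small simulation. The design is hypothetical: $x \sim N(0,1)$, $Y = 1 + x + \varepsilon$ with $\varepsilon \sim N(0,1)$, and only observations with $Y > 1$ are kept. Here $(x, Y)$ is jointly normal, so $E(X \mid Y)$ is linear. Working through the truncated-variance formula then suggests the OLS slope should settle near $\tau\beta$ with $\tau$ roughly 0.53, rather than near the true $\beta = 1$.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on a constant and xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def truncated_sample_ols(n=50_000, b0=1.0, b1=1.0, sigma=1.0, a=1.0, seed=12345):
    """Draw (x, y) from the full population, keep only y > a, and run OLS on the kept data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = b0 + b1 * x + rng.gauss(0.0, sigma)
        if y > a:  # truncation: units with y <= a never enter the sample
            xs.append(x)
            ys.append(y)
    return ols_slope(xs, ys)


print(round(truncated_sample_ols(), 2))
```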
II. CENSORED DATA
• A very common problem in microeconomic data is censoring of the dependent variable.
• When the dependent variable is censored, values in a certain range are all transferred to (or reported as) a single value.
4. The censored normal distribution:
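When data are censored, the distribution that applies to the sample data is a mixture of discrete and continuous distributions.
Define a new random variable $Y$, transformed from the original one, $Y^*$, by:
$$ Y = \begin{cases} 0 & \text{if } Y^* \le 0 \\ Y^* & \text{if } Y^* > 0 \end{cases} $$
If $Y^* \sim N(\mu, \sigma^2)$:
$$ \text{Prob}(Y = 0) = \text{Prob}(Y^* \le 0) = \Phi\left(-\frac{\mu}{\sigma}\right) = 1 - \Phi\left(\frac{\mu}{\sigma}\right) $$
If $Y^* > 0$, then $Y$ has the density of $Y^*$. The distribution of $Y$ is therefore a mixture of a discrete part (the mass at zero) and a continuous part.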
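Moments: if $Y^* \sim N(\mu, \sigma^2)$ and $Y = a$ if $Y^* \le a$, or else $Y = Y^*$, then:
$$ E[Y] = \Phi a + (1 - \Phi)(\mu + \sigma\lambda) $$
$$ Var[Y] = \sigma^2(1 - \Phi)[(1 - \delta) + (\alpha - \lambda)^2\Phi] $$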
Where:
$$ \Phi = \Phi\left(\frac{a - \mu}{\sigma}\right) = \Phi(\alpha) = \text{Prob}(Y^* \le a), \qquad \lambda = \frac{\phi}{1 - \Phi}, \qquad \delta = \lambda^2 - \lambda\alpha $$
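For $a = 0$:
$$ E[Y \mid a = 0] = \Phi\left(\frac{\mu}{\sigma}\right)(\mu + \sigma\lambda), \qquad \lambda = \frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)} $$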
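The censored moments can be coded in the same way. Below is a minimal sketch, using the standard library only and illustrative names, of $E[Y]$ and $Var[Y]$ for censoring from below at $a$. It also includes the special case $a = 0$.

```python
from statistics import NormalDist

STD = NormalDist()


def censored_normal_moments(mu, sigma, a):
    """Mean and variance of Y = a if Y* <= a, Y = Y* otherwise, with Y* ~ N(mu, sigma^2)."""
    alpha = (a - mu) / sigma
    Phi = STD.cdf(alpha)               # Prob(Y* <= a): the mass piled up at a
    lam = STD.pdf(alpha) / (1.0 - Phi)
    delta = lam ** 2 - lam * alpha
    mean = Phi * a + (1.0 - Phi) * (mu + sigma * lam)
    var = sigma ** 2 * (1.0 - Phi) * ((1.0 - delta) + (alpha - lam) ** 2 * Phi)
    return mean, var


def censored_mean_at_zero(mu, sigma):
    """Special case a = 0: E[Y] = Phi(mu/sigma) * (mu + sigma*lambda), lambda = phi/Phi at mu/sigma."""
    z = mu / sigma
    lam = STD.pdf(z) / STD.cdf(z)
    return STD.cdf(z) * (mu + sigma * lam)


print(censored_normal_moments(0.0, 1.0, 0.0))
```

With $\mu = a = 0$ and $\sigma = 1$, half of the mass sits at zero. The mean therefore reduces to $\phi(0) \approx 0.399$ and the variance to $1/2 - 1/(2\pi)$.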
5. The Censored Regression Model (Tobit Model):
a. Model:
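$$ Y_i^* = X_i\beta + \varepsilon_i, \qquad Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ Y_i^* & \text{if } Y_i^* > 0 \end{cases} $$
We observe only $Y_i$, not $Y_i^*$:
$$ E[Y_i \mid X_i] = \Phi\left(\frac{X_i\beta}{\sigma}\right)(X_i\beta + \sigma\lambda_i) $$
Note: $E[Y_i^* \mid X_i] = \mu_i = X_i\beta$.
Where:
$$ \lambda_i = \frac{\phi[(0 - X_i\beta)/\sigma]}{1 - \Phi[(0 - X_i\beta)/\sigma]} = \frac{\phi(X_i\beta/\sigma)}{\Phi(X_i\beta/\sigma)} $$
For the $Y^*$ variable,
$$ \frac{\partial E[Y_i^* \mid X_i]}{\partial X_i} = \beta $$
but $Y^*$ is unobservable.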
b. Marginal Effects:
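$$ Y_i^* = X_i\beta + \varepsilon_i, \qquad Y_i = \begin{cases} a & \text{if } Y_i^* \le a \\ Y_i^* & \text{if } a < Y_i^* < b \\ b & \text{if } Y_i^* \ge b \end{cases} $$
Let $f(\varepsilon)$ and $F(\varepsilon)$ denote the density and cdf of $\varepsilon$. Assume $\varepsilon \sim iid(0, \sigma^2)$ and $f(\varepsilon \mid X) = f(\varepsilon)$.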
Then
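$$ \frac{\partial E[Y_i \mid X_i]}{\partial X_i} = \beta \times \text{Prob}[a < Y_i^* < b] $$
This result does not assume that $\varepsilon$ is normally distributed. For the standard case, with censoring at zero and normally distributed disturbances $\varepsilon \sim N(0, \sigma^2)$:
$$ \frac{\partial E[Y_i \mid X_i]}{\partial X_i} = \beta\,\Phi\left(\frac{X_i\beta}{\sigma}\right) $$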
OLS estimates are usually close to the MLE estimates multiplied by the proportion of non-limit observations in the sample.
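o A useful decomposition of $\partial E(Y_i \mid X_i)/\partial X_i$:
$$ \frac{\partial E(Y_i \mid X_i)}{\partial X_i} = \beta \times \left\{ \Phi_i[1 - \lambda_i(\alpha_i + \lambda_i)] + \phi_i(\alpha_i + \lambda_i) \right\} $$
where $\alpha_i = X_i\beta/\sigma$, $\Phi_i = \Phi(\alpha_i)$, $\phi_i = \phi(\alpha_i)$ and $\lambda_i = \phi_i/\Phi_i$.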
Taking the two parts separately:
$$ \frac{\partial E(Y_i \mid X_i)}{\partial X_i} = \text{Prob}[Y_i > 0]\,\frac{\partial E[Y_i \mid X_i, Y_i > 0]}{\partial X_i} + E[Y_i \mid X_i, Y_i > 0]\,\frac{\partial\,\text{Prob}[Y_i > 0]}{\partial X_i} $$
Thus, a change in $X_i$ has two effects. It affects the conditional mean of $Y_i^*$ in the positive part of the distribution, and it affects the probability that the observation will fall in that part of the distribution.
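The decomposition can be verified numerically. The conditional-mean part, $\text{Prob}[Y_i > 0]\,\partial E[Y_i \mid X_i, Y_i > 0]/\partial X_i$, and the probability part, $E[Y_i \mid X_i, Y_i > 0]\,\partial\text{Prob}[Y_i > 0]/\partial X_i$, should add up to $\beta\Phi(X_i\beta/\sigma)$. This sketch assumes a scalar regressor and hypothetical parameter values. For $\partial E[Y_i \mid X_i, Y_i > 0]/\partial X_i$ it uses the truncated-regression result $\beta(1 - \delta_i)$ at $a = 0$, where $\delta_i = \lambda_i(\lambda_i + \alpha_i)$.

```python
from statistics import NormalDist

STD = NormalDist()


def tobit_conditional_mean(x, beta, sigma):
    """E[Y | x] = Phi(x*beta/sigma) * (x*beta + sigma*lambda), censoring at zero."""
    alpha = x * beta / sigma
    Phi, phi = STD.cdf(alpha), STD.pdf(alpha)
    return Phi * (x * beta + sigma * phi / Phi)


def tobit_effect_parts(x, beta, sigma):
    """Split dE[Y | x]/dx into its conditional-mean part and its probability part."""
    alpha = x * beta / sigma
    Phi, phi = STD.cdf(alpha), STD.pdf(alpha)
    lam = phi / Phi
    prob_pos = Phi                                    # Prob[Y > 0]
    mean_pos = x * beta + sigma * lam                 # E[Y | x, Y > 0]
    d_mean_pos = beta * (1.0 - lam * (alpha + lam))   # dE[Y | x, Y > 0]/dx
    d_prob_pos = phi * beta / sigma                   # dProb[Y > 0]/dx
    return prob_pos * d_mean_pos, mean_pos * d_prob_pos


mean_part, prob_part = tobit_effect_parts(0.4, 1.2, 0.9)
print(mean_part, prob_part, mean_part + prob_part, 1.2 * STD.cdf(0.4 * 1.2 / 0.9))
```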
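6. Estimation and Inference with the Censored (Tobit) Model:
Estimation of the Tobit model and of the truncated regression model is similar: both use MLE.
The likelihood for the censored regression model is
$$ L = \prod_{y_i > 0} \frac{1}{\sigma}\phi\left(\frac{Y_i - X_i\beta}{\sigma}\right) \prod_{y_i = 0}\left[1 - \Phi\left(\frac{X_i\beta}{\sigma}\right)\right] $$
$$ \ln L = \sum_{y_i > 0} -\frac{1}{2}\left[\ln(2\pi) + \ln\sigma^2 + \frac{(Y_i - X_i\beta)^2}{\sigma^2}\right] + \sum_{y_i = 0}\ln\left[1 - \Phi\left(\frac{X_i\beta}{\sigma}\right)\right] $$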
The two parts correspond to the classical regression for the non-limit observations and the relevant probabilities for the limit observations. Although this likelihood is a mixture of discrete and continuous distributions, MLE produces an estimator with all the familiar desirable properties attained by MLEs.
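o With $\gamma = \beta/\sigma$ and $\theta = 1/\sigma$, the log-likelihood becomes
$$ \ln L(\gamma, \theta) = \sum_{Y_i > 0} -\frac{1}{2}\left[\ln(2\pi) - \ln\theta^2 + (\theta Y_i - X_i\gamma)^2\right] + \sum_{Y_i = 0}\ln[1 - \Phi(X_i\gamma)] $$
In this parameterization the Hessian is always negative definite, so the Newton-Raphson method is simple to use and usually converges quickly.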
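o By contrast, for the truncated model:
$$ L = \prod_{i=1}^{n} f(Y_i \mid Y_i > a) = \prod_{i=1}^{n} \frac{\frac{1}{\sigma}\phi\left(\frac{Y_i - X_i\beta}{\sigma}\right)}{1 - \Phi\left(\frac{a - X_i\beta}{\sigma}\right)} $$
$$ \ln L = \sum_{i=1}^{n}\left\{ -\frac{1}{2}\left[\ln(2\pi) + \ln\sigma^2 + \frac{(Y_i - X_i\beta)^2}{\sigma^2}\right] - \ln\left[1 - \Phi\left(\frac{a - X_i\beta}{\sigma}\right)\right] \right\} $$
After convergence in the $(\gamma, \theta)$ parameterization, the original parameters can be recovered using $\sigma = 1/\theta$ and $\beta = \gamma/\theta$.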
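Both likelihoods are straightforward to code. This sketch evaluates the Tobit log-likelihood in the $(\beta, \sigma)$ and $(\gamma, \theta)$ parameterizations, which must give the same value, and also the truncated-regression log-likelihood. It assumes a single scalar regressor without an intercept. The small dataset in the usage lines is made up purely for illustration. A full estimator would pass one of these functions, or its Newton-Raphson derivatives, to an optimizer; that step is not shown.

```python
import math
from statistics import NormalDist

STD = NormalDist()
LOG2PI = math.log(2.0 * math.pi)


def tobit_loglik(beta, sigma, data):
    """ln L(beta, sigma) for Y* = x*beta + eps censored at zero; data = [(y, x), ...]."""
    ll = 0.0
    for y, x in data:
        xb = x * beta
        if y > 0:  # non-limit observation: normal density of y
            ll += -0.5 * (LOG2PI + math.log(sigma ** 2) + ((y - xb) / sigma) ** 2)
        else:      # limit observation: Prob(Y* <= 0) = 1 - Phi(x*beta/sigma)
            ll += math.log(1.0 - STD.cdf(xb / sigma))
    return ll


def tobit_loglik_olsen(gamma, theta, data):
    """The same likelihood with gamma = beta/sigma and theta = 1/sigma."""
    ll = 0.0
    for y, x in data:
        xg = x * gamma
        if y > 0:
            ll += -0.5 * (LOG2PI - math.log(theta ** 2) + (theta * y - xg) ** 2)
        else:
            ll += math.log(1.0 - STD.cdf(xg))
    return ll


def truncated_loglik(beta, sigma, a, data):
    """ln L for the truncated regression; every y in data is assumed to exceed a."""
    ll = 0.0
    for y, x in data:
        xb = x * beta
        ll += -0.5 * (LOG2PI + math.log(sigma ** 2) + ((y - xb) / sigma) ** 2)
        ll -= math.log(1.0 - STD.cdf((a - xb) / sigma))
    return ll


data = [(0.0, -1.0), (1.3, 0.5), (0.0, 0.2), (2.1, 1.4), (0.7, 0.9)]  # made-up (y, x) pairs
beta, sigma = 1.1, 0.8
print(tobit_loglik(beta, sigma, data), tobit_loglik_olsen(beta / sigma, 1.0 / sigma, data))
```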
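Asymptotic covariance matrix of $(\hat\beta, \hat\sigma)$:
$$ VarCov(\hat\beta, \hat\sigma) = \left[\sum_{i=1}^{n} A(X_i)\right]^{-1}, \qquad A(X_i) = \begin{bmatrix} a_i X_i'X_i & b_i X_i' \\ b_i X_i & c_i \end{bmatrix} $$
where
$$ a_i = -\sigma^{-2}\left\{X_i\gamma\,\phi_i - \frac{\phi_i^2}{1 - \Phi_i} - \Phi_i\right\} $$
$$ b_i = \frac{\sigma^{-3}}{2}\left\{(X_i\gamma)^2\phi_i + \phi_i - \frac{(X_i\gamma)\phi_i^2}{1 - \Phi_i}\right\} $$
$$ c_i = -\frac{\sigma^{-4}}{4}\left\{(X_i\gamma)^3\phi_i + (X_i\gamma)\phi_i - \frac{(X_i\gamma)^2\phi_i^2}{1 - \Phi_i} - 2\Phi_i\right\} $$
with $\gamma = \beta/\sigma$; $\phi_i$ and $\Phi_i$ are evaluated at $X_i\gamma$.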
o Researchers often compute least squares estimates despite their inconsistency.
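o Empirical regularity: MLE estimates can be approximated by dividing OLS estimates by the proportion of non-limit observations in the sample:
$$ \text{Prob}(Y_i > 0) = \text{Prob}(Y_i^* > 0) = \text{Prob}(X_i\beta + \varepsilon_i > 0) = \text{Prob}(\varepsilon_i > -X_i\beta) = \text{Prob}\left(\frac{\varepsilon_i}{\sigma} > -\frac{X_i\beta}{\sigma}\right) = \Phi\left(\frac{X_i\beta}{\sigma}\right) $$
$$ \hat\beta_{OLS} \approx \hat\beta_{MLE}\,\Phi\left(\frac{X_i\beta}{\sigma}\right) $$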
o Another strategy is to discard the limit observations, but that just trades the censoring problem for the truncation problem.
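The empirical regularity can be illustrated with a hypothetical simulation in which the regressor is normal: $Y^* = 0.3 + 1.5x + \varepsilon$ with $x, \varepsilon \sim N(0,1)$, censored at zero. For jointly normal $(x, Y^*)$, the population OLS slope on the censored data equals $\beta \times \text{Prob}(Y > 0)$. Dividing by the sample share of non-limit observations should therefore roughly recover 1.5. With non-normal regressors the relation is only an approximation.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on a constant and xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def censored_ols_vs_share(n=100_000, b0=0.3, b1=1.5, sigma=1.0, seed=7):
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y_star = b0 + b1 * x + rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(max(0.0, y_star))  # censoring at zero: limit observations stay in the sample
    share = sum(1 for y in ys if y > 0) / n  # proportion of non-limit observations
    return ols_slope(xs, ys), share


b_ols, share = censored_ols_vs_share()
print(b_ols, share, b_ols / share)  # the last number approximates the true slope b1 = 1.5
```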
III. SOME ISSUES IN SPECIFICATION
Heteroskedasticity and Non-normality:
o Both heteroskedasticity and non-normality make the Tobit estimator $\hat\beta$ inconsistent for $\beta$.
o Note that OLS does not need normality: consistency requires only $E(\varepsilon \mid X) = 0$ (exogeneity), and inference can rely on the CLT. Data censoring can therefore be costly.
o The presence of heteroskedasticity or non-normality in the Tobit or truncated model entirely changes the functional forms of $E(Y \mid X, Y > 0)$ and $E(Y \mid X)$.
IV. SAMPLE SELECTION MODEL:
7. Incidental Truncation in a Bivariate Distribution:
o Suppose that $y$ and $Z$ have a bivariate distribution with correlation $\rho$.
o We are interested in the distribution of $y$ given that $Z$ exceeds a particular value. If $y$ and $Z$ are positively correlated, then the truncation of $Z$ should push the distribution of $y$ to the right.
o The truncated joint density of $y$ and $Z$ is
$$ f(y, Z \mid Z > a) = \frac{f(y, Z)}{\text{Prob}(Z > a)} $$
For the bivariate normal distribution:
Theorem: If $y$ and $Z$ have a bivariate normal distribution with means $\mu_y$ and $\mu_Z$, standard deviations $\sigma_y$ and $\sigma_Z$, and correlation $\rho$, then:
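$$ E[y \mid Z > a] = \mu_y + \rho\sigma_y\lambda(\alpha_Z) $$
$$ Var[y \mid Z > a] = \sigma_y^2[1 - \rho^2\delta(\alpha_Z)] $$
where
$$ \alpha_Z = \frac{a - \mu_Z}{\sigma_Z}, \qquad \lambda(\alpha_Z) = \frac{\phi(\alpha_Z)}{1 - \Phi(\alpha_Z)}, \qquad \delta(\alpha_Z) = \lambda(\alpha_Z)[\lambda(\alpha_Z) - \alpha_Z] $$
If the truncation is $Z < a$, then $\lambda(\alpha_Z) = -\phi(\alpha_Z)/\Phi(\alpha_Z)$.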
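For the standard bivariate normal, $(y, Z) \sim N(0, 0, 1, 1, \rho)$:
$$ E(y \mid Z = a) = \rho a, \qquad V(y \mid Z = a) = 1 - \rho^2 $$
$$ E(y \mid Z < a) = -\rho\frac{\phi(a)}{\Phi(a)}, \qquad E(y \mid Z > a) = \rho\frac{\phi(a)}{1 - \Phi(a)} $$
$$ Var(y \mid Z > a) = 1 - \rho^2\delta(a) $$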
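Here is a Monte Carlo check of the standard bivariate normal results for $E(y \mid Z > a)$ and $Var(y \mid Z > a)$. The values $\rho = 0.6$ and $a = 0.5$ are arbitrary. The sketch builds $y$ as $\rho Z + \sqrt{1-\rho^2}\,e$ with independent $e \sim N(0,1)$, so that $corr(y, Z) = \rho$.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def theory_mean_var(rho, a):
    """E(y | Z > a) and Var(y | Z > a) for the standard bivariate normal."""
    lam = STD.pdf(a) / (1.0 - STD.cdf(a))
    delta = lam * (lam - a)
    return rho * lam, 1.0 - rho ** 2 * delta


def simulate_incidental(rho, a, n=200_000, seed=2024):
    """Draw (y, Z) with corr(y, Z) = rho and keep y only when Z > a."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if z > a:  # incidental truncation: selection is on Z, not on y
            kept.append(y)
    m = sum(kept) / len(kept)
    v = sum((y - m) ** 2 for y in kept) / (len(kept) - 1)
    return m, v


print(theory_mean_var(0.6, 0.5))
print(simulate_incidental(0.6, 0.5))
```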
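General case: Let $y \sim N(\mu, \Sigma)$ and partition $y$, $\mu$ and $\Sigma$ into:
$$ y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} $$
Then the marginal distributions are $y_1 \sim N(\mu_1, \Sigma_{11})$ and $y_2 \sim N(\mu_2, \Sigma_{22})$.
The conditional distribution of $y_1$ given $y_2$ is:
$$ y_1 \mid y_2 \sim N\left[\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right] $$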
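The conditional-distribution formula can be illustrated numerically with a small sketch in plain Python. It handles a scalar $y_1$ and a two-dimensional $y_2$, and it assumes $\Sigma_{21} = \Sigma_{12}'$ (symmetry). The function name and the inputs are hypothetical.

```python
def conditional_normal(mu1, mu2, s11, s12, S22, y2):
    """Mean and variance of scalar y1 given a 2-vector y2.

    s12 is the row Sigma_12 (length 2) and S22 the 2x2 block Sigma_22;
    Sigma_21 is taken to be the transpose of Sigma_12.
    """
    (a, b), (c, d) = S22
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # Sigma_22^{-1}
    # w = Sigma_12 Sigma_22^{-1}, a 1x2 row of regression coefficients
    w = [s12[0] * inv[0][0] + s12[1] * inv[1][0],
         s12[0] * inv[0][1] + s12[1] * inv[1][1]]
    mean = mu1 + w[0] * (y2[0] - mu2[0]) + w[1] * (y2[1] - mu2[1])
    var = s11 - (w[0] * s12[0] + w[1] * s12[1])  # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return mean, var


# If the second element of y2 is uncorrelated with y1, this should match rho*a and 1 - rho^2
print(conditional_normal(0.0, [0.0, 0.0], 1.0, [0.6, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 5.0]))
```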
8. The Sample Selection Model:
a) Wage equation:
$$ Z_i^* = W_i\gamma + u_i $$
$Z_i^*$: the difference between a person's market wage and her reservation wage, i.e. the wage rate necessary to make her choose to participate in the labour force.
$Z_i^* > 0$: participate
$Z_i^* \le 0$: do not participate
$W_i$: education, age, etc.
b) Hours equation
$$ Y_i = X_i\beta + \varepsilon_i $$
$Y_i$: number of hours supplied
$X_i$: wage, number of children, marital status
$Y_i$ is observed only when $Z_i^* > 0$. Suppose $\varepsilon_i$ and $u_i$ have a bivariate normal distribution with zero means and correlation $\rho$.
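$$ E(Y_i \mid Y_i \text{ is observed}) = E(Y_i \mid Z_i^* > 0) = E(Y_i \mid u_i > -W_i\gamma) = X_i\beta + E(\varepsilon_i \mid u_i > -W_i\gamma) = X_i\beta + \rho\sigma_\varepsilon\lambda(\alpha_u) = X_i\beta + \beta_\lambda\lambda(\alpha_u) $$
where
$$ \alpha_u = -\frac{W_i\gamma}{\sigma_u}, \qquad \lambda(\alpha_u) = \frac{\phi(W_i\gamma/\sigma_u)}{\Phi(W_i\gamma/\sigma_u)} $$
$$ Y_i \mid Z_i^* > 0 = E(Y_i \mid Z_i^* > 0) + v_i = X_i\beta + \beta_\lambda\lambda_i(\alpha_u) + v_i, \qquad \beta_\lambda = \rho\sigma_\varepsilon $$
OLS estimation produces inconsistent estimates of $\beta$ because it omits the relevant variable $\lambda_i(\alpha_u)$. Even if $\lambda_i$ were observed, OLS would be inefficient, because the disturbance $v_i$ is heteroskedastic.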
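We reformulate the model as follows:
$$ Z_i^* = W_i\gamma + u_i, \qquad Z_i = \begin{cases} 1 & \text{if } Z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{(binary choice model)} $$
$$ \text{Prob}(Z_i = 1 \mid W_i) = \Phi(W_i\gamma), \qquad \text{Prob}(Z_i = 0 \mid W_i) = 1 - \Phi(W_i\gamma) $$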
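Regression model:
$$ Y_i = X_i\beta + \varepsilon_i, \quad \text{observed only if } Z_i = 1 $$
$$ (u_i, \varepsilon_i) \sim \text{bivariate normal}[0, 0, 1, \sigma_\varepsilon, \rho] $$
The parameters are listed in the order $\mu_u, \mu_\varepsilon, \sigma_u, \sigma_\varepsilon, \rho_{u\varepsilon}$.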
Suppose that, as in many of these studies, $Z_i$ and $W_i$ are observed for a random sample of individuals, but $Y_i$ is observed only when $Z_i = 1$. Then
$$ E[Y_i \mid Z_i = 1, X_i, W_i] = X_i\beta + \rho\sigma_\varepsilon\lambda(W_i\gamma) $$
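c) Estimation
The parameters of the sample selection model can be estimated by maximum likelihood. However, Heckman's (1979) two-step estimation procedure is usually used instead:
o 1. Estimate the probit equation by MLE to obtain estimates of $\gamma$. For each observation in the selected sample, compute
$$ \hat\lambda_i = \frac{\phi(W_i\hat\gamma)}{\Phi(W_i\hat\gamma)}, \qquad \hat\delta_i = \hat\lambda_i(\hat\lambda_i + W_i\hat\gamma) $$
o 2. Estimate $\beta$ and $\beta_\lambda = \rho\sigma_\varepsilon$ by a least-squares regression of $Y$ on $X$ and $\hat\lambda$.
o Asymptotic covariance matrix of $[\hat\beta, \hat\beta_\lambda]$:
$$ Y_i \mid Z_i = 1, X_i, W_i = X_i\beta + \rho\sigma_\varepsilon\lambda_i + v_i $$
Heteroskedasticity:
$$ Var[v_i \mid Z_i = 1, X_i, W_i] = \sigma_\varepsilon^2(1 - \rho^2\delta_i) $$
Let $\beta^* = [\hat\beta, \hat\beta_\lambda]$ and $X_i^* = [X_i, \lambda_i]$. Then
$$ VarCov(\beta^*) = \sigma_\varepsilon^2[X^{*\prime}X^*]^{-1}[X^{*\prime}(I - \rho^2\Delta)X^*][X^{*\prime}X^*]^{-1} $$
where $I - \rho^2\Delta$ is a diagonal matrix with $(1 - \rho^2\delta_i)$ on the diagonal. The unknown parameters in this expression are estimated by
$$ \hat\sigma_\varepsilon^2 = \frac{1}{n}e'e + \bar{\hat\delta}\,\hat\beta_\lambda^2, \qquad \hat\rho = \frac{\hat\beta_\lambda}{\hat\sigma_\varepsilon} $$
where $e$ are the least-squares residuals and $\bar{\hat\delta} = \frac{1}{n}\sum_{i=1}^{n}\hat\delta_i$ estimates $\bar\delta = \text{plim}\,\frac{1}{n}\sum_i \delta_i$.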
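The two steps can be sketched end to end in plain Python. Everything in this sketch is an assumption for illustration: the data-generating process, the exclusion of $w$ from the outcome equation, and the helper names. The design uses selection $Z^* = 0.3 + 1.5w + u$ and outcome $Y = 1 + 2x + \varepsilon$, with $corr(u, \varepsilon) = 0.7$, $\sigma_\varepsilon = 1.5$ and $w$ correlated with $x$. Step 1 fits the probit by Newton-Raphson. Step 2 regresses $Y$ on $X$ and $\hat\lambda_i = \phi(W_i\hat\gamma)/\Phi(W_i\hat\gamma)$ in the selected sample. The corrected covariance matrix is not computed.

```python
import math
import random


def phi(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def Phi(s):
    # erfc keeps the lower tail accurate, which matters inside the Mills ratio
    return 0.5 * math.erfc(-s / math.sqrt(2.0))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system A x = b."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def probit(Z, W, tol=1e-10, max_iter=50):
    """Step 1: probit MLE of gamma by Newton-Raphson."""
    k = len(W[0])
    g = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for z, w in zip(Z, W):
            wg = sum(wj * gj for wj, gj in zip(w, g))
            q = 2.0 * z - 1.0
            lam = q * phi(q * wg) / Phi(q * wg)   # score weight
            h = lam * (lam + wg)                  # weight of minus the Hessian
            for j in range(k):
                grad[j] += lam * w[j]
                for m in range(k):
                    neg_hess[j][m] += h * w[j] * w[m]
        step = solve(neg_hess, grad)  # Newton: gamma - H^{-1} grad, with neg_hess = -H
        g = [gj + sj for gj, sj in zip(g, step)]
        if max(abs(s) for s in step) < tol:
            break
    return g


def heckman_two_step(Y, X, Z, W):
    """Step 2: OLS of Y on [X, lambda_hat] using only the selected observations."""
    gamma = probit(Z, W)
    rows, ys = [], []
    for y, x, z, w in zip(Y, X, Z, W):
        if z == 1:
            wg = sum(wj * gj for wj, gj in zip(w, gamma))
            rows.append(list(x) + [phi(wg) / Phi(wg)])  # lambda_hat_i
            ys.append(y)
    k = len(rows[0])
    xtx = [[sum(r[j] * r[m] for r in rows) for m in range(k)] for j in range(k)]
    xty = [sum(r[j] * y for r, y in zip(rows, ys)) for j in range(k)]
    coef = solve(xtx, xty)
    return gamma, coef[:-1], coef[-1]  # gamma_hat, beta_hat, beta_lambda_hat


def simulate(n=20_000, rho=0.7, sig_eps=1.5, seed=11):
    """Hypothetical design; Y is set to NaN when it is not observed."""
    rng = random.Random(seed)
    Y, X, Z, W = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w = 0.6 * x + 0.8 * rng.gauss(0.0, 1.0)  # excluded from the outcome equation
        u = rng.gauss(0.0, 1.0)
        eps = sig_eps * (rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        z = 1 if 0.3 + 1.5 * w + u > 0 else 0
        Y.append(1.0 + 2.0 * x + eps if z else float("nan"))
        X.append((1.0, x))
        Z.append(z)
        W.append((1.0, w))
    return Y, X, Z, W


gamma_hat, beta_hat, beta_lam_hat = heckman_two_step(*simulate())
print(gamma_hat, beta_hat, beta_lam_hat)
```

With these settings the estimate of $\beta_\lambda$ should be close to $\rho\sigma_\varepsilon = 1.05$. Because $w$, and hence $\lambda$, is correlated with $x$ in this design, an OLS regression that leaves out $\hat\lambda$ would generally give a biased slope.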
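d) Model:
$$ Y_{1i} = \begin{cases} X_i\beta_1 + \varepsilon_{1i} & \text{if } Y_{2i}^* > 0 \\ 0 & \text{if } Y_{2i}^* \le 0 \end{cases}, \qquad Y_2 = Z = \begin{cases} 1 & \text{if } Y_2^* > 0 \\ 0 & \text{otherwise} \end{cases} $$
$$ Y_2^* = X\beta_2 + \varepsilon_2 $$
Assume
$$ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & 1 \end{bmatrix} $$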
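Heckman's two-step estimation:
In the subsample for which $Y_1 \ne 0$ we have
$$ E(Y_{1i} \mid Y_{2i}^* > 0) = X_i\beta_1 + E(\varepsilon_{1i} \mid Y_{2i}^* > 0) = X_i\beta_1 + E(\varepsilon_{1i} \mid \varepsilon_{2i} > -X_i\beta_2) = X_i\beta_1 + \rho\sigma_1\lambda(\alpha_i) $$
where $\alpha_i = -X_i\beta_2/\sigma_2$.
Therefore, in the subsample for which $Y_1 \ne 0$,
$$ Y_{1i} \mid Y_{2i}^* > 0 = E(Y_{1i} \mid Y_{2i}^* > 0) + v_i = X_i\beta_1 + \rho\sigma_1\lambda(\alpha_i) + v_i = X_i\beta_1 + \beta_\lambda\lambda_i + v_i $$
This is a proper regression equation in the sense that
$$ E(v_i \mid X_i, \lambda_i, Y_{2i}^* > 0) = 0 $$
Note that $\rho = \sigma_{12}/\sigma_1$, with $\sigma_2 = 1$ and $\sigma_1 = \sqrt{\sigma_{11}}$. A regression of $Y_1$ on $X$ alone is therefore subject to omitted variable bias.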
o Heckman's two-step (Heckit) procedure:
1. Estimate the probit equation by MLE to get $\hat\beta_2$. Use these estimates to construct
$$ \hat\lambda_i = \frac{\phi(-X_i\hat\beta_2)}{1 - \Phi(-X_i\hat\beta_2)} $$
2. Regress $Y_{1i}$ on $X_i$ and $\hat\lambda_i$.
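o Maximum likelihood:
There are two data regimes: $Y_2 = 0$ and $Y_2 = 1$.
Construct the likelihood function:
$$ \begin{array}{lll} \text{Regime} & \text{Observed} & \text{What is known about } \varepsilon \\ 1 & Y_1 \text{ not observed},\ Y_2 = 0 & \varepsilon_2 < -X\beta_2 \\ 2 & Y_1 \text{ observed},\ Y_2 = 1 & \varepsilon_1 = Y_1 - X\beta_1,\ \varepsilon_2 > -X\beta_2 \end{array} $$
Regime 1: likelihood element: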
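$$ \int_{-\infty}^{-X\beta_2} f(\varepsilon_2)\, d\varepsilon_2 = \Phi(-X\beta_2) $$
Regime 2:
$$ \int_{-X\beta_2}^{+\infty} f(Y_1 - X\beta_1, \varepsilon_2)\, d\varepsilon_2 $$
$$ L(\beta_1, \beta_2, \Sigma) = \prod_{Y_2 = 0}\int_{-\infty}^{-X\beta_2} f(\varepsilon_2)\, d\varepsilon_2 \prod_{Y_2 = 1}\int_{-X\beta_2}^{+\infty} f(Y_1 - X\beta_1, \varepsilon_2)\, d\varepsilon_2 $$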
V. THE DOUBLE SELECTION MODEL:
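$$ Y_1 = \begin{cases} X\beta_1 + \varepsilon_1 & \text{if } Y_2^* > 0 \text{ and/or } Y_3^* > 0 \\ 0 & \text{otherwise} \end{cases} $$
$$ Y_2^* = X\beta_2 + \varepsilon_2, \qquad Y_3^* = X\beta_3 + \varepsilon_3 $$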
VI. REGRESSION ANALYSIS OF TREATMENT EFFECTS:
$$ Y_i = X_i\beta + \delta C_i + \varepsilon_i $$
$Y_i$ denotes earnings, and $C_i$ is a dummy variable indicating whether or not the individual attended college.
Does $\delta$ measure the value of a college education? (Assume the rest of the regression model is correctly specified.)
The answer is no if the typical individual who chooses to go to college would have relatively high earnings whether or not he or she went to college. The problem is one of self-selection (sample selection), and $\delta$ will overestimate the treatment effect.
The same problem arises in other settings in which individuals themselves decide whether or not they will receive the treatment.
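$$ C_i^* = W_i\gamma + u_i, \qquad C_i = \begin{cases} 1 & \text{if } C_i^* > 0 \\ 0 & \text{otherwise} \end{cases} $$
$$ E(Y_i \mid C_i = 1, X_i, W_i) = X_i\beta + \delta + E(\varepsilon_i \mid C_i = 1, X_i, W_i) = X_i\beta + \delta + \rho\sigma_\varepsilon\lambda(-W_i\gamma) $$
where $\lambda(-W_i\gamma) = \phi(W_i\gamma)/\Phi(W_i\gamma)$. Estimate this model using the two-step estimator. For non-participants: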
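$$ E(Y_i \mid C_i = 0, X_i, W_i) = X_i\beta + \rho\sigma_\varepsilon\left[\frac{-\phi(W_i\gamma)}{1 - \Phi(W_i\gamma)}\right] $$
The difference in expected earnings between participants and non-participants is then:
$$ E(Y_i \mid C_i = 1, X_i, W_i) - E(Y_i \mid C_i = 0, X_i, W_i) = \delta + \rho\sigma_\varepsilon\frac{\phi_i}{\Phi_i(1 - \Phi_i)} $$
When $\rho > 0$ the second term is positive, so least squares overestimates the effect $\delta$.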
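Here is a short check of the participant/non-participant gap. The difference between the two conditional expectations should equal $\delta + \rho\sigma_\varepsilon\phi_i/[\Phi_i(1-\Phi_i)]$, and a simulation at a fixed index $W_i\gamma$ should reproduce it. The parameter values and function names are hypothetical.

```python
import math
import random


def phi(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def Phi(s):
    return 0.5 * math.erfc(-s / math.sqrt(2.0))


def expected_outcomes(xb, delta, wg, rho, sig_eps):
    """E(Y | C = 1, X, W) and E(Y | C = 0, X, W) from the selection formulas."""
    treated = xb + delta + rho * sig_eps * phi(wg) / Phi(wg)
    untreated = xb + rho * sig_eps * (-phi(wg) / (1.0 - Phi(wg)))
    return treated, untreated


def participation_gap(delta, wg, rho, sig_eps):
    """delta + rho*sigma_eps*phi / (Phi*(1 - Phi)): the raw participant/non-participant gap."""
    return delta + rho * sig_eps * phi(wg) / (Phi(wg) * (1.0 - Phi(wg)))


def simulated_gap(xb, delta, wg, rho, sig_eps, n=200_000, seed=3):
    """Monte Carlo check at a fixed index W_i*gamma = wg."""
    rng = random.Random(seed)
    sums, counts = [0.0, 0.0], [0, 0]
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        eps = sig_eps * (rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        c = 1 if wg + u > 0 else 0  # self-selection into treatment
        sums[c] += xb + delta * c + eps
        counts[c] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


print(participation_gap(1.0, 0.2, 0.5, 1.0), simulated_gap(0.5, 1.0, 0.2, 0.5, 1.0))
```

With $\rho = 0$ the gap collapses to $\delta$. With $\rho > 0$ it exceeds $\delta$, which is the overestimation described above.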