Advanced Econometrics - Part II Chapter 6: Models for count data
Nam T. Hoang
UNE Business School, University of New England
Chapter 6
MODELS FOR COUNT DATA
A count variable is a variable that takes on non-negative integer values:
• There is no natural upper bound
• The outcome will be zero for at least some members of the population
Y is a count variable and X is a vector of explanatory variables. It is better to model $E(Y \mid X)$
directly and to choose a functional form that ensures positivity for any value of X and any
parameter value.
When Y has no upper bound, the most popular choice is the exponential function:
$$E(Y \mid X) = \exp(X\beta)$$
I. POISSON REGRESSION MODEL:
• The basic Poisson regression model assumes that Y given $X = (X_1, X_2, \ldots, X_k)$ has a
Poisson distribution.
• The Poisson regression model specifies that each $Y_i$ is drawn from a Poisson
distribution with parameter $\lambda_i$, which is related to the regressors $X_i$.
$$\text{Prob}(Y = Y_i \mid X_i) = \frac{e^{-\lambda_i} \lambda_i^{Y_i}}{Y_i!}, \qquad Y_i! = 1 \times 2 \times \cdots \times Y_i$$
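$\lambda_i$ and $X_i$ are related as:
$$\ln \lambda_i = X_i\beta \quad \text{or} \quad \lambda_i = e^{X_i\beta}$$
The expected number of events is given by (Poisson distribution properties):
$$E[Y_i \mid X_i] = \text{Var}[Y_i \mid X_i] = \lambda_i = e^{X_i\beta}$$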
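The equality of mean and variance can be checked numerically. The following is a minimal sketch using only the Python standard library; the data vector `x_i` and the coefficients `beta` are made-up values chosen for illustration, not estimates from these notes.

```python
import math

def poisson_pmf(y, lam):
    """Prob(Y = y) for a Poisson distribution with parameter lam."""
    # Work in logs so that lam**y and y! cannot overflow for large counts.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def poisson_rate(x, beta):
    """lambda_i = exp(X_i beta) for one observation."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

# Hypothetical observation (constant plus one regressor) and coefficients.
x_i = [1.0, 0.5]
beta = [0.2, 0.8]
lam_i = poisson_rate(x_i, beta)

# Truncate the infinite sums where the remaining tail is negligible.
support = range(200)
total = sum(poisson_pmf(y, lam_i) for y in support)
mean = sum(y * poisson_pmf(y, lam_i) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam_i) for y in support)
print(lam_i, total, mean, var)
```

The probabilities sum to one, and both the mean and the variance reproduce `lam_i`.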
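The marginal effects follow by differentiating the conditional mean:
$$\frac{\partial E[Y_i \mid X_i]}{\partial X_i} = \lambda_i \beta$$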
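As a quick check of the marginal-effect formula, the sketch below compares $\lambda_i \beta$ with central finite differences of $\exp(X_i\beta)$. The regressor values and coefficients are hypothetical, and the helper names are introduced here only for illustration.

```python
import math

def expected_count(x, beta):
    """E[Y | X] = exp(X beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def marginal_effects(x, beta):
    """Analytic marginal effects: dE[Y|X]/dX_j = lambda * beta_j."""
    lam = expected_count(x, beta)
    return [lam * bj for bj in beta]

def numerical_effects(x, beta, h=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    effects = []
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        effects.append((expected_count(up, beta) - expected_count(down, beta)) / (2 * h))
    return effects

x_i = [0.4, 2.0, -0.5]     # hypothetical regressor values
beta = [0.1, 0.3, -0.2]    # hypothetical coefficient estimates
analytic = marginal_effects(x_i, beta)
numeric = numerical_effects(x_i, beta)
print(analytic)
print(numeric)
```

Because the effect is $\lambda_i \beta$, it changes with the data vector at which it is evaluated, unlike in a linear regression.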
• With the parameter estimate in hand, this vector can be computed using any data vector
desired.
• In principle, the Poisson model is simply a non-linear regression, but it is easier to
estimate the parameters with maximum likelihood techniques. The log-likelihood
function is:
$$\ln L = \sum_{i=1}^{n} \left[-\lambda_i + Y_i (X_i\beta) - \ln(Y_i!)\right] = \sum_{i=1}^{n} \left[-e^{X_i\beta} + Y_i (X_i\beta) - \ln(Y_i!)\right]$$
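The Poisson log-likelihood is straightforward to code. A minimal sketch follows, with a tiny made-up sample (a constant and one regressor); `math.lgamma(y + 1)` is used for $\ln(Y_i!)$.

```python
import math

def poisson_loglik(beta, X, Y):
    """ln L = sum_i [ -exp(X_i beta) + Y_i (X_i beta) - ln(Y_i!) ]."""
    total = 0.0
    for x_i, y_i in zip(X, Y):
        xb = sum(xj * bj for xj, bj in zip(x_i, beta))
        total += -math.exp(xb) + y_i * xb - math.lgamma(y_i + 1)  # lgamma(y+1) = ln(y!)
    return total

# Made-up data: first column is the constant term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1, 1, 3, 6]

ll_null = poisson_loglik([0.0, 0.0], X, Y)   # lambda_i = 1 for every observation
ll_slope = poisson_loglik([0.0, 0.6], X, Y)  # lambda_i increases with the regressor
print(ll_null, ll_slope)
```

For this sample the second parameter vector gives a higher log-likelihood, since the counts rise with the regressor.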
The likelihood equations are:
$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} (Y_i - \lambda_i) X_i = \sum_{i=1}^{n} (Y_i - e^{X_i\beta}) X_i = 0$$
The Hessian is:
$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{n} \lambda_i X_i X_i'$$
The Hessian is negative definite for all X and β (provided the regressors are not perfectly collinear). The Newton-Raphson method is a simple
algorithm for this model and will converge rapidly. At convergence,
$$V = \left[\sum_{i=1}^{n} \hat\lambda_i X_i X_i'\right]^{-1}$$
is an estimator of the asymptotic covariance matrix for $\hat\beta$, where $\hat\lambda_i = \exp(X_i\hat\beta)$.
$\hat\lambda_i$ is the prediction for observation i. By the delta method, the estimated variance of $\hat\lambda_i$ is
$\hat\lambda_i^2 X_i' V X_i$, where V is the estimated asymptotic covariance matrix for $\hat\beta$ given above.
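The Newton-Raphson iteration, the covariance estimator V and the variance of a prediction can be sketched in pure Python. This is an illustrative implementation under stated assumptions, not the routine used in any particular package: the data are simulated with assumed coefficients `true_beta`, and the helpers `solve`, `score_and_information`, `fit_poisson` and `poisson_draw` are names introduced here.

```python
import math
import random

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def score_and_information(beta, X, Y):
    """Score sum (Y_i - lam_i) X_i and information sum lam_i X_i X_i' (minus the Hessian)."""
    k = len(beta)
    g = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(X, Y):
        lam = math.exp(sum(a * b for a, b in zip(x, beta)))
        for r in range(k):
            g[r] += (y - lam) * x[r]
            for c in range(k):
                info[r][c] += lam * x[r] * x[c]
    return g, info

def fit_poisson(X, Y, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta <- beta - H^{-1} g = beta + info^{-1} g."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        g, info = score_and_information(beta, X, Y)
        step = solve(info, g)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(beta, X, Y)
    # V = info^{-1}, built column by column from unit vectors.
    cols = [solve(info, [1.0 if i == j else 0.0 for i in range(k)]) for j in range(k)]
    V = [[cols[j][i] for j in range(k)] for i in range(k)]
    return beta, V

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)
true_beta = [0.5, 0.8]   # assumed values, used only to simulate data
X = [[1.0, rng.uniform(-1, 1)] for _ in range(2000)]
Y = [poisson_draw(math.exp(true_beta[0] + true_beta[1] * x[1]), rng) for x in X]

beta_hat, V = fit_poisson(X, Y)
std_err = [math.sqrt(V[j][j]) for j in range(2)]
lam_hat = math.exp(sum(a * b for a, b in zip(X[0], beta_hat)))
# Delta-method variance of the prediction for observation 0: lam^2 * X_i' V X_i.
var_lam_hat = lam_hat ** 2 * sum(X[0][r] * V[r][c] * X[0][c] for r in range(2) for c in range(2))
print(beta_hat, std_err, lam_hat, var_lam_hat)
```

Starting from β = 0 the iteration typically needs only a handful of steps, which is the rapid convergence referred to above.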
II. GOODNESS OF FIT:
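$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left[\dfrac{Y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}}\right]^2}{\sum_{i=1}^{n} \left[\dfrac{Y_i - \bar{Y}}{\sqrt{\bar{Y}}}\right]^2}$$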
This measure compares the fit of the model with that of a model containing only a constant term.
Note: $Y_i$ is an integer, whereas the prediction $\hat\lambda_i = e^{X_i\hat\beta}$ is continuous.
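A small sketch of this goodness-of-fit measure follows. The counts and fitted values are made up; the constant-only model is represented by predicting the sample mean for every observation, in which case the measure is zero by construction.

```python
def poisson_r2(Y, lam_hat):
    """R^2 = 1 - sum[(Y_i - lam_i)^2 / lam_i] / sum[(Y_i - Ybar)^2 / Ybar]."""
    y_bar = sum(Y) / len(Y)
    num = sum((y - l) ** 2 / l for y, l in zip(Y, lam_hat))
    den = sum((y - y_bar) ** 2 / y_bar for y in Y)
    return 1.0 - num / den

Y = [0, 1, 1, 3, 6]                        # made-up counts
lam_fitted = [0.5, 0.9, 1.6, 2.9, 5.1]     # made-up fitted values from some model
lam_const = [sum(Y) / len(Y)] * len(Y)     # constant-only model predicts Ybar

r2_fitted = poisson_r2(Y, lam_fitted)
r2_const = poisson_r2(Y, lam_const)
print(r2_fitted, r2_const)
```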
III. OVERDISPERSION:
The Poisson model has been criticized because of its implicit assumption that the variance of
$Y_i$ equals its mean. Many extensions of the Poisson model that relax this assumption have
been proposed.
Test for overdispersion:
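$$H_0: \text{Var}[Y_i] = E[Y_i]$$
$$H_A: \text{Var}[Y_i] = E[Y_i] + \alpha\, g(E[Y_i])$$
The test uses an auxiliary regression with dependent variable:
$$Z_i = \frac{(Y_i - \hat\lambda_i)^2 - Y_i}{\hat\lambda_i \sqrt{2}}$$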
Most count models with overdispersion (variance exceeding the mean) specify the
overdispersion to be of the form:
$$\text{Var}[Y_i \mid X_i] = E[Y_i] + \alpha\, g(E[Y_i])$$
where α is an unknown parameter and g(·) is a known function, most commonly $g(\mu) = \mu^2$
or $g(\mu) = \mu$.
Test:
$$H_0: \alpha = 0$$
$$H_A: \alpha \neq 0 \quad (\text{or } \alpha > 0)$$
The test can be carried out by running the regression:
$$\frac{(Y_i - \hat\lambda_i)^2 - Y_i}{\hat\lambda_i} = \alpha\, \frac{g(\hat\lambda_i)}{\hat\lambda_i} + u_i$$
where $u_i$ is an error term. The reported t-statistic for α is asymptotically normal under
$H_0: \alpha = 0$ (Cameron & Trivedi 1990). This test can also be used for underdispersion
($\alpha < 0$), in which case the conditional variance is less than the conditional mean.
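The auxiliary regression can be sketched as a no-intercept least-squares fit. The example below is illustrative only: it assumes a constant-only Poisson model (so that $\hat\lambda_i = \bar{Y}$), takes $g(\mu) = \mu^2$, and simulates one Poisson sample and one gamma-mixed sample with assumed settings `lam` and `theta`. The function name `overdispersion_test` is introduced here.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def overdispersion_test(Y, lam_hat, g=lambda m: m ** 2):
    """OLS without intercept of [(Y-lam)^2 - Y]/lam on g(lam)/lam; returns (alpha_hat, t)."""
    d = [((y - l) ** 2 - y) / l for y, l in zip(Y, lam_hat)]
    w = [g(l) / l for l in lam_hat]
    sww = sum(v * v for v in w)
    alpha = sum(v * e for v, e in zip(w, d)) / sww
    resid = [e - alpha * v for v, e in zip(w, d)]
    s2 = sum(r * r for r in resid) / (len(Y) - 1)
    return alpha, alpha / math.sqrt(s2 / sww)

rng = random.Random(1)
n, lam, theta = 3000, 3.0, 2.0   # assumed simulation settings
y_pois = [poisson_draw(lam, rng) for _ in range(n)]
# Gamma heterogeneity with mean 1 and variance 1/theta gives Var = lam + lam^2/theta.
y_mixed = [poisson_draw(lam * rng.gammavariate(theta, 1.0 / theta), rng) for _ in range(n)]

results = {}
for name, Y in (("Poisson", y_pois), ("gamma-mixed", y_mixed)):
    y_bar = sum(Y) / n   # Poisson MLE of lambda in a constant-only model
    results[name] = overdispersion_test(Y, [y_bar] * n)
    print(name, results[name])
```

For the gamma-mixed sample the estimate of α should be near 1/θ with a large t-statistic, while for the pure Poisson sample the t-statistic should be unremarkable.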
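Let $\mu_i$ denote the conditional mean and variance of the Poisson distribution, and suppose now that this
parameter is random rather than being a completely deterministic function of the regressors
$X_i$.
Let:
$$\mu_i = \lambda_i u_i \;\rightarrow\; \ln \mu_i = \ln \lambda_i + \ln u_i = X_i\beta + \varepsilon_i$$
so that $\ln \lambda_i = X_i\beta$ and $\ln u_i = \varepsilon_i$.
The distribution of $Y_i$ conditional on $X_i$ and $u_i$ (or $\varepsilon_i$) remains Poisson with conditional
mean and variance $\mu_i$:
$$f(Y_i \mid X_i, u_i) = \text{Prob}(Y = Y_i \mid X_i, u_i) = \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{Y_i}}{Y_i!}$$
$$\rightarrow\; \text{Prob}(Y = Y_i \mid X_i) = \int_0^{\infty} \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{Y_i}}{Y_i!}\, g(u_i)\, du_i$$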
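Here $g(u_i)$ is the density function of $u_i$.
The choice of $g(u_i)$ defines the unconditional distribution. For mathematical convenience, a
gamma distribution is usually assumed for $u_i = e^{\varepsilon_i}$. Assume $E(u_i) = 1$ (so that $E(\mu_i \mid \lambda_i) = \lambda_i$):
$$g(u_i) = \frac{\theta^{\theta}}{\Gamma(\theta)}\, e^{-\theta u_i} u_i^{\theta - 1}$$
The density function for $Y_i$ is then:
$$\text{Prob}(Y = Y_i \mid X_i) = \int_0^{\infty} f(Y_i \mid X_i, u_i)\, g(u_i)\, du_i = \int_0^{\infty} \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{Y_i}}{Y_i!}\, \frac{\theta^{\theta} u_i^{\theta - 1} e^{-\theta u_i}}{\Gamma(\theta)}\, du_i$$
$$= \frac{\Gamma(\theta + Y_i)}{\Gamma(Y_i + 1)\,\Gamma(\theta)}\, r_i^{Y_i} (1 - r_i)^{\theta}, \quad \text{where } r_i = \frac{\lambda_i}{\lambda_i + \theta}$$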
IV. NEGATIVE BINOMIAL REGRESSION MODEL:
• The assumed equality of the conditional mean and variance is the major shortcoming of
the Poisson model.
• We generalize the Poisson model by introducing an individual unobserved effect into the
conditional mean.
• Suppose now that the conditional mean and variance $\mu_i$ of the Poisson distribution is
random rather than being a completely deterministic function of $X_i$. Because of unobserved
heterogeneity, different observations may have different $\mu_i$. $\lambda_i$ is the part of the Poisson parameter explained by $X_i$, but part of
the difference is due to a random (unobserved) component $u_i$, not only to $X_i$. Here $\mu_i$
is just a parameter of the distribution: we want $E(\mu_i) = e^{X_i\beta}$, not
$\mu_i = e^{X_i\beta}$.
Let:
$$\mu_i = \lambda_i u_i, \quad \text{where } \lambda_i = e^{X_i\beta}$$
$$\rightarrow\; \ln \mu_i = \ln \lambda_i + \ln u_i = X_i\beta + \varepsilon_i$$
The disturbance iε reflects cross-sectional heterogeneity that normally characterizes
micro-economic data.
The distribution of $Y_i$ conditional on $X_i$ and $u_i$ remains Poisson with conditional
mean and variance $\mu_i$:
$$\text{Prob}(Y = Y_i \mid X_i, u_i) = \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{Y_i}}{Y_i!} = f(Y_i \mid X_i, u_i)$$
The unconditional distribution $\text{Prob}(Y = Y_i \mid X_i)$ is the expected value over $u_i$ of
$f(Y_i \mid X_i, u_i)$:
$$\text{Prob}(Y = Y_i \mid X_i) = \int_0^{\infty} \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{Y_i}}{Y_i!}\, g(u_i)\, du_i$$
Here $g(u_i)$ is the density function of $u_i$. The problem is the choice of $g(u_i)$.
For mathematical convenience, a gamma distribution is usually assumed for $u_i$.
Assume $E(u_i) = 1$ (so that $E(\mu_i \mid \lambda_i) = \lambda_i$):
$$g(u_i) = \frac{\theta^{\theta}}{\Gamma(\theta)}\, e^{-\theta u_i} u_i^{\theta - 1}$$
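Then:
$$\text{Prob}(Y = Y_i \mid X_i) = \int_0^{\infty} \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{Y_i}}{Y_i!}\, \frac{\theta^{\theta} u_i^{\theta - 1} e^{-\theta u_i}}{\Gamma(\theta)}\, du_i$$
$$= \frac{\theta^{\theta} \lambda_i^{Y_i}}{\Gamma(Y_i + 1)\,\Gamma(\theta)} \int_0^{\infty} e^{-(\lambda_i + \theta) u_i}\, u_i^{\theta + Y_i - 1}\, du_i$$
$$= \frac{\theta^{\theta} \lambda_i^{Y_i}\, \Gamma(\theta + Y_i)}{\Gamma(Y_i + 1)\,\Gamma(\theta)\,(\lambda_i + \theta)^{\theta + Y_i}}$$
$$= \frac{\Gamma(\theta + Y_i)}{\Gamma(Y_i + 1)\,\Gamma(\theta)}\, r_i^{Y_i} (1 - r_i)^{\theta}, \quad \text{where } r_i = \frac{\lambda_i}{\lambda_i + \theta}$$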
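This is the form of the negative binomial distribution. The distribution has conditional
mean $\lambda_i$ and conditional variance $\lambda_i \left(1 + \frac{1}{\theta}\lambda_i\right)$:
$$E[Y_i \mid X_i] = \lambda_i = e^{X_i\beta}$$
$$\text{Var}[Y_i \mid X_i] = \lambda_i \left(1 + \frac{1}{\theta}\lambda_i\right)$$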
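The equivalence between the Poisson-gamma mixture and the closed-form negative binomial probabilities can be verified numerically. The sketch below integrates the mixture with a simple midpoint rule; the values of `lam` and `theta`, the integration limit and the number of steps are arbitrary choices made for illustration.

```python
import math

def nb_pmf(y, lam, theta):
    """Closed form: Gamma(theta+y) / (Gamma(y+1) Gamma(theta)) * r^y * (1-r)^theta."""
    r = lam / (lam + theta)
    log_p = (math.lgamma(theta + y) - math.lgamma(y + 1) - math.lgamma(theta)
             + y * math.log(r) + theta * math.log(1.0 - r))
    return math.exp(log_p)

def mixture_pmf(y, lam, theta, upper=30.0, steps=50_000):
    """Midpoint-rule approximation of the integral of Poisson(y | lam*u) * g(u) over u."""
    h = upper / steps
    log_const = theta * math.log(theta) - math.lgamma(theta) - math.lgamma(y + 1)
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        log_f = -lam * u + y * math.log(lam * u) - theta * u + (theta - 1.0) * math.log(u)
        total += math.exp(log_const + log_f)
    return total * h

lam, theta = 3.0, 2.5   # hypothetical parameter values
for y in (0, 1, 2, 5):
    print(y, nb_pmf(y, lam, theta), mixture_pmf(y, lam, theta))

# Mean and variance implied by the closed form (sums truncated at 500).
support = range(500)
nb_mean = sum(y * nb_pmf(y, lam, theta) for y in support)
nb_var = sum((y - nb_mean) ** 2 * nb_pmf(y, lam, theta) for y in support)
print(nb_mean, nb_var, lam * (1 + lam / theta))
```

The two columns should agree to several decimal places, and the implied mean and variance should match $\lambda_i$ and $\lambda_i(1 + \lambda_i/\theta)$.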
Note (gamma function):
$$\Gamma(P) = \int_0^{\infty} t^{P-1} e^{-t}\, dt$$
We have $\Gamma(P) = (P - 1)\,\Gamma(P - 1)$,
so $\Gamma(P) = (P - 1)!$ if P is a positive integer. The gamma function is a generalization of the
factorial function to non-integer values.
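Both properties of the gamma function are easy to confirm with the Python standard library. The non-integer point `P = 3.7` is an arbitrary choice.

```python
import math

# Gamma(P) = (P - 1)! for positive integers P.
for P in range(1, 8):
    print(P, math.gamma(P), math.factorial(P - 1))

# Recursion Gamma(P) = (P - 1) * Gamma(P - 1) at a non-integer point.
P = 3.7
lhs = math.gamma(P)
rhs = (P - 1) * math.gamma(P - 1)
print(lhs, rhs)
```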
The mixture therefore has the form of the negative binomial distribution, with:
$$E(Y_i \mid X_i) = \lambda_i$$
$$\text{Var}(Y_i \mid X_i) = \lambda_i \left(1 + \frac{1}{\theta}\lambda_i\right)$$
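Note (gamma distribution):
$$f(x) = \frac{\lambda^{P}}{\Gamma(P)}\, e^{-\lambda x} x^{P-1}, \qquad x \geq 0,\ \lambda > 0,\ P > 0$$
with $E(x) = \frac{P}{\lambda}$ and $V(x) = \frac{P}{\lambda^2}$. Hence if $\lambda = P$, then $E(x) = 1$.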
$E(\mu_i \mid \lambda_i) = \lambda_i$ if $E(u_i) = 1$, so the interpretation of $\lambda_i$ is the same as in the Poisson model:
$$E(Y_i \mid X_i) = \lambda_i = e^{X_i\beta}$$
$$\text{Var}(Y_i \mid X_i) = \lambda_i \left(1 + \frac{1}{\theta}\lambda_i\right)$$
This negative binomial model can be estimated by maximum likelihood without much
difficulty. A test of the Poisson distribution is often carried out by testing the hypothesis
$\alpha = \frac{1}{\theta} = 0$ using the Wald test.
The ratio of the variance to the mean is now $1 + \frac{1}{\theta}\lambda_i > 1$, and it differs across
observations.
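The mean and variance can also be obtained without the closed-form density, by conditioning on $u_i$. The short derivation below assumes only that, given $u_i$, the count is Poisson with parameter $\lambda_i u_i$, and that the gamma density with $P = \lambda = \theta$ gives $E(u_i) = 1$ and $\text{Var}(u_i) = 1/\theta$.

```latex
\begin{aligned}
E[Y_i \mid X_i] &= E_u\big[E[Y_i \mid X_i, u_i]\big] = E_u[\lambda_i u_i] = \lambda_i \\
\mathrm{Var}[Y_i \mid X_i] &= E_u\big[\mathrm{Var}[Y_i \mid X_i, u_i]\big]
    + \mathrm{Var}_u\big[E[Y_i \mid X_i, u_i]\big] \\
&= E_u[\lambda_i u_i] + \mathrm{Var}_u[\lambda_i u_i]
 = \lambda_i + \frac{\lambda_i^2}{\theta}
 = \lambda_i\left(1 + \frac{1}{\theta}\lambda_i\right) \\
\frac{\mathrm{Var}[Y_i \mid X_i]}{E[Y_i \mid X_i]} &= 1 + \frac{\lambda_i}{\theta}
\end{aligned}
```

As $\theta \to \infty$ the heterogeneity vanishes and the ratio returns to 1, the Poisson case.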
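The log-likelihood:
$$\ln L = \sum_{i=1}^{n} \left[\ln \Gamma(\theta + Y_i) - \ln \Gamma(Y_i + 1) - \ln \Gamma(\theta) + Y_i \ln r_i + \theta \ln(1 - r_i)\right]$$
where $r_i = \frac{\lambda_i}{\lambda_i + \theta}$ and $\lambda_i = e^{X_i\beta}$.
The parameters can be estimated easily by maximum likelihood.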
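A minimal sketch of the negative binomial log-likelihood follows, together with a check that it approaches the Poisson log-likelihood when θ is very large (that is, when α = 1/θ is close to zero). The data and parameter values are made up for illustration.

```python
import math

def negbin_loglik(beta, theta, X, Y):
    """Sum of ln Gamma(theta+Y) - ln Gamma(Y+1) - ln Gamma(theta) + Y ln r + theta ln(1-r)."""
    total = 0.0
    for x_i, y_i in zip(X, Y):
        lam = math.exp(sum(a * b for a, b in zip(x_i, beta)))
        r = lam / (lam + theta)
        total += (math.lgamma(theta + y_i) - math.lgamma(y_i + 1) - math.lgamma(theta)
                  + y_i * math.log(r) + theta * math.log1p(-r))  # log1p(-r) = ln(1 - r)
    return total

def poisson_loglik(beta, X, Y):
    """Poisson log-likelihood, used here only as the limiting case."""
    total = 0.0
    for x_i, y_i in zip(X, Y):
        xb = sum(a * b for a, b in zip(x_i, beta))
        total += -math.exp(xb) + y_i * xb - math.lgamma(y_i + 1)
    return total

# Made-up data: first column is the constant term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1, 1, 3, 6]
beta = [0.0, 0.6]

ll_nb_small_theta = negbin_loglik(beta, 2.0, X, Y)
ll_nb_large_theta = negbin_loglik(beta, 1e6, X, Y)
ll_poisson = poisson_loglik(beta, X, Y)
print(ll_nb_small_theta, ll_nb_large_theta, ll_poisson)
```

In practice this function would be passed to a numerical optimizer over β and θ; that step is omitted here.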
Application: $X_{it} = (1, \text{Age}, \text{Education}, \text{Income}, \text{Kids}, \text{Insurance})$
Doctor visits: count data models
V. TOO MANY ZEROS DATA:
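In many data sets, there is a large number of zero counts. Assuming a Poisson or negative
binomial distribution is then a misspecification. Alternatives are the hurdle and zero-inflated Poisson models. In the two-part (hurdle) form:
• A binary probability model determines whether a zero or a nonzero outcome occurs.
• Then a truncated Poisson distribution describes the positive outcomes.
$$\text{Prob}(Y_i = 0 \mid X_i) = e^{-\theta}$$
$$\text{Prob}(Y_i = j \mid X_i) = \frac{(1 - e^{-\theta})\, e^{-\lambda_i} \lambda_i^{j}}{j!\,(1 - e^{-\lambda_i})}, \quad j = 1, 2, \ldots$$
Let the regime be indicated by a binary variable $Z_i$ with its own regressors $W_i$:
$$\text{Prob}(Z_i = 1 \mid W_i) = F(W_i, \gamma)$$
$$\text{Prob}(Y_i = j \mid X_i, Z_i = 1) = \frac{e^{-\lambda_i} \lambda_i^{j}}{j!}$$
$$E(Y_i \mid X_i) = F \times 0 + (1 - F) \times E[Y_i^* \mid X_i, Y_i^* > 0] = (1 - F)\, \frac{\lambda_i}{1 - e^{-\lambda_i}}$$
where $Y_i^*$ denotes the outcome of the Poisson process in regime 2.
In terms of the two regimes:
$$\text{Prob}(Y_i = 0 \mid X_i) = \text{Prob}(\text{regime } 1) + \text{Prob}(Y_i = 0 \mid X_i, \text{regime } 2) \times \text{Prob}(\text{regime } 2)$$
$$\text{Prob}(Y_i = j \mid X_i) = \text{Prob}(Y_i = j \mid X_i, \text{regime } 2) \times \text{Prob}(\text{regime } 2), \quad j = 1, 2, \ldots$$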
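The two sets of probabilities can be sketched as follows. This is an illustration under stated assumptions: `hurdle_pmf` implements the two-part form with zero probability $e^{-\theta}$, `two_regime_pmf` implements the regime equations with a fixed, hypothetical probability `p_regime2` of being in the Poisson regime (in the model this probability would come from the binary model for the regime), and all parameter values are made up.

```python
import math

def hurdle_pmf(j, lam, theta):
    """Prob(Y=0) = exp(-theta); positive counts follow a Poisson truncated at zero."""
    p_zero = math.exp(-theta)
    if j == 0:
        return p_zero
    log_pois = -lam + j * math.log(lam) - math.lgamma(j + 1)
    return (1.0 - p_zero) * math.exp(log_pois) / (1.0 - math.exp(-lam))

def two_regime_pmf(j, lam, p_regime2):
    """Regime 1 always gives zero; regime 2 is Poisson and occurs with probability p_regime2."""
    pois = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    return (1.0 - p_regime2) * (j == 0) + p_regime2 * pois

lam, theta, p2 = 2.0, 0.4, 0.7   # hypothetical parameter values
support = range(150)             # truncation point where the tail is negligible

hurdle_total = sum(hurdle_pmf(j, lam, theta) for j in support)
hurdle_mean = sum(j * hurdle_pmf(j, lam, theta) for j in support)
# Mean formula with F taken to be the zero probability exp(-theta).
hurdle_mean_formula = (1.0 - math.exp(-theta)) * lam / (1.0 - math.exp(-lam))

regime_total = sum(two_regime_pmf(j, lam, p2) for j in support)
regime_zero = two_regime_pmf(0, lam, p2)
print(hurdle_total, hurdle_mean, hurdle_mean_formula, regime_total, regime_zero)
```

Both sets of probabilities sum to one, and the hurdle mean matches $(1 - F)\lambda_i / (1 - e^{-\lambda_i})$ when F is read as the probability of a zero.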