Journal of Transportation Science and Technology, Vol 21, Nov 2016
DETERMINING THE CONTRIBUTING FACTORS TO TRAFFIC
ACCIDENT IN HO CHI MINH CITY USING BINARY LOGIT
MODEL
Tran Quang Vuong
University of Transport and Communications Campus in Ho Chi Minh City
Abstract: This research investigates traffic accident patterns, severity levels and the factors contributing to accidents. The results may help in proposing effective measures to improve traffic safety at signalized intersections. A case study is conducted in Ho Chi Minh City (HCMC), Vietnam, using historical traffic accident data collected in the city over five years (2011-2015). Binary logit models are used to identify the factors contributing to serious traffic accidents. The results show that intersection type, land use and road type are contributing factors to accident severity. Based on the findings, strategies and measures for safety improvement are formulated and discussed.
Keywords: Road traffic accident, signalized intersection, logit model, traffic safety measures,
factor analysis
1. Introduction
Nearly 25 percent of all fatal crashes occur at intersections, and about 30 percent of those occur at intersections controlled by signals. In 2015, HCMC recorded 3,694 traffic accidents, 693 fatalities and 3,301 injuries. Compared with 2014, these figures fell by 14.51%, 4.15% and 18.07%, respectively. Accidents at signalized intersections nevertheless increased, accounting for 41% of all accidents at intersections (9.7% of all traffic accidents in HCMC). Until now, there has been a lack of empirical research on traffic safety at signalized intersections under mixed traffic conditions, since most previous research on this topic has focused on vehicle-dominated traffic. To address road traffic accident problems, it is necessary to understand in depth the factors that contribute to traffic accidents. The objectives of this research are to investigate traffic accident patterns, severity levels and the factors contributing to traffic accidents. This study does not, however, explore all contributing factors, because of substantial limitations in the data obtained from accident reports. Logistic regression is used to estimate the effect of the significant contributing factors on accident severity.
This paper is divided into five parts: the first is the introduction, the second is the literature review, the third and fourth are the descriptive analysis and the modelling, respectively, and the last presents the discussion and conclusions.
2. Literature review
Various models have been developed to determine the factors contributing to accident severity in both developed and developing cities, such as Poisson and negative binomial models (Hoong et al., 2001; Lin et al., 2003; Yinhai et al., 2004; Huang et al., 2008); ordered probit models (Abdel-Aty et al., 2003, 2005; Yu Jin et al., 2010); logistic regression models (Hilakivi et al., 1989; James and Kim, 1996; Mercier et al., 1997; Al-Ghamdi, 2002; Kelvin, 2004); and multiple logistic regression (Shankar and Mannering, 1996; Carson and Mannering, 2001; Yan et al., 2005). Binary logistic models have also been developed by many researchers. The logistic modelling technique is often preferred because the logistic function must lie in the range between 0 and 1, which is not usually the case with other possible functions (Kleinbaum and Klein, 2002).
In summary, numerous studies have developed logistic regression models to determine the factors contributing to accident severity. Nevertheless, only a limited number of studies have explored crash injury severity at signalized intersections (Abdel-Aty, 2003; Abdel-Aty and Keller, 2005; Yan et al., 2005; Huang et al., 2008; Yu Jin et al., 2010). According to the literature, the major contributing factors to accident severity are time of day, intersection type, nature of lane, street lighting, presence of a red-light camera, pedestrian involvement, vehicle type, driver age and accident type. Moreover, no study has used logistic models to investigate the factors contributing to accident severity at signalized intersections in Vietnam in general or in HCMC in particular. Based on the literature review and on the historical traffic accident data available in Vietnam, a binary logistic model is highly appropriate for this case.
3. Descriptive analysis of traffic
accident at signalized intersections in
HCMC
3.1. Overview of HCMC
According to the master plan, HCMC is divided into three zones (Fig. 1). The city centre (zone 1) includes 13 urban districts: 1, 3, 4, 5, 6, 8, 10, 11, Go Vap, Tan Binh, Tan Phu, Binh Thanh and Phu Nhuan. The newly developed areas (zone 2) include 6 districts: 2, 7, 9, 12, Binh Tan and Thu Duc. The rural areas (zone 3) include 5 districts: Hoc Mon, Nha Be, Can Gio, Cu Chi and Binh Chanh.
Figure 1. Classification of zones in HCMC.
3.2. Data collection
This research is based on the historical accident database for five years (2011 - 2015), obtained from the Rail-Road Traffic Police Bureau in HCMC. Traffic accident information is recorded in accordance with form No. 02/TNDB, which has nearly 60 categories of information. In practice, however, only 17 categories of information were recorded, and these were used in the analysis to determine the significant factors contributing to accident severity.
3.3. Analysis of the patterns
There were 375 traffic accidents at signalized intersections in HCMC during the five years (2011 - 2015). The accidents were distributed unevenly among the three zones, with 212 (56.5%) occurring in zone 1, 126 (33.6%) in zone 2 and 37 (9.9%) in zone 3. However, the ratio of the number of traffic accidents to the total number of signalized intersections is highest in zone 2 (0.79), followed by zone 1 (0.44), and lowest in zone 3 (0.36).
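As a quick arithmetic check of the zone shares quoted above, the following minimal sketch recomputes the percentages from the accident counts. The counts come from the text; the variable names are illustrative only.

```python
# Accident counts at signalized intersections, 2011-2015, as reported in the text.
accidents_by_zone = {"zone 1": 212, "zone 2": 126, "zone 3": 37}

# Total number of accidents across the three zones.
total = sum(accidents_by_zone.values())

# Share of each zone in the total, in percent, rounded to one decimal place.
shares = {zone: round(100 * n / total, 1) for zone, n in accidents_by_zone.items()}

for zone, share in shares.items():
    print(f"{zone}: {accidents_by_zone[zone]} accidents ({share}%)")
```

The ratios per signalized intersection (0.79, 0.44, 0.36) cannot be recomputed in the same way, because the number of signalized intersections in each zone is not given in the text.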
3.3.1. Distribution by time
Traffic accidents tend to increase slightly on holidays, during the Tet holidays, at weekends and at the end of each month. The time of accident occurrence differs among the three zones: in zone 1 most traffic accidents happened during the night off-peak period from 8PM to 4AM, while in zones 2 and 3 accidents tend to increase slightly during the morning, noon and evening peak hours (6AM - 8AM; 12PM - 2PM; 6PM - 8PM).
3.3.2. Distribution by road users involved in accidents
The age group most often involved in traffic accidents is 19 - 24 years old (24%), followed by the 25 - 30 age group (19%). This age group accounted for 32%, 46% and 22% in zone 1, zone 2 and zone 3, respectively. Road users in this age group are not yet fully mature and are easily affected by alcohol. Male road users are the main group involved in traffic accidents in all three zones, accounting for 77%, 88% and 78% in zone 1, zone 2 and zone 3, respectively. Collisions between two motorcycles or between a motorcycle and a truck are the most common collision configurations, accounting for 38% in zone 1 and 47% in zones 2 and 3, respectively.
Red-light running, failure to yield priority, wrong-lane driving, illegal turning and illegal overtaking are significant causes of traffic accidents at signalized intersections. In particular, red-light running and failure to yield priority are the most significant causes in zone 1 and zone 2, accounting for 26% and 29%, respectively. Red-light running and wrong-lane driving are the main causes in zone 3, accounting for 35%.
4. Modelling of accident at signalized
intersections
4.1. Theoretical background of logistic
regression
In this research, accident severity is the dependent variable and is dichotomous. Note that a non-fatal accident is defined as any accident in which no fatality occurs within 24 hours of the accident; otherwise the accident is fatal. Each accident in the time-series road accident data was categorized as either non-fatal or fatal. The logistic model used is

$$p(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} \qquad (6)$$

And thus

$$P(\text{fatal accident}) = 1 - P(\text{non-fatal accident}) = 1 - p(x) = \frac{1}{1 + e^{g(x)}} \qquad (7)$$

where g(x) stands for the function of the independent variables:

$$g(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \qquad (8)$$
Logistic regression determines the coefficients that make the observed outcome (non-fatal or fatal accident) most likely, using the maximum-likelihood technique.
Classification in this model is based on a probability cut-off of 0.3: if the predicted probability is greater than or equal to 0.3, the accident is classified as fatal; otherwise it is classified as non-fatal.
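The relationship between g(x), the two probabilities in Eq. (7) and the 0.3 cut-off can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, and the cut-off is assumed to apply to the probability of a fatal accident.

```python
import math


def p_fatal(g: float) -> float:
    """P(fatal accident) = 1 / (1 + e^g), as written in Eq. (7)."""
    # Evaluate in a numerically stable way so that large |g| cannot overflow.
    if g >= 0:
        z = math.exp(-g)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(g))


def p_nonfatal(g: float) -> float:
    """P(non-fatal accident) = e^g / (1 + e^g), the complement of p_fatal."""
    return 1.0 - p_fatal(g)


def classify(g: float, cutoff: float = 0.3) -> str:
    """Label an accident 'fatal' when P(fatal) reaches the cut-off (assumed 0.3)."""
    return "fatal" if p_fatal(g) >= cutoff else "non-fatal"


# Example values of the linear predictor g(x), chosen only for illustration.
for g in (-2.0, 0.0, 2.0):
    print(g, round(p_fatal(g), 3), classify(g))
```

Because the two probabilities always sum to 1, a cut-off of 0.3 on the fatal probability is equivalent to a cut-off of 0.7 on the non-fatal probability.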
4.2. List of variables
Since the research goal was to determine the factors that might affect the severity of an accident (i.e. whether it was a fatal or non-fatal accident), 37 variables were derived from the time-series data and the accident patterns and coded as 0 or 1 for model development. Because the variables are discrete, correlation analysis (Kendall's tau-b test) was also used to reduce the number of variables, based on the level of correlation and the P-value.
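For readers unfamiliar with Kendall's tau-b, the sketch below shows how the coefficient can be computed for two 0/1 coded variables by counting concordant, discordant and tied pairs. It is a plain-Python illustration under stated assumptions: the function name and the sample data are hypothetical and do not come from the accident database.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences, with correction for ties."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                    # tied on both variables: ignored
        elif dx == 0:
            ties_x_only += 1            # tied on x only
        elif dy == 0:
            ties_y_only += 1            # tied on y only
        elif dx * dy > 0:
            concordant += 1             # both variables move in the same direction
        else:
            discordant += 1             # the variables move in opposite directions
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical 0/1 coded variables for six accidents (illustration only).
commune_road = [0, 0, 0, 1, 1, 1]
fatal = [0, 0, 1, 0, 1, 1]
print(kendall_tau_b(commune_road, fatal))
```

For two binary variables, tau-b coincides with the phi coefficient of the 2x2 contingency table, which is why it is a natural choice for variables coded as 0 and 1.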
Table 1. Correlation coefficient matrix.
Variables: (1) Time of accident; (2) Day of accident; (3) Month of accident; (4) Location; (5) Urban road; (6) Province road; (7) Commune road; (8) Helmet; (9) Don't accept priority; (10) Zone2; (11) Width pavement <3m vs <3m; (12) Severity of accident.

          (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     (12)
(1)  r    1.000
     Sig.
(2)  r    -.318**  1.000
     Sig. .000
(3)  r    1.000**  -.318**  1.000
     Sig.          .000
(4)  r    -.318**  1.000**  -.318**  1.000
     Sig. .000              .000
(5)  r    .637**   .264**   .637**   .264**   1.000
     Sig. .000     .000     .000     .000
(6)  r    .450**   .207**   .450**   .207**   .442**   1.000
     Sig. .000     .000     .000     .000     .000
(7)  r    .540**   .388**   .540**   .388**   .662**   .421**   1.000
     Sig. .000     .000     .000     .000     .000     .000
(8)  r    -.039    .123*    -.039    .123*    .063     -.031    -.042    1.000
     Sig. .448     .017     .448     .017     .226     .549     .412
(9)  r    .591**   .357**   .591**   .357**   .643**   .460**   .670**   .061     1.000
     Sig. .000     .000     .000     .000     .000     .000     .000     .239
(10) r    .117*    .182**   .117*    .182**   .233**   .262**   .188**   -.013    .179**   1.000
     Sig. .024     .000     .024     .000     .000     .000     .000     .803     .001
(11) r    .345**   .294**   .345**   .294**   .484**   .263**   .679**   -.029    .391**   .181**   1.000
     Sig. .000     .000     .000     .000     .000     .000     .000     .577     .000     .000
(12) r    .155**   .213**   .155**   .213**   .261**   .262**   .282**   .166**   .252**   .203**   .159**   1.000
     Sig. .003     .000     .003     .000     .000     .000     .000     .001     .000     .000     .002

r. Correlation coefficient. The number of samples (N) = 375.
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
4.3. Development of logistic model
The logistic regression was run with the Enter method in SPSS version 21. The omnibus tests of the coefficients of the traffic accident severity model are used to assess whether the data fit the model, as shown in Table 2.
Table 2. Omnibus Tests of Model Coefficients.
Step 1    Chi-square   df   Sig.
Step      47.081       5    .000
Block     47.081       5    .000
Model     47.081       5    .000
The specified model is significant (Sig. < 0.05); hence it is concluded that the independent variables improve on the predictive power of the null model.
Table 3 contains two pseudo R² measures, Cox-Snell and Nagelkerke. Cox and Snell's R-square attempts to imitate the multiple R-square based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. Here it indicates that 11.8% of the variation is explained by the logistic model.
The Nagelkerke modification, which does range from 0 to 1, is a more reliable measure of the relationship. Nagelkerke's R² will normally be higher than the Cox and Snell measure. In this case it is 0.263, indicating a relationship of 26.3% between the predictors and the prediction. In addition, the Hosmer-Lemeshow (H-L) test in Table 3 indicates that the developed logistic regression model fits the data adequately (Sig. > 0.05).
Table 3. Goodness of fit (Pseudo R² and H-L Test).
Pseudo R² Test
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      176.333a            .118                   .263
a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      5.739        4    .220
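The goodness-of-fit figures in Table 3 can be cross-checked from the reported values with the standard formulas for the Cox-Snell and Nagelkerke measures. The sketch below is an illustration, not the SPSS procedure itself; it assumes that the model chi-square in Table 2 equals the drop in -2 log likelihood from the null model, so the null-model value is derived here rather than reported in the paper.

```python
import math

n = 375                     # number of accidents in the sample
neg2ll_model = 176.333      # -2 log likelihood of the fitted model (Table 3)
model_chi_square = 47.081   # omnibus model chi-square (Table 2)

# Assumption: the model chi-square is the reduction in -2LL relative to the null model.
neg2ll_null = neg2ll_model + model_chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/n) = 1 - exp(-chi_square / n).
cox_snell = 1.0 - math.exp(-model_chi_square / n)

# Nagelkerke rescales Cox & Snell by its maximum attainable value, 1 - L_null^(2/n).
max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell


def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    # Closed form that holds for df = 4 only: exp(-x/2) * (1 + x/2).
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Hosmer-Lemeshow statistic from Table 3 (chi-square = 5.739, df = 4).
hl_p_value = chi2_sf_df4(5.739)

print(round(cox_snell, 3), round(nagelkerke, 3), round(hl_p_value, 3))
```

Under that assumption the computed values agree with the reported .118, .263 and .220 to three decimal places.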
The H-L statistic has a significance of 0.22, which means that it is not statistically significant and therefore the model fits quite well. Rather than relying on a goodness-of-fit statistic alone, it is often useful to look at the proportion of cases that are classified correctly. In a perfect model, the overall percentage correct would be 100%. In this study, 88.3% of cases overall were correctly classified. Nevertheless, the predictions are skewed towards non-fatal accidents (95% correctly classified), while only 18.2% of fatal accidents are correctly classified. From the Wald test in Table 4, the variables loc, Uroad, Proad, Croad and Zone2 appear to have a significant effect (loc, Uroad, Proad and Croad are marginally significant).
Table 4. The result of Wald test.
Step 1a     B        S.E.    Wald     df   Sig.   Exp(B)
loc         .770     .426    3.258    1    .049   2.159
Uroad       1.008    .522    3.727    1    .049   2.740
Proad       .929     .415    5.020    1    .025   2.533
Croad       1.188    .563    4.451    1    .035   3.280
Zone2       .792     .541    2.143    1    .043   2.207
Constant    -4.422   .558    62.782   1    .000   .012
a. Variable(s) entered on step 1: loc, Uroad, Proad, Croad, Zone2.
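The Wald column in Table 4 follows the usual definition for a single coefficient, Wald = (B / S.E.)². The sketch below recomputes it from the rounded B and S.E. values in the table; the dictionary layout and names are illustrative, and small differences from the reported values are expected because the table inputs are rounded to three decimals.

```python
# (B, S.E., reported Wald) for each term, copied from Table 4.
table4 = {
    "loc": (0.770, 0.426, 3.258),
    "Uroad": (1.008, 0.522, 3.727),
    "Proad": (0.929, 0.415, 5.020),
    "Croad": (1.188, 0.563, 4.451),
    "Zone2": (0.792, 0.541, 2.143),
    "Constant": (-4.422, 0.558, 62.782),
}

# Wald statistic for one coefficient: the squared ratio of estimate to standard error.
wald = {name: (b / se) ** 2 for name, (b, se, _) in table4.items()}

for name, (_, _, reported) in table4.items():
    print(f"{name:9s} computed {wald[name]:7.3f}   reported {reported:7.3f}")
```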
According to the previous analysis, the logit model with the significant variables is as follows:

$$g(x) = -4.422 + 0.770\,loc + 1.008\,Uroad + 0.929\,Proad + 1.188\,Croad + 0.792\,Zone2 \qquad (9)$$

Hence the logistic regression model developed in this study is

$$p(x) = \frac{e^{g(x)}}{1 + e^{g(x)}}$$

where g(x) is given by Eq. (9).
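To make Eq. (9) concrete, the following sketch evaluates g(x) and the logistic transform for one combination of the 0/1 indicators, and recomputes the Exp(B) column of Table 4 as e^B. The helper names and the example accident are hypothetical; only the coefficients come from the paper.

```python
import math

# Coefficients of Eq. (9), taken from Table 4.
intercept = -4.422
coefficients = {"loc": 0.770, "Uroad": 1.008, "Proad": 0.929, "Croad": 1.188, "Zone2": 0.792}


def g(x: dict) -> float:
    """Linear predictor of Eq. (9); indicators missing from x are treated as 0."""
    return intercept + sum(b * x.get(name, 0) for name, b in coefficients.items())


def logistic(v: float) -> float:
    """e^v / (1 + e^v), written in the equivalent form 1 / (1 + e^-v)."""
    return 1.0 / (1.0 + math.exp(-v))


# Exp(B): the multiplicative change in the odds when an indicator goes from 0 to 1.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Hypothetical accident: junction or other location, commune road, zone 2.
example = {"loc": 1, "Croad": 1, "Zone2": 1}
print(round(g(example), 3), round(logistic(g(example)), 3))
print({name: round(value, 3) for name, value in odds_ratios.items()})
```

The recomputed odds ratios differ from the Exp(B) column only in the third decimal place, which is consistent with the table having been produced from unrounded coefficients.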
4.4 Model interpretation
Interpreting a model means drawing practical inferences from the estimated coefficients. The estimated coefficient of an independent variable represents the rate of change of a function of the dependent variable (here, the logit) per unit change in that independent variable. The interpretation of the model developed in this study is presented in detail as follows.
4.4.1. Impact of location on accident
severity
Note that 'loc' has two levels:
loc = 1 (fatal accident occurring at a junction or other location).
loc = 0 (fatal accident occurring at an intersection).
With this coding, loc enters the logit model with a coefficient of 0.77. To interpret this parameter, the logit difference is computed as follows:

$$\text{Logit}(\text{fatal accident} \mid \text{junction and others}) = \beta_0 + \beta_1(1) + \beta_2 x_2 + \dots + \beta_n x_n$$

$$\text{Logit}(\text{fatal accident} \mid \text{intersection}) = \beta_0 + \beta_1(0) + \beta_2 x_2 + \dots + \beta_n x_n$$

$$\text{Logit difference} = \beta_1 = 0.77$$

Hence the odds ratio is $e^{\beta_1} = e^{0.77} = 2.16$.
This value shows that the odds of being in a fatal accident at a junction or other location are 2.16 times those at an intersection. The effect of the zone 2 factor on accident severity can be explained by the same method: the odds of a fatal accident in zone 2 are 2.2 ($e^{0.792}$) times those in zone 1 and zone 3.
4.4.2. Impact of Uroad on accident severity
$\beta_2$ (1.008) measures the differential effect on the logit of two cases: whether the fatal accident occurred on an urban road or not.
To interpret this parameter, the logit difference is computed first. For an urban road, the logit is Logit(Fatal | Uroad), that is, g(x) evaluated with Uroad = 1. For any other type of road, the logit is Logit(Fatal | not Uroad), that is, g(x) evaluated with Uroad = 0. The logit difference is

$$\text{Logit}(\text{Fatal} \mid \text{Uroad}) - \text{Logit}(\text{Fatal} \mid \text{not Uroad})$$

Hence the odds ratio is $e^{-1.109} = 0.33$.
Thus, the odds that an accident on an urban road is fatal are 0.33 times the odds for an accident on any other type of road.
The same method was used to compute the odds ratios for Proad and Croad, which are 0.28 and 0.47, respectively.
5. Conclusions
A logit model was developed in this study to determine the significant factors contributing to accident severity in HCMC, based on a binary response variable (fatal or non-fatal) and three explanatory factors, namely type of road, location and land use. The model has a reasonable statistical fit, with 88.3% of cases correctly classified overall, although its predictions are skewed towards non-fatal accidents: only 18.2% of fatal accidents are correctly classified.
The findings suggest that the authorities in HCMC should focus their strategies on improving safety at junctions in zone 2 that involve commune roads. They also suggest that the authorities should develop separate safety policies for each zone rather than for HCMC as a whole, as has been done so far. This may make safety policies more cost-effective.
The odds presented in this paper can be used to help prioritize solutions for reducing serious accidents. For example, the odds of being involved in a fatal accident at junctions and other locations on commune roads in zone 2, where there are few police officers to control traffic, traffic signs are lacking and drivers have low safety awareness, are relatively higher than those in other cases.
It should be noted that some significant variables, such as road surface, traffic signal pattern, light condition, collision type and licence status, are unavailable or difficult to obtain in HCMC, so they are not included in this research. Nevertheless, the findings of this study can serve as methodological guidance for future studies when these variables become available.
References
[1] Yau, K.K.W. (2004), Risk factors affecting the severity of single vehicle traffic accidents in Hong Kong. Accident Analysis & Prevention.
[2] Abdel-Aty et al. (2005), Exploring the overall and specific crash severity levels at signalized intersections. Accident Analysis & Prevention.
[3] Yan, X. et al. (2005), Characteristics of rear-end accidents at signalized intersections using multiple logistic regression model. Accident Analysis & Prevention.
[4] Huang, H. et al. (2008), Severity of driver injury and vehicle damage in traffic crashes at intersections: A Bayesian hierarchical analysis. Accident Analysis & Prevention.
[5] Jin, Y., X. Wang, and X. Chen (2010), Right-angle crash injury severity analysis using ordered probability models. Intelligent Computation Technology and Automation (ICICTA), IEEE.
Received: 26/9/2016
Sent for review: 30/9/2016
Revision completed: 21/10/2016
Accepted for publication: 28/10/2016