Applying a binary logit regression model to identify the factors affecting traffic accidents in Ho Chi Minh City

Tạp chí Khoa học Công nghệ Giao thông Vận tải (Journal of Transportation Science and Technology), No. 21, 11/2016

DETERMINING THE CONTRIBUTING FACTORS TO TRAFFIC ACCIDENTS IN HO CHI MINH CITY USING A BINARY LOGIT MODEL

Tran Quang Vuong
University of Transport and Communications, Campus in Ho Chi Minh City

Abstract: This research investigates traffic accident patterns, severity levels and the factors that contribute to accidents. The results may help in proposing effective measures to improve traffic safety at signalized intersections. A case study is conducted in Ho Chi Minh City (HCMC), Vietnam, using historical traffic accident data collected over five years (2011-2015). Binary logit models are used to identify the factors contributing to serious traffic accidents. The results show that intersection type, land use and road type are contributing factors to accident severity. Based on the findings, strategies and measures for safety improvement are formulated and discussed.

Keywords: Road traffic accident, signalized intersection, logit model, traffic safety measures, factor analysis

Summary (translated from the Vietnamese): This study analyses accident characteristics, severity levels and the factors affecting accidents. The results will be a useful basis for proposing effective measures to improve traffic safety at signalized intersections. The study is carried out for the case of Ho Chi Minh City, Vietnam, based on five years of traffic accident statistics (2011-2015). A binary logit regression model is used to identify the factors affecting serious traffic accidents. The analysis shows that intersection type, intersection location and road type are the factors affecting accident severity.
Based on these results, policies and measures to improve traffic safety are proposed.

1. Introduction

Nearly 25 percent of all fatal crashes occur at intersections, and about 30 percent of those occur at intersections controlled by signals. In 2015, HCMC recorded 3,694 traffic accidents, 693 fatalities and 3,301 injuries. Although these figures were 14.51%, 4.15% and 18.07% lower, respectively, than in 2014, accidents at signalized intersections increased and accounted for 41% of all accidents at intersections (9.7% of all traffic accidents in HCMC). So far there has been little empirical research on traffic safety at signalized intersections under mixed traffic conditions, since most previous research on this topic has focused on vehicle-dominated traffic.

To address road traffic accident problems, it is necessary to understand the contributing factors in depth. The objectives of this research are to investigate traffic accident patterns, severity levels and the factors contributing to traffic accidents. However, the study does not aim to explore all contributing factors, because the data obtained from accident reports are substantially limited. Logistic regression is used to estimate the effect of the significant contributing factors on accident severity.

This paper is divided into five parts: the first is the introduction, the second is the literature review, the third and fourth are the descriptive analysis and the modelling, respectively, and the last presents the conclusions.

2. Literature review

Many models have been developed to determine the factors contributing to accident severity in both developed and developing cities, such as Poisson and negative binomial models (Hoong et al., 2001; Lin et al., 2003; Yinhai et al., 2004; Huang et al., 2008); ordered probit models (Abdel-Aty et al., 2003, 2005; Yu Jin et al., 2010); logistic regression models (Hilakivi et al., 1989; James and Kim, 1996; Mercier et al., 1997; Al-Ghamdi, 2002; Yau, 2004); and multiple logistic regression (Shankar and Mannering, 1996; Carson and Mannering, 2001; Yan et al., 2005). Binary logistic models have also been developed by many researchers. The logistic modelling technique is often preferred because the logistic function must lie between 0 and 1, which is not usually the case with other possible functions (Kleinbaum and Klein, 2002).

In summary, numerous studies have determined the effect of contributing factors on accident severity by developing logistic regression models. Nevertheless, only a limited number of studies have explored crash injury severity at signalized intersections (Abdel-Aty, 2003; Abdel-Aty and Keller, 2005; Yan et al., 2005; Huang et al., 2008; Yu Jin et al., 2010). According to the literature, the major contributing factors to accident severity are time of day, intersection type, nature of lane, street lighting, presence of a red-light camera, pedestrian involvement, vehicle type, driver age and accident type. Moreover, no study has used logistic models to investigate the factors contributing to accident severity at signalized intersections in Vietnam in general, or in HCMC in particular. Given the literature and the historical traffic accident data available in Vietnam, a binary logistic model is highly appropriate for this case.

3. Descriptive analysis of traffic accidents at signalized intersections in HCMC

3.1. Overview of HCMC

According to the master plan, HCMC is divided into three zones (Fig. 1). The city centre (zone 1) includes 13 urban districts: 1, 3, 4, 5, 6, 8, 10, 11, Go Vap, Tan Binh, Tan Phu, Binh Thanh and Phu Nhuan. The newly developed areas (zone 2) include 6 districts: 2, 7, 9, 12, Binh Tan and Thu Duc. The rural areas (zone 3) include 5 districts: Hoc Mon, Nha Be, Can Gio, Cu Chi and Binh Chanh.

Figure 1. Classification of zones in HCMC.

3.2. Data collection

This research is based on the historical accident database for five years (2011-2015), obtained from the Rail-Road Traffic Police Bureau in HCMC. Traffic accident information is recorded on form No. 02/TNDB, which has nearly 60 categories of information. In practice, however, only 17 categories were actually recorded, and these were analysed to determine the significant contributing factors to accident severity.

3.3. Analysis of the patterns

There were 375 traffic accidents at signalized intersections in HCMC during the five years (2011-2015). The accidents were distributed unevenly among the three zones: 212 (56.5%) occurred in zone 1, 126 (33.6%) in zone 2 and 37 (9.9%) in zone 3. However, the ratio of the number of traffic accidents to the total number of signalized intersections is highest in zone 2 (0.79), followed by zone 1 (0.44), and lowest in zone 3 (0.36).

3.3.1. Distribution by time

Traffic accidents tend to increase slightly on public holidays, during the Tet holidays, at weekends and at the end of each month.
The time of accident occurrence differs among the three zones: in zone 1 most traffic accidents happened in the night off-peak period from 8 PM to 4 AM, while in zones 2 and 3 accidents increased slightly during the morning, noon and evening peak hours (6 AM - 8 AM; 12 PM - 2 PM; 6 PM - 8 PM).

3.3.2. Distribution by road users involved in accidents

Road users aged 19-24 are the group most often involved in traffic accidents (24%), followed by the 25-30 age group (19%). This age group accounted for 32%, 46% and 22% in zone 1, zone 2 and zone 3, respectively. This age group is not yet fully mature and is easily affected by alcohol. Male road users are the main group causing traffic accidents in all three zones, accounting for 77%, 88% and 78% in zone 1, zone 2 and zone 3, respectively. Collisions between two motorcycles, or between a motorcycle and a truck, are the most common configuration, accounting for 38% in zone 1 and 47% in zones 2 and 3.

Red-light running, failure to give priority, driving in the wrong lane, illegal turning and illegal overtaking are the significant causes of traffic accidents at signalized intersections. In particular, red-light running and failure to give priority are the most significant causes in zone 1 and zone 2, accounting for 26% and 29%, respectively. Red-light running and driving in the wrong lane are the main causes in zone 3, accounting for 35%.

4. Modelling of accidents at signalized intersections

4.1. Theoretical background of logistic regression

In this research, accident severity is the dependent variable and is dichotomous. A non-fatal accident is defined as any accident in which no death occurs within 24 hours of the accident; all other accidents are fatal. Each accident in the time-series road accident data was categorized as either non-fatal or fatal.
The logistic model used is

$$\pi(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} \quad (6)$$

And thus

$$P(\text{fatal accident}) = 1 - P(\text{non-fatal accident}) = 1 - \pi(x) = \frac{1}{1 + e^{g(x)}} \quad (7)$$

where g(x) stands for the function of the independent variables:

$$g(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \quad (8)$$

Logistic regression uses the maximum-likelihood technique to determine the coefficients that make the observed outcome (non-fatal or fatal accident) most likely. Classification in this model is based on a probability cutoff of 0.3: when the estimated probability is greater than or equal to 0.3, the case is classified as a fatal accident; otherwise it is classified as non-fatal.

4.2. List of variables

Since the research goal was to determine the factors that might affect the severity of an accident (i.e. whether it was fatal or non-fatal), 37 variables were summarized from the time-series data and the accident patterns, and coded as 0 or 1 for model development. Because the variables are discrete, correlation analysis (Kendall's tau-b test) was also used to reduce the number of variables on the basis of the level of correlation and the P-value.

Table 1. Correlation coefficient matrix.

Variable                        Stat  (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     (12)
(1) Time of accident            r     1.000
(2) Day of accident             r     -.318**  1.000
                                Sig.  .000
(3) Month of accident           r     1.000**  -.318**  1.000
                                Sig.           .000
(4) Location                    r     -.318**  1.000**  -.318**  1.000
                                Sig.  .000              .000
(5) Urban road                  r     .637**   .264**   .637**   .264**   1.000
                                Sig.  .000     .000     .000     .000
(6) Province road               r     .450**   .207**   .450**   .207**   .442**   1.000
                                Sig.  .000     .000     .000     .000     .000
(7) Commune road                r     .540**   .388**   .540**   .388**   .662**   .421**   1.000
                                Sig.  .000     .000     .000     .000     .000     .000
(8) Helmet                      r     -.039    .123*    -.039    .123*    .063     -.031    -.042    1.000
                                Sig.  .448     .017     .448     .017     .226     .549     .412
(9) Don't accept priority       r     .591**   .357**   .591**   .357**   .643**   .460**   .670**   .061     1.000
                                Sig.  .000     .000     .000     .000     .000     .000     .000     .239
(10) Zone2                      r     .117*    .182**   .117*    .182**   .233**   .262**   .188**   -.013    .179**   1.000
                                Sig.  .024     .000     .024     .000     .000     .000     .000     .803     .001
(11) Width pavement <3m vs <3m  r     .345**   .294**   .345**   .294**   .484**   .263**   .679**   -.029    .391**   .181**   1.000
                                Sig.  .000     .000     .000     .000     .000     .000     .000     .577     .000     .000
(12) Severity of accident       r     .155**   .213**   .155**   .213**   .261**   .262**   .282**   .166**   .252**   .203**   .159**   1.000
                                Sig.  .003     .000     .003     .000     .000     .000     .000     .001     .000     .000     .002

Number of samples (N) = 375. r: correlation coefficient. Column numbers (1)-(12) refer to the variables in the same order as the rows.
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

4.3. Development of logistic model

The Enter method of logistic regression was applied using SPSS version 21. The omnibus tests of the model coefficients for traffic accident severity are used to assess whether the data fit the model, as shown in Table 2.

Table 2. Omnibus Tests of Model Coefficients.

Step 1    Chi-square   df   Sig.
Step      47.081       5    .000
Block     47.081       5    .000
Model     47.081       5    .000

The specified model is significant (Sig. < 0.05), which indicates that the independent variables improve on the predictive power of the null model.

Table 3 contains the two pseudo R2 measures, Cox-Snell and Nagelkerke. Cox and Snell's R-square attempts to imitate the multiple R-square based on the likelihood, but its maximum can be (and usually is) less than 1.0, which makes it difficult to interpret. Here it indicates that 11.8% of the variation is explained by the logistic model. The Nagelkerke modification, which does range from 0 to 1, is a more reliable measure of the relationship. Nagelkerke's R2 will normally be higher than the Cox and Snell measure. In this case it is 0.263, indicating a relationship of 26.3% between the predictors and the prediction.
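As a rough cross-check, the two pseudo R2 values can be recomputed from the model chi-square (47.081), the final -2 log likelihood (176.333, reported in Table 3) and the sample size (N = 375). The sketch below assumes the standard Cox-Snell and Nagelkerke definitions, and assumes that the null-model -2 log likelihood equals the final value plus the model chi-square; the function and variable names are illustrative only and are not part of the SPSS output.

```python
import math

def pseudo_r2(model_chi2: float, neg2ll_model: float, n: int) -> tuple[float, float]:
    """Return (Cox-Snell R2, Nagelkerke R2) from likelihood-ratio quantities."""
    # Assumption: -2LL of the null model = -2LL of the fitted model + model chi-square.
    neg2ll_null = neg2ll_model + model_chi2
    cox_snell = 1.0 - math.exp(-model_chi2 / n)
    # Largest value Cox-Snell R2 can take for this data set.
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell, cox_snell / max_cox_snell

# Values reported in Tables 2 and 3.
cox_snell, nagelkerke = pseudo_r2(model_chi2=47.081, neg2ll_model=176.333, n=375)
print(f"Cox-Snell R2 = {cox_snell:.3f}, Nagelkerke R2 = {nagelkerke:.3f}")
```

Under these assumptions the recomputed values round to 0.118 and 0.263, which agrees with the reported figures.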
In addition, the Hosmer-Lemeshow (H-L) test in Table 3 supports the adequacy of the developed logistic regression model (Sig. > 0.05).

Table 3. Goodness of fit (Pseudo R2 and H-L Test).

Pseudo R2
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      176.333 (a)         .118                   .263
a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      5.739        4    .220

The H-L statistic has a significance of 0.22, which means that it is not statistically significant and therefore the model fits reasonably well.

Rather than relying only on a goodness-of-fit statistic, we often want to look at the proportion of cases that are classified correctly. In a perfect model, the overall percentage correct would be 100%. In this study, 88.3% of cases overall were correctly classified. Nevertheless, the predictions are skewed toward non-fatal accidents (95% correct), while only 18.2% of fatal accidents are predicted correctly.

From the Wald test in Table 4, the variables loc, Uroad, Proad, Croad and Zone2 appear to have a significant effect (loc, Uroad, Proad and Croad are approximately significant).

Table 4. The result of Wald test.

Step 1 (a)   B        S.E.   Wald     df   Sig.   Exp(B)
loc          .770     .426   3.258    1    .049   2.159
Uroad        1.008    .522   3.727    1    .049   2.740
Proad        .929     .415   5.020    1    .025   2.533
Croad        1.188    .563   4.451    1    .035   3.280
Zone2        .792     .541   2.143    1    .043   2.207
Constant     -4.422   .558   62.782   1    .000   .012
a. Variable(s) entered on step 1: loc, Uroad, Proad, Croad, Zone2.

According to the previous analysis, the logit model with the significant variables is as follows:

$$g(x) = -4.422 + 0.770\,\mathrm{loc} + 1.008\,\mathrm{Uroad} + 0.929\,\mathrm{Proad} + 1.188\,\mathrm{Croad} + 0.792\,\mathrm{Zone2} \quad (9)$$

Hence the logistic regression model developed in this study is $\pi(x) = e^{g(x)} / (1 + e^{g(x)})$, where g(x) is given by Eq. (9).

4.4. Model interpretation

Interpreting a model means being able to draw practical inferences from its estimated coefficients.
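As one practical reading of Eq. (9), the following sketch evaluates g(x) and π(x) for a set of 0/1 indicators and applies the 0.3 cutoff mentioned in Section 4.1. It assumes, in line with the odds interpretation used in this section, that π(x) is read as the modelled probability of a fatal accident. The dictionary keys, function names and the example case are hypothetical and do not come from the SPSS output.

```python
import math

# Coefficients reported in Table 4 / Eq. (9).
COEF = {"const": -4.422, "loc": 0.770, "Uroad": 1.008,
        "Proad": 0.929, "Croad": 1.188, "Zone2": 0.792}
CUTOFF = 0.3  # classification threshold stated in Section 4.1

def logit(x: dict[str, int]) -> float:
    """g(x) = b0 + sum(b_i * x_i) for 0/1 indicators; omitted indicators count as 0."""
    return COEF["const"] + sum(COEF[name] * value for name, value in x.items())

def probability(x: dict[str, int]) -> float:
    """pi(x) = e^g(x) / (1 + e^g(x))."""
    g = logit(x)
    return math.exp(g) / (1.0 + math.exp(g))

# Hypothetical case: junction/other location, commune road, zone 2.
case = {"loc": 1, "Croad": 1, "Zone2": 1}
p = probability(case)
print(f"g(x) = {logit(case):.3f}, pi(x) = {p:.3f}, predicted fatal: {p >= CUTOFF}")
```

For this hypothetical case g(x) is -1.672 and π(x) is about 0.16, which is below the 0.3 cutoff, so the case would be classified as non-fatal under the stated assumption.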
The estimated coefficients for the independent variables represent the trend or rate of change of the dependent variable per unit of change in the independent variable. The interpretation of the model developed in this study is presented in detail as follows.

4.4.1. Impact of location on accident severity

Note that 'loc' has two levels: loc = 1 (fatal accident occurring at a junction or other location) and loc = 0 (fatal accident occurring at an intersection). With this coding, loc enters the logit model with a coefficient of 0.77. To interpret this parameter, the logit difference is computed as follows:

$$\mathrm{Logit}(\text{fatal accident} \mid \text{junction and other}) = \beta_0 + \beta_1$$
$$\mathrm{Logit}(\text{fatal accident} \mid \text{intersection}) = \beta_0$$
$$\text{Logit difference} = \beta_1$$

Hence the odds ratio is $e^{\beta_1} = e^{0.77} = 2.16$. This value shows that the odds of being in a fatal accident at a junction or other location are 2.16 times those at an intersection. The impact of the zone 2 factor on accident severity can be explained in the same way: the odds of being in a fatal accident in zone 2 are 2.2 times ($e^{0.792}$) those in zone 1 and zone 3.

4.4.2. Impact of Uroad on accident severity

$\beta_2$ (1.008) measures the differential effect on the logit of two cases: whether or not the fatal accident occurs on an urban road. To interpret this parameter, the logit difference is computed first:

$$\mathrm{Logit}(\text{fatal} \mid \text{Uroad}) = \beta_0 + \beta_2$$

For any other type of road:

$$\mathrm{Logit}(\text{fatal} \mid \text{not Uroad}) = \beta_0 + \beta_3 + \beta_4$$
$$\text{Logit difference} = \beta_2 - (\beta_3 + \beta_4) = 1.008 - (0.929 + 1.188) = -1.109$$

Hence the odds ratio is $e^{-1.109} = 0.33$. Thus, the odds that an accident is fatal when it occurs on an urban road are 0.33 times the odds of its being fatal on the other types of road.
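The odds ratios quoted in this section can be checked numerically. The sketch below exponentiates the Table 4 coefficients for loc and Zone2, and reproduces the road-type figures under the assumption that each road type is compared against a case in which the other two road-type indicators are set to 1, which is what the reported logit difference of -1.109 implies. The variable names are illustrative.

```python
import math

# Coefficients from Table 4.
b = {"loc": 0.770, "Uroad": 1.008, "Proad": 0.929, "Croad": 1.188, "Zone2": 0.792}
road_types = ("Uroad", "Proad", "Croad")

# Binary factors: odds ratio = exp(coefficient).
simple_or = {name: math.exp(b[name]) for name in ("loc", "Zone2")}

# Road types (assumed comparison): own coefficient minus the sum of the other two.
logit_diff = {r: b[r] - sum(b[o] for o in road_types if o != r) for r in road_types}
road_or = {r: math.exp(d) for r, d in logit_diff.items()}

for name, value in {**simple_or, **road_or}.items():
    print(f"{name}: odds ratio = {value:.2f}")
```

Under this assumption the road-type odds ratios come out at about 0.33 for Uroad, 0.28 for Proad and 0.47 for Croad, and the loc odds ratio at about 2.16.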
The same method was used to compute the odds ratios for Proad and Croad, which are 0.28 and 0.47, respectively.

5. Conclusions

A logit model was developed in this study to determine the significant contributing factors to accident severity in HCMC, based on a binary response variable (fatal or non-fatal) and three explanatory factors: type of road, location and land use. The model fits reasonably well, with 88.3% of cases correctly classified overall, although its predictions are skewed toward non-fatal accidents (only 18.2% of fatal accidents are correctly classified).

The findings suggest that the authorities in HCMC should focus their strategies on improving safety at junctions in zone 2 that involve commune roads. They also suggest that the authorities should design safety policies for each zone rather than for HCMC as a whole, as has been done so far. This may make safety policies more cost-effective.

The odds presented in this paper can help to set priorities among solutions for reducing serious accidents. For example, the odds of being involved in a fatal accident at junctions and other locations on commune roads in zone 2, where there are few police officers to control traffic, traffic signs are lacking and drivers have low safety awareness, are relatively higher than in other cases.

It should be noted that some potentially significant variables, such as road surface, traffic signal pattern, light condition, collision type and licence status, are unavailable or difficult to obtain in HCMC, so they are not included in this research. Nevertheless, the findings of this study can serve as methodological guidance for future studies when these variables become available.

References

[1] Yau, K.K.W. (2004). Risk factors affecting the severity of single vehicle traffic accidents in Hong Kong. Accident Analysis & Prevention.
[2] Abdel-Aty et al. (2005). Exploring the overall and specific crash severity levels at signalized intersections. Accident Analysis & Prevention.
[3] Yan, X. et al. (2005). Characteristics of rear-end accidents at signalized intersections using multiple logistic regression model. Accident Analysis & Prevention.
[4] Huang, H. et al. (2008). Severity of driver injury and vehicle damage in traffic crashes at intersections: A Bayesian hierarchical analysis. Accident Analysis & Prevention.
[5] Jin, Y., X. Wang, and X. Chen (2010). Right-angle crash injury severity analysis using ordered probability models. Intelligent Computation Technology and Automation (ICICTA), IEEE.

Received: 26/9/2016. Sent for review: 30/9/2016. Revision completed: 21/10/2016. Accepted for publication: 28/10/2016.
