Lecture: Statistical Techniques in Business and Economics - Chapter 14 Multiple Regression

Chapter 14 Multiple Regression

Chapter Goals
When you have completed this chapter, you will be able to:
- Understand the importance of an appropriate model specification and multiple regression analysis.
- Comprehend the nature and technique of multiple regression models and the concept of partial regression coefficients.
- Use the estimation techniques for multiple regression models.
- Conduct an analysis of variance of an estimated model.
- Explain the goodness of fit of an estimated model.
- Draw inferences about the assumed (true) model through a joint test of hypothesis (F test) on the coefficients of all variables.
- Draw inferences about the importance of the independent variables through tests of hypothesis (t tests).
- Identify the problems raised, and the remedies thereof, by the presence of multicollinearity in the data sets.
- Identify the problems raised, and the remedies thereof, by the presence of outliers/influential observations in the data sets.
- Identify violations of the model assumptions, including linearity, homoscedasticity, autocorrelation, and normality, through simple diagnostic procedures.
- Use some simple remedial measures in the presence of violations of the model assumptions.
- Comprehend the concept of partial correlation and its importance in multiple regression analysis.
- Write a research report on an investigation using multiple regression analysis.
- Use qualitative variables, as well as their interactions with other independent variables, in multiple regression analysis.
- Draw inferences about the importance of a subset of the independent variables through a joint test of hypothesis.
- Apply some advanced diagnostic checks and remedies in multiple regression analysis.

Multiple Regression Analysis
For two independent variables, the general form of the multiple regression equation is:
$\hat{y} = a + b_1 x_1 + b_2 x_2$
where:
- $x_1$ and $x_2$ are the independent variables;
- $a$ is the y-intercept;
- $b_1$ is the net change in $\hat{y}$ for each unit change in $x_1$, holding $x_2$ constant. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

The general multiple regression equation with k independent variables is:
$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
The least squares criterion is used to develop this equation. Because determining $b_1$, $b_2$, and so on by hand is very tedious, a software package such as Excel or MINITAB is recommended.

Multiple Standard Error of Estimate
$s_{y \cdot 12 \ldots k} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - (k + 1)}}$
- It is a measure of the effectiveness of the regression equation.
- It is measured in the same units as the dependent variable.
- It is difficult to determine what is a large value and what is a small value of the standard error.

Multiple Regression and Correlation Assumptions
- The independent variables and the dependent variable have a linear relationship.
- The dependent variable must be continuous and at least interval-scale.
- The variation in the residual $(y - \hat{y})$ must be the same for all values of $\hat{y}$. When this is the case, we say the residuals exhibit homoscedasticity.
- The residuals should follow the normal distribution with a mean of 0.
- Successive values of the dependent variable must be uncorrelated.

The ANOVA Table
The ANOVA table reports the variation in the dependent variable. The variation is divided into two components:
a. the explained variation, which is accounted for by the set of independent variables;
b. the unexplained or random variation, which is not accounted for by the independent variables.

Correlation Matrix
A correlation matrix is used to show all possible simple correlation coefficients among the variables. It shows how strongly each independent variable is correlated with the dependent variable.
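Returning to the multiple standard error of estimate and the ANOVA table described above: once the explained and unexplained sums of squares are known, the other entries follow from a few standard formulas. The sketch below is a minimal illustration in Python (the slides themselves rely on Excel or MINITAB); the function name anova_table is hypothetical, the adjusted R-square formula is the usual textbook one and is an assumption here, and the inputs are the SSR and SSE values printed in the MINITAB output for the Super Dollar example later in the chapter.

```python
import math


def anova_table(ssr, sse, n, k):
    """Derive the remaining ANOVA-table entries from the two sums of squares.

    ssr: explained (regression) sum of squares
    sse: unexplained (residual) sum of squares
    n:   sample size
    k:   number of independent variables
    """
    df_reg, df_err = k, n - (k + 1)
    msr, mse = ssr / df_reg, sse / df_err
    sst = ssr + sse  # total variation in the dependent variable
    return {
        "df": (df_reg, df_err, n - 1),
        "MS": (msr, mse),
        "F": msr / mse,                       # statistic for the global test
        "s": math.sqrt(mse),                  # multiple standard error of estimate
        "R2": ssr / sst,                      # share of variation explained
        "R2_adj": 1 - mse / (sst / (n - 1)),  # R-square adjusted for degrees of freedom
    }


# SSR and SSE from the Super Dollar MINITAB output (n = 12 families, k = 3 variables)
table = anova_table(ssr=10762903, sse=2623764, n=12, k=3)
for key, value in table.items():
    print(key, value)
```

Dividing each sum of squares by its degrees of freedom is what turns the raw variation into the mean squares that the F test compares, and the square root of the residual mean square is the multiple standard error of estimate.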
The correlation matrix is also useful for locating correlated independent variables.

Global Test
The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are:
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1: \text{not all the } \beta_i \text{ equal } 0$
The test statistic follows the F distribution with k (the number of independent variables) and n - (k + 1) degrees of freedom, where n is the sample size.

Test for Individual Variables
This test is used to determine which independent variables have nonzero regression coefficients.
- The variables that have zero regression coefficients are usually dropped from the analysis.
- The test statistic is $t = (b_i - 0)/s_{b_i}$, which follows the t distribution with n - (k + 1) degrees of freedom.

Example: Super Dollar Super Markets
A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in hundreds of dollars ($00), size of family (Size), and whether the family has children in college (Student).

Note the following regarding the regression equation:
- The variable Student is called a dummy or indicator variable. It can take only one of two possible outcomes: a child is a college student or not.
- Other examples of dummy variables include gender, whether a part is acceptable or unacceptable, and whether a voter will or will not vote for the incumbent.
- We usually code one value of the dummy variable as "1" and the other as "0". Here Student = 1 if the family has a child in college and 0 otherwise.

Family   Food ($)   Income ($00)   Size   Student
  1        3900         376          4       0
  2        5300         515          5       1
  3        4300         516          4       0
  4        4900         468          5       0
  5        6400         538          6       1
  6        7300         626          7       1
  7        4900         543          5       0
  8        5300         437          4       0
  9        6100         608          5       1
 10        6400         513          6       1
 11        7400         493          6       1
 12        5800         563          5       0

Use a computer software package, such as MINITAB or Excel, to develop a correlation matrix.
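A correlation matrix like the one MINITAB produces can also be sketched in plain Python. The twelve families are transcribed by hand from the table above; the names FAMILIES, pearson, corr, and flagged are hypothetical. The 0.70 cutoff used to flag pairs of independent variables is the rule of thumb this chapter applies when it reviews the correlation matrix, not a formal test.

```python
import math

# (Food $, Income in $00, Size, Student: 1 = child in college, 0 = none)
FAMILIES = [
    (3900, 376, 4, 0), (5300, 515, 5, 1), (4300, 516, 4, 0),
    (4900, 468, 5, 0), (6400, 538, 6, 1), (7300, 626, 7, 1),
    (4900, 543, 5, 0), (5300, 437, 4, 0), (6100, 608, 5, 1),
    (6400, 513, 6, 1), (7400, 493, 6, 1), (5800, 563, 5, 0),
]
NAMES = ["Food", "Income", "Size", "Student"]


def pearson(u, v):
    """Simple (Pearson) correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


columns = list(zip(*FAMILIES))  # one tuple of 12 values per variable
corr = {(NAMES[i], NAMES[j]): pearson(columns[i], columns[j])
        for i in range(len(NAMES)) for j in range(len(NAMES))}

# Print the lower triangle, in the same layout as the MINITAB matrix
for i in range(1, len(NAMES)):
    row = "  ".join(f"{corr[(NAMES[i], NAMES[j])]:6.3f}" for j in range(i))
    print(f"{NAMES[i]:8s}{row}")

# Flag pairs of independent variables (index 1 and up) whose |r| exceeds 0.70
flagged = [(NAMES[i], NAMES[j], corr[(NAMES[i], NAMES[j])])
           for i in range(1, len(NAMES)) for j in range(1, i)
           if abs(corr[(NAMES[i], NAMES[j])]) > 0.70]
for a, b, r in flagged:
    print(f"possible multicollinearity: {a} vs {b} (r = {r:.3f})")
```

With these data the printed values agree with the MINITAB matrix (for example 0.876 for Food and Size), and the only pair of independent variables that crosses the 0.70 guideline is Size and Student.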
From the analysis provided by MINITAB, write out the regression equation. What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?

The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor     Coef    SE Coef     T       P
Constant       954     1581     0.60   0.563
Income       1.092    3.153     0.35   0.738
Size         748.4    303.0     2.47   0.039
Student      564.5    495.1     1.14   0.287

S = 572.7   R-Sq = 80.4%   R-Sq(adj) = 73.1%

Analysis of Variance
Source            DF        SS         MS       F       P
Regression         3  10762903    3587634   10.94   0.003
Residual Error     8   2623764     327970
Total             11  13386667

The regression equation is
$\hat{y} = 954 + 1.09x_1 + 748x_2 + 565x_3$
where $x_1$ = Income, $x_2$ = Size, and $x_3$ = Student.

From the regression output we note that the coefficient of determination is 80.4 percent.
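The MINITAB figures above can be cross-checked with a short least-squares sketch. It assumes the twelve families transcribed from the data table, solves the normal equations $X'X b = X'y$ with a small hand-written Gaussian elimination (used here instead of a statistics library so the block needs only the standard library), and the function names are hypothetical. It reproduces the coefficients and goodness-of-fit figures but does not compute standard errors, t values, or P-values.

```python
import math

# Super Dollar data: (Food $, Income in $00, Size, Student: 1 = child in college)
DATA = [
    (3900, 376, 4, 0), (5300, 515, 5, 1), (4300, 516, 4, 0),
    (4900, 468, 5, 0), (6400, 538, 6, 1), (7300, 626, 7, 1),
    (4900, 543, 5, 0), (5300, 437, 4, 0), (6100, 608, 5, 1),
    (6400, 513, 6, 1), (7400, 493, 6, 1), (5800, 563, 5, 0),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(xs, y):
    """Return [a, b1, ..., bk] by solving the normal equations X'X b = X'y."""
    X = [[1.0, *row] for row in xs]  # leading 1 gives the intercept a
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def goodness_of_fit(xs, y, coef):
    """R-Sq, R-Sq(adj), S and the global F statistic for a fitted model."""
    n, k = len(y), len(coef) - 1
    y_hat = [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in xs]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / (n - (k + 1))
    return {
        "R2": (sst - sse) / sst,
        "R2_adj": 1 - mse / (sst / (n - 1)),
        "S": math.sqrt(mse),
        "F": ((sst - sse) / k) / mse,
    }


food = [row[0] for row in DATA]
x_full = [row[1:] for row in DATA]  # Income, Size, Student
coef = fit_least_squares(x_full, food)
fit = goodness_of_fit(x_full, food, coef)
print("a, b1, b2, b3:", [round(c, 3) for c in coef])
print({name: round(v, 3) for name, v in fit.items()})

# Family of 4, income 500 ($50,000), no college student
estimate = coef[0] + coef[1] * 500 + coef[2] * 4 + coef[3] * 0
print(f"estimated food expenditure: {estimate:.0f}")

# Reduced model with family size as the only independent variable
x_size = [(row[2],) for row in DATA]
coef_size = fit_least_squares(x_size, food)
fit_size = goodness_of_fit(x_size, food, coef_size)
print("a, b2:", [round(c, 1) for c in coef_size], "R2:", round(fit_size["R2"], 3))
```

With unrounded coefficients the estimate comes out at roughly $4,493 rather than the $4,491 obtained by plugging the rounded equation in by hand; the Size-only fit matches the reduced model (about 340 + 1031 Size, R-Sq = 76.8%) discussed with the individual tests.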
A coefficient of determination of 80.4 percent means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

Each additional family member is estimated to increase the amount spent per year on food by $748, holding income and college status constant.

A family with a college student is estimated to spend $565 more per year on food than a family without a college student, holding income and family size constant.

The correlation matrix is as follows:

           Food   Income    Size
Income    0.587
Size      0.876    0.609
Student   0.773    0.491   0.743

The strongest correlation between the dependent variable (Food) and an independent variable is between family size and amount spent on food (r = 0.876).

Among the independent variables, the correlations of Income with Size (0.609) and of Income with Student (0.491) are between -0.70 and 0.70, so they should not cause problems. The correlation between Size and Student (0.743) is slightly above 0.70, so that pair may deserve a closer look for multicollinearity.

Find the estimated food expenditure for a family of 4 with an income of 500 (that is, $50,000) and no college student:
$\hat{y} = 954 + 1.09(500) + 748(4) + 565(0) = 4{,}491$
The estimated yearly food expenditure is about $4,491.

Global test decision: H0 is rejected.
Not all the regression coefficients are zero. The global test behind this decision is as follows.

Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero:
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1: \text{at least one } \beta_i \neq 0$
At the 5% level of significance, with 3 and 8 degrees of freedom, H0 is rejected if F > 4.07. From the MINITAB output, the computed value of F is 10.94, so H0 is rejected.

Conduct an individual test to determine which coefficients are not zero. For the independent variable family size, the hypotheses are:
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
Using the 5% level of significance, reject H0 if the P-value < .05. From the MINITAB output, the only significant variable is Size (family size), with a P-value of 0.039. Income (P = 0.738) and Student (P = 0.287) are not significant, so these variables can be omitted from the model.

Rerun the analysis using only the significant independent variable, family size:

Regression Analysis: Food versus Size
The regression equation is
Food = 340 + 1031 Size

Predictor     Coef    SE Coef
Constant     339.7     940.7
Size        1031.0     179.4

S = 557.7   R-Sq = 76.8%   R-Sq(adj) = 74.4%

- The new regression equation is $\hat{y} = 340 + 1031x_2$.
- The coefficient of determination is 76.8 percent (the two other independent variables are dropped).
- R-square was reduced by only 3.6 percentage points, from 80.4 to 76.8 percent.

Analysis of Residuals
- Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts of the residuals are useful in checking this requirement.
- A residual is the difference between the actual value of y and the predicted value $\hat{y}$.
- A plot of the residuals against their corresponding $\hat{y}$ values is used to show that there are no trends or patterns in the residuals.

[Figure: Analysis of Residuals, plot of residuals versus $\hat{y}$]
[Figure: Analysis of Residuals, histogram of residuals (frequency versus residuals)]

Test your learning: www.mcgrawhill.ca/college/lind
Click on the Online Learning Centre for quizzes, extra content, data sets, a searchable glossary, access to Statistics Canada's E-Stat data, and much more!

This completes Chapter 14.
