Lecture: Statistical Techniques in Business and Economics - Chapter 13 Linear Regression and Correlation
Chapter 13: Linear Regression and Correlation

Chapter Goals
When you have completed this chapter, you will be able to:
1. Identify a relationship between variables on a scatter diagram.
2. Measure and interpret a degree of relationship by a coefficient of correlation.
3. Conduct a test of hypothesis about the coefficient of correlation in a population.
4. Identify the roles of dependent and independent variables, the concept of regression, and its distinction from the concept of correlation.
5. Measure and interpret the strength of relationship between two variables through a regression line and the technique of least squares.
6. Conduct a test of hypothesis for a regression model and each coefficient of regression.
7. Conduct analysis of variance and calculate the coefficient of determination.
8. Estimate confidence and prediction intervals.

Terminology
Scatter Diagram: a chart that portrays the relationship between the two variables.
Correlation Analysis: a group of statistical techniques used to measure the strength of the association between two variables.
Dependent Variable: the variable being predicted or estimated.
Independent Variable: the variable that provides the basis for estimation.
It is the predictor variable (the independent variable).

The Coefficient of Correlation, r
- It is a measure of the strength of the relationship between two variables.
- It requires interval or ratio-scaled data.
- It can range from -1.00 to 1.00.
- Values of -1.00 or 1.00 indicate perfect and strong correlation.
- Values close to 0.0 indicate weak correlation.
- Negative values indicate an inverse relationship and positive values indicate a direct relationship.

[Figures: scatter diagrams of Y against X, both axes running from 0 to 10: Perfect Negative Correlation; Perfect Positive Correlation; Zero Correlation; Example: Strong Positive Correlation (Chart 13-6)]

[Figure: Chart 13.4, How Income and Well-Being of Canadians are Related (1971-97), r = 0.7415]

Formula for the Correlation Coefficient
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

Coefficient of Determination
The coefficient of determination, represented by $r^2$, is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
- It is the square of the coefficient of correlation.
- It ranges from 0 to 1.
- It does not give any information on the direction of the relationship between the variables.

Example: Correlation Coefficient
Dan Ireland, the student body president, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem he selects a sample of eight (8) textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.

Data and computations
Book                  Pages (x)  Price ($) (y)        xy        x^2      y^2
Intro to History          500         84          42 000    250 000    7 056
Basic Algebra             700         75          52 500    490 000    5 625
Intro. to Psych.          800         99          79 200    640 000    9 801
Intro. to Sociology       600         72          43 200    360 000    5 184
Bus. Mgmt.                400         69          27 600    160 000    4 761
Intro to Biology          500         81          40 500    250 000    6 561
Fund. of Jazz             600         63          37 800    360 000    3 969
Intro. to Nursing         800         93          74 400    640 000    8 649
Total                   4 900        636         397 200  3 150 000   51 606

[Figure: Scatter Diagram of Number of Pages and Selling Price of Text; x-axis Pages (400 to 800), y-axis Price ($) (60 to 100); also shown as an Excel printout]

Using the formula with the totals from the table:
$$ r = \frac{8(397\,200) - (4\,900)(636)}{\sqrt{\left[8(3\,150\,000) - (4\,900)^2\right]\left[8(51\,606) - (636)^2\right]}} = 0.614 $$
The correlation coefficient is 0.614. This indicates a moderate association between the variables.

Testing the Significance of the Correlation Coefficient
Let's test the hypothesis that there is no correlation in the population. Use a .02 significance level.
Step 1: State the null and alternate hypotheses. $H_0: \rho = 0$, $H_1: \rho \neq 0$.
Step 2: Select the level of significance. $\alpha = 0.02$.
Step 3: Identify the test statistic. $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$, with n - 2 degrees of freedom.
Step 4: State the decision rule. H0 is rejected if t > 3.143 or if t < -3.143. There are 6 df, found by n - 2 = 8 - 2 = 6.
Step 5: Compute the test statistic and make a decision.
$$ t = \frac{0.614\sqrt{8 - 2}}{\sqrt{1 - (0.614)^2}} = 1.905 $$
H0 is not rejected.
Conclusion: We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.

Regression Analysis
We use the independent variable (X) to estimate the dependent variable (Y).
- Both variables must be at least interval scale.
- The relationship between the variables is linear.
- The least squares criterion is used to determine the equation, i.e.
the term $\sum (y - \hat{y})^2$ is minimized.

Regression Equation
$$ \hat{y} = a + bx $$
where
- $\hat{y}$ is the average predicted value of y for any x;
- a is the Y-intercept, the estimated y value when x = 0;
- b is the slope of the line, or the average change in $\hat{y}$ for each change of one unit in x.
The least squares principle is used to obtain a and b:
$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$

Example: Regression Equation
Dan Ireland, the student body president, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem he selects a sample of eight (8) textbooks currently on sale in the bookstore. Develop a regression equation that can be used to estimate the selling price based on the number of pages.

Totals for the sample: $\sum x = 4\,900$, $\sum y = 636$, $\sum xy = 397\,200$, $\sum x^2 = 3\,150\,000$, $\sum y^2 = 51\,606$.
$$ b = \frac{8(397\,200) - (4\,900)(636)}{8(3\,150\,000) - (4\,900)^2} = 0.05143 $$
$$ a = \frac{636}{8} - 0.05143\left(\frac{4\,900}{8}\right) = 48.0 $$
$$ \hat{y} = 48.0 + 0.05143x $$
The slope suggests each extra page adds about $0.05 to the price of a book; the y-intercept suggests that a book with 0 pages would cost $48.

Find the estimated selling price of an 800 page book. Substituting 800 for x:
$$ \hat{y} = 48.0 + 0.05143(800) = 89.14 $$
The estimated selling price of an 800 page book is $89.14.

Using Excel
To draw the scatter diagram:
1. Click on CHART WIZARD.
2. Click on XY (Scatter).
3. Input the data range. Click Next.
4. Complete the inputting of titles. Click Next. Click Finish.
To format the axes scales: right mouse click on one of the axes, click on Format Axis, complete the inputting of values, and click OK.
To remove the legend on the right side: right mouse click and click on Clear.
To add the regression line and equation to this scatter plot: right mouse click on one of the data points... Scroll down to Add Trendline...
Click, choose Linear, then click on the OPTIONS tab. Check EQUATION and R-squared Value. Click OK. You can now interpret your results!

Concerned about the y-intercept? Alternate Solution
Formatting the axes resulted in a distortion of the y-intercept. Using the data in Excel:
1. Click on Tools.
2. Click on DATA ANALYSIS.
3. Highlight REGRESSION. Click OK.
4. Enter the inputs needed. Click OK.
The regression equation is: y = -0.07x + 22.6

The Standard Error of Estimate
This measures the scatter, or dispersion, of the observed values around the line of regression. The formulas that are used to compute the standard error are:
$$ s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} $$
Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.
Previously: $\sum x = 4\,900$, $\sum y = 636$, $\sum xy = 397\,200$, $\sum x^2 = 3\,150\,000$, $\sum y^2 = 51\,606$.
$$ s_e = \sqrt{\frac{51\,606 - 48(636) - 0.05143(397\,200)}{8 - 2}} = 10.408 $$

Assumptions Underlying Linear Regression
- For each value of x, there is a group of y values, and these y values are normally distributed.
- The means of these normal distributions of y values all lie on the straight line of regression.
- The standard deviations of these normal distributions are equal.
- The y values are statistically independent.
This means that in the selection of a sample the y values chosen for a particular x value do not depend on the y values for any other x values.

Confidence Interval
The confidence interval for the mean value of y for a given value of x, $x_0$, is given by:
$$ \hat{y} \pm t_{\alpha/2}(n - 2)\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} $$
Previously: $\sum x = 4\,900$, $\sum y = 636$, $\sum xy = 397\,200$, $\sum x^2 = 3\,150\,000$, $\sum y^2 = 51\,606$.
$$ 89.14 \pm 2.447(10.408)\sqrt{\frac{1}{8} + \frac{(800 - 612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 15.31 $$

Prediction Interval
The prediction interval for an individual value of y for a given value of x, $x_0$, is given by:
$$ \hat{y} \pm t_{\alpha/2}(n - 2)\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} $$
$$ 89.14 \pm 2.447(10.408)\sqrt{1 + \frac{1}{8} + \frac{(800 - 612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 29.72 $$

Summarizing the Results
- The estimated selling price for a book with 800 pages is $89.14.
- The standard error of estimate is $10.41.
- The 95 percent confidence interval for all books with 800 pages is $89.14 ± $15.31. This means the limits are between $73.83 and $104.45.
- The 95 percent prediction interval for a particular book with 800 pages is $89.14 ± $29.72. This means the limits are between $59.42 and $118.86.

These results appear in the following MINITAB output.

Regression Analysis: Price versus Pages

The regression equation is
Price = 48.0 + 0.0514 Pages

Predictor    Coef      SE Coef    T      P
Constant     48.00     16.94      2.83   0.030
Pages        0.05143   0.02700    1.90   0.105

S = 10.41   R-Sq = 37.7%   R-Sq(adj) = 27.3%

Analysis of Variance
Source            DF       SS      MS      F      P
Regression         1    393.4   393.4   3.63   0.105
Residual Error     6    650.6   108.4
Total              7   1044.0

Predicted Values for New Observations
New Obs   Fit     SE Fit   95.0% CI           95.0% PI
1         89.14   6.26     (73.82, 104.46)    (59.41, 118.88)

[Figure: EXCEL output, Price vs. Pages]

Test your learning
www.mcgrawhill.ca/college/lind
Click on Online Learning Centre for quizzes, extra content, data sets, searchable glossary, access to Statistics Canada's E-Stat data and much more!

This completes Chapter 13
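The hand calculations in this chapter (correlation coefficient, t test, least squares line, standard error of estimate, and the two intervals for an 800 page book) can be checked with a short script. What follows is a minimal sketch in plain Python using only the standard library; it is not part of the original slides. The function name `simple_regression` and the dictionary keys are illustrative assumptions, and the critical t values 3.143 and 2.447 are copied from the chapter rather than computed.

```python
import math

# Sample of eight textbooks from the chapter: pages (x) and price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

# Critical t values quoted in the chapter for 6 degrees of freedom.
T_CRIT_TEST = 3.143      # two-tailed test at the .02 significance level
T_CRIT_INTERVAL = 2.447  # 95 percent confidence and prediction intervals


def simple_regression(x, y, x0, t_crit):
    """Hypothetical helper: correlation, least squares line, and intervals at x0."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    # Coefficient of correlation and its t statistic (n - 2 degrees of freedom).
    numerator = n * sum_xy - sum_x * sum_y
    r = numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

    # Least squares slope b and intercept a.
    b = numerator / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n

    # Standard error of estimate.
    se = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

    # Point estimate and interval half-widths at x0.
    fit = a + b * x0
    term = 1 / n + (x0 - sum_x / n) ** 2 / (sum_x2 - sum_x ** 2 / n)
    ci_half = t_crit * se * math.sqrt(term)      # mean of y at x0
    pi_half = t_crit * se * math.sqrt(1 + term)  # individual y at x0

    return {"r": r, "t": t, "a": a, "b": b, "se": se,
            "fit": fit, "ci_half": ci_half, "pi_half": pi_half}


result = simple_regression(pages, price, x0=800, t_crit=T_CRIT_INTERVAL)
reject_h0 = abs(result["t"]) > T_CRIT_TEST

for name, value in result.items():
    print(f"{name:8s} {value:10.4f}")
print("reject H0:", reject_h0)
```

Because the script carries full precision throughout, its interval half-widths (about 15.32 and 29.73) agree with the MINITAB limits, while the hand calculation's 15.31 and 29.72 reflect the rounded standard error of 10.408.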
Files attached to this document:
- 13edited_2434.ppt