If you provide a path for the optional Output Report File, a PDF will be created that contains all of the information in the summary report plus additional graphics to help you assess your model. The first page of the report provides information about each explanatory variable. If the Koenker test is statistically significant (see number 4 above), you can only trust the robust probabilities to determine whether a variable is helping your model. Statistically significant coefficients will have an asterisk next to their p-values in both the probabilities and robust probabilities columns.
You can also tell from the information on this page of the report whether any of your explanatory variables are redundant (exhibit problematic multicollinearity). Unless theory dictates otherwise, explanatory variables with elevated Variance Inflation Factor (VIF) values should be removed one by one until the VIF values for all remaining explanatory variables are below 7.
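As an illustration outside the tool itself, VIF values can be computed with the statsmodels Python library; the variable names and data below are synthetic:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
pop = rng.normal(10_000, 2_000, n)                 # hypothetical total population
households = pop / 3 + rng.normal(0, 50, n)        # nearly collinear with pop
dist = rng.uniform(100, 5_000, n)                  # distance to station (m)
X = pd.DataFrame({"POP": pop, "HOUSEHOLDS": households, "DIST": dist})

exog = sm.add_constant(X)                          # include the intercept column
vifs = {col: variance_inflation_factor(exog.values, i)
        for i, col in enumerate(exog.columns) if col != "const"}
print(vifs)  # remove the highest-VIF variable, refit, and repeat as needed
```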
This is section 1 of the Output Report. This page also includes Notes on Interpretation describing why each check is important. If your model fails one of these diagnostics, refer to the table of common regression problems outlining the severity of each problem and suggesting potential remediation.
The graphs on the remaining pages of the report will also help you identify and remedy problems with your model. This is section 2 of the Output Report. The third section of the Output Report File includes histograms showing the distribution of each variable in your model, and scatterplots showing the relationship between the dependent variable and each explanatory variable.
If you are having trouble with model bias (indicated by a statistically significant Jarque-Bera p-value), look for skewed distributions among the histograms, and try transforming these variables to see if this eliminates bias and improves model performance.
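As a hedged illustration (in Python with statsmodels rather than the OLS tool itself, and with made-up data), the sketch below runs a Jarque-Bera check on the residuals before and after log-transforming a skewed explanatory variable:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(1)
n = 300
x = rng.lognormal(mean=2.0, sigma=0.8, size=n)   # strongly right-skewed variable
y = 3.0 * np.log(x) + rng.normal(0, 1, n)        # true relationship is with log(x)

raw = sm.OLS(y, sm.add_constant(x)).fit()
logged = sm.OLS(y, sm.add_constant(np.log(x))).fit()

for name, model in [("raw x", raw), ("log(x)", logged)]:
    jb, jb_p, skew, kurt = jarque_bera(model.resid)
    print(f"{name}: Jarque-Bera p={jb_p:.4f}, adj. R2={model.rsquared_adj:.3f}")
# A small p-value flags non-normal residuals; the log transform of the
# skewed variable should shrink the bias here.
```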
The scatterplots show you which variables are your best predictors. Use these scatterplots to also check for nonlinear relationships among your variables. In some cases, transforming one or more of the variables will correct nonlinear relationships and eliminate model bias. Outliers in the data can also result in a biased model. Check both the histograms and scatterplots for these data values and data relationships.
Try running the model with and without an outlier to see how much it is impacting your results. You may discover that the outlier is invalid (data entered or recorded in error) and be able to remove the associated feature from your dataset.
If the outlier reflects valid data and is having a strong impact on the results of your analysis, you may decide to report your results both with and without the outlier.
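A minimal sketch of this with-and-without comparison, assuming statsmodels and synthetic data, and using Cook's distance as one common way to flag high-influence observations:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(0, 10, 50)})
df["y"] = 2.0 * df["x"] + rng.normal(0, 1, 50)
df.loc[len(df)] = [9.5, 60.0]                        # one extreme, possibly invalid value

full = sm.OLS(df["y"], sm.add_constant(df["x"])).fit()
cooks = full.get_influence().cooks_distance[0]       # flag high-influence rows
trimmed_df = df[cooks < 4 / len(df)]                 # a common rule-of-thumb cutoff
trimmed = sm.OLS(trimmed_df["y"], sm.add_constant(trimmed_df["x"])).fit()

print("with outlier:   ", full.params.round(2).to_dict())
print("without outlier:", trimmed.params.round(2).to_dict())
```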
This is section 3 of the Output Report. When you have a properly specified model, the over- and underpredictions will reflect random noise. If you were to create a histogram of random noise, it would be normally distributed (think bell curve). The fourth section of the Output Report File presents a histogram of the model over- and underpredictions. The bars of the histogram show the actual distribution, and the blue line superimposed on top of the histogram shows the shape the histogram would take if your residuals were, in fact, normally distributed.
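Outside the report, the same graphic can be recreated with matplotlib; the residuals below are simulated stand-ins for a fitted model's residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

resid = np.random.default_rng(3).normal(0, 2, 500)   # stand-in for model residuals

plt.hist(resid, bins=30, density=True, alpha=0.6)    # actual distribution
grid = np.linspace(resid.min(), resid.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, resid.mean(), resid.std()))  # ideal bell curve
plt.xlabel("Residual (over-/underprediction)")
plt.ylabel("Density")
plt.show()
```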
Perfection is unlikely, so you should check the Jarque-Bera test to determine whether deviation from a normal distribution is statistically significant. This is section 4 of the Output Report. The Koenker diagnostic tells you whether the relationships you are modeling either change across the study area (nonstationarity) or vary in relation to the magnitude of the variable you are trying to predict (heteroscedasticity).
Geographically Weighted Regression will resolve issues with nonstationarity; the graph in section 5 of the Output Report File will show you if you have a problem with heteroscedasticity. This scatterplot graph shown below charts the relationship between model residuals and predicted values.
Suppose you are modeling crime rates. If the graph reveals a cone shape with the point on the left and the widest spread on the right of the graph, it indicates your model is predicting well in locations with low rates of crime, but not doing well in locations with high rates of crime.
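To make this concrete, the sketch below (statsmodels, synthetic data) reproduces the residuals-versus-predicted scatterplot and adds a Breusch-Pagan test, a diagnostic closely related to the Koenker test:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 300)
y = 1.5 * x + rng.normal(0, 0.2 + 0.4 * x)     # noise grows with x: heteroscedastic
model = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(model.fittedvalues, model.resid, s=10)   # look for a cone shape
plt.axhline(0, linewidth=1)
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()

lm, lm_p, fstat, f_p = het_breuschpagan(model.resid, model.model.exog)
print(f"Breusch-Pagan p-value: {lm_p:.4f}")    # small p suggests heteroscedasticity
```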
This is section 5 of the Output Report. The last page of the report records all of the parameter settings that were used when the report was created. Examine the model residuals found in the Output Feature Class. Over- and underpredictions for a properly specified regression model will be randomly distributed. Clustering of over- and underpredictions is evidence that you are missing at least one key explanatory variable. Examine the patterns in your model residuals to see if they provide clues about what those missing variables might be.
Sometimes running Hot Spot Analysis on regression residuals helps you identify broader patterns. Additional strategies for dealing with an improperly specified model are outlined in What they don't tell you about regression analysis.
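If you are working outside ArcGIS, one way to quantify such clustering is a global Moran's I computed on the residuals. The sketch below assumes the libpysal and esda packages and uses synthetic coordinates and residuals:

```python
import numpy as np
from libpysal.weights import KNN
from esda.moran import Moran

rng = np.random.default_rng(5)
coords = rng.uniform(0, 100, size=(200, 2))    # feature centroids (x, y)
resid = rng.normal(0, 1, 200)                  # stand-in for OLS residuals

w = KNN.from_array(coords, k=8)                # 8-nearest-neighbor weights
w.transform = "r"                              # row-standardize the weights
mi = Moran(resid, w)
print(f"Moran's I = {mi.I:.3f}, pseudo p-value = {mi.p_sim:.4f}")
# A significant positive I means over- and underpredictions cluster in space,
# pointing to a missing explanatory variable.
```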
View the coefficient and diagnostic tables. Creating the coefficient and diagnostic tables is optional. While you are in the process of finding an effective model, you may choose not to create these tables.
The model-building process is iterative, and you will likely try a number of different models (different explanatory variables) until you settle on a few good ones. When comparing candidate models, the model with the smaller AICc value is the better model; that is, taking into account model complexity, the model with the smaller AICc provides a better fit to the observed data. Creating the coefficient and diagnostic tables for your final OLS models captures important elements of the OLS report.
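As an illustration, statsmodels reports AIC but not AICc directly; the helper below derives AICc using the standard small-sample correction, with the caveat that conventions for counting parameters vary:

```python
import numpy as np
import statsmodels.api as sm

def aicc(results):
    """Small-sample corrected AIC: AICc = AIC + 2k(k+1)/(n-k-1)."""
    n = results.nobs
    k = results.df_model + 1                   # parameters incl. the intercept
    return results.aic + 2 * k * (k + 1) / (n - k - 1)

rng = np.random.default_rng(6)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)       # x2 is irrelevant here

m1 = sm.OLS(y, sm.add_constant(x1)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(f"AICc, x1 only:   {aicc(m1):.2f}")
print(f"AICc, x1 and x2: {aicc(m2):.2f}")      # smaller AICc -> preferred model
```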
The coefficient table includes the list of explanatory variables used in the model with their coefficients, standardized coefficients, standard errors, and probabilities. The coefficient is an estimate of how much the dependent variable would change given a 1-unit change in the associated explanatory variable. The units of each coefficient match those of its explanatory variable. If, for example, you have an explanatory variable for total population, the coefficient units for that variable reflect people; if another explanatory variable is distance (meters) from the train station, the coefficient units reflect meters.
When the coefficients are converted to standard deviations, they are called standardized coefficients. You can use standardized coefficients to compare the effect diverse explanatory variables have on the dependent variable. Interpretations of coefficients, however, can only be made in light of the standard error. Standard errors indicate how likely you are to get the same coefficients if you could resample your data and recalibrate your model an infinite number of times.
Large standard errors for a coefficient mean the resampling process would result in a wide range of possible coefficient values; small standard errors indicate the coefficient would be fairly consistent. The diagnostic table includes results for each diagnostic test, along with guidelines for how to interpret those results.
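To make these quantities concrete, here is a sketch that reproduces the coefficient-table columns with statsmodels on synthetic data; dividing by the ratio of standard deviations is one common way to standardize coefficients:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 250
X = pd.DataFrame({
    "POP": rng.normal(10_000, 2_000, n),       # people
    "DIST": rng.uniform(100, 5_000, n),        # meters to the station
})
y = 0.002 * X["POP"] - 0.003 * X["DIST"] + rng.normal(0, 3, n)

res = sm.OLS(y, sm.add_constant(X)).fit()
table = pd.DataFrame({
    "coef": res.params[X.columns],             # change in y per 1-unit change in x
    "std_coef": res.params[X.columns] * X.std() / y.std(),  # per-SD, unitless
    "std_err": res.bse[X.columns],             # expected sampling variability
    "p_value": res.pvalues[X.columns],
})
print(table)
```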
To learn more, start with Regression analysis basics or work through the Regression Analysis tutorial. Apply regression analysis to your own data, referring to the table of common problems and the What they don't tell you about regression analysis topic for additional strategies. If you are having trouble finding a properly specified model, the Exploratory Regression tool can be helpful. The following are also helpful resources:
Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press.
Wooldridge, J. Introductory Econometrics: A Modern Approach. South-Western, Mason, Ohio.
Hamilton, Lawrence C. Regression with Graphics.
Output generated from the OLS tool includes an output feature class symbolized using the OLS residuals, statistical results and diagnostics in the Messages window, as well as several optional outputs such as a PDF report file, a table of explanatory variable coefficients, and a table of regression diagnostics. When the assumptions of OLS (for example, homoscedasticity and independence of the errors) cannot be maintained, the covariance matrix cannot be estimated using the classical formula, and the variances of the parameters corresponding to the beta coefficients of the linear model can be wrong, as can their confidence intervals.
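In that situation, a common remedy is to estimate heteroscedasticity-consistent (robust) standard errors, which is essentially what the robust probabilities discussed above rely on. A minimal sketch, assuming statsmodels and simulated heteroscedastic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 300)
y = 2.0 * x + rng.normal(0, 0.2 + 0.4 * x)     # error variance is not constant

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()                 # classical covariance formula
robust = sm.OLS(y, X).fit(cov_type="HC3")      # heteroscedasticity-consistent

print("classical SEs:", classical.bse.round(4))
print("robust SEs:   ", robust.bse.round(4))   # trust these under heteroscedasticity
```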
An automatic selection of the variables is performed if the user selects too many variables compared to the number of observations. Removing some of the variables may, however, not be optimal: in some cases a variable might not be added to the model because it is almost collinear with some other variables or with a block of variables, when it would in fact be more relevant to remove a variable that is already in the model and add the new one.
For that reason, and also to handle cases where there are a lot of explanatory variables, other methods have been developed, such as Partial Least Squares (PLS) regression. Linear regression is often used to predict output values for new samples. XLSTAT enables you to characterize the quality of the model for prediction before you go ahead and use it for predictive purposes.
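As a sketch of that predictive use (shown here with statsmodels and made-up data rather than XLSTAT):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
train = pd.DataFrame({"x": rng.uniform(0, 10, 100)})
train["y"] = 3.0 + 1.2 * train["x"] + rng.normal(0, 1, 100)

model = sm.OLS(train["y"], sm.add_constant(train["x"])).fit()

new = pd.DataFrame({"x": [2.5, 7.0]})                       # unseen samples
exog_new = sm.add_constant(new["x"], has_constant="add")
pred = model.get_prediction(exog_new)
print(pred.summary_frame(alpha=0.05))   # point predictions with 95% intervals
```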
Tutorials for Ordinary Least Squares regression
Below you will find a list of examples using ordinary least squares regression:
A simple linear regression model
A multiple linear regression model
How to compute influence diagnostics for linear regression models