Regression is one of the most commonly used methods to build a model or equation between output and input variables. For example, a company may want to predict how demand for a product responds to changes in price (the input), or an agriculturist may want to study the dependence of crop yield on temperature, amount of water, and amount of fertilizer. The objective of regression is to estimate or predict the value of the output for a given value of the input variables. The regression model can also be used to solve the inverse problem: if we desire a specific value of the output, what input values should we use to achieve it? The regression model thus helps us optimize our inputs in order to achieve a certain level of output.
The regression model is typically developed between continuous variables. However, adequate regression models can also be developed for other types of variables, such as ratio, interval, nominal, and ordinal variables. Usually, a linear regression model is fit between the inputs and outputs. Even if the relationship between inputs and output is non-linear over a large range, we can often look at a smaller region of inputs where a linear approximation suffices. Such a model is termed the classical linear regression model. A simple linear regression model between one input and one output is given by the equation shown below:
Y_i = β_1 + β_2 X_i + e_i

where β_1 and β_2 are the model parameters, X_i is the input variable, Y_i is the output variable, and e_i is the error term.
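To make the roles of the terms concrete, here is a minimal sketch that generates data from this model. The parameter values and error spread are hypothetical, chosen purely for illustration and not taken from the article's data:

```python
import numpy as np

# Hypothetical parameter values for illustration only
beta1, beta2 = 10.0, 0.5

rng = np.random.default_rng(0)
X = np.array([80.0, 100.0, 140.0])    # non-stochastic input values
e = rng.normal(0.0, 5.0, size=X.size) # zero-mean random error term
Y = beta1 + beta2 * X + e             # Y_i = beta1 + beta2 * X_i + e_i
```

Note that X is fixed by the experimenter, while Y is random only because of the error term e.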
As an example, consider the data collected for car production (X) vs. electricity consumption (Y) shown in the following table. The X values take 3 non-stochastic values (80, 100, and 140). The Y values are random and can take any arbitrary value. For example, when X = 80, the Y values are 55, 68, and 85 (with an average value of 69).
The regression coefficients can be estimated by either the ordinary least squares method or maximum likelihood estimation. We will not get into how to estimate these parameters in this article. We can use software such as Sigma Magic to estimate the regression coefficients. Fitting a regression model with Sigma Magic is relatively straightforward. First, add the regression analysis template to your Excel workbook by clicking on Stats and then Regression analysis. Enter the X and Y data on the worksheet and then click on Compute Outputs. If you need to make any changes to the analysis options, click on Analysis Setup. The model output results contain the regression model and some checks on the residuals. In this article, we will discuss some of the key assumptions that must hold when we fit a linear regression model.
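For readers who prefer a programmatic route, the same ordinary least squares fit can be sketched in a few lines of Python. The table's Y values at X = 80 (55, 68, and 85) are from the article; the readings at X = 100 and X = 140 below are made up for illustration, since the full table is not reproduced here:

```python
import numpy as np

# Y values at X = 80 are from the article; the rest are hypothetical
X = np.array([80, 80, 80, 100, 100, 100, 140, 140, 140], dtype=float)
Y = np.array([55, 68, 85, 70, 82, 95, 95, 110, 125], dtype=float)

# Ordinary least squares: solve Y = beta1 + beta2 * X for [beta1, beta2]
A = np.column_stack([np.ones_like(X), X])       # design matrix with intercept
(beta1, beta2), *_ = np.linalg.lstsq(A, Y, rcond=None)
residuals = Y - (beta1 + beta2 * X)             # e_i, used for diagnostic checks
```

One useful property of OLS with an intercept is that the residuals always sum to zero, which is why the zero-mean assumption is checked on plots for patterns rather than on the raw average.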
In order to correctly interpret the regression analysis results, the following assumptions must be satisfied.
The regression model is linear in the parameters. Note that the regression model may be non-linear with respect to X and Y variables but not with respect to the parameters. The regression model has been correctly specified – there is no specification bias in the model. This implies that we are not omitting important variables from the model or choosing the wrong functional form for the model equation. Unfortunately, in practice one rarely knows the correct variables to include in the model or the correct functional form.
The error term, e, has a zero-mean value, has the same variance for all observations and is not correlated with the explanatory variable X. The requirement of having the same variance for all observations is called homoscedasticity. These assumptions can be checked by plotting the residuals after the model has been fit to check if there is any pattern observable in the residuals.
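The three checks on the error term can be sketched numerically as well as graphically. The residual values below are hypothetical, constructed only to show what each check looks like; the thresholds are arbitrary illustrative cutoffs, not formal tests:

```python
import numpy as np

# Hypothetical residuals from a fitted model, three observations per X level
X = np.array([80, 80, 80, 100, 100, 100, 140, 140, 140], dtype=float)
e = np.array([-2.0, 1.5, 0.5, -1.0, 2.0, -1.0, 0.5, -1.5, 1.0])

# 1. Zero mean: the residuals should average out to roughly zero
mean_ok = abs(e.mean()) < 1.0

# 2. No correlation with X: residuals should not trend with the input
corr_ok = abs(np.corrcoef(X, e)[0, 1]) < 0.5

# 3. Homoscedasticity: residual spread should be similar at each X level
spreads = [e[X == v].std() for v in np.unique(X)]
var_ok = max(spreads) / min(spreads) < 3.0
```

In practice these same questions are usually answered by plotting the residuals against X (or against the fitted values) and looking for funnels, trends, or offsets.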
Would the classical linear regression model apply for the following models?
The following table shows the value of the independent variable (X) and the error term (e) for three different models. By looking at the residuals, determine whether each model violates any of the assumptions of the classical linear regression model.
The classical linear regression model applies to cases a), b), and c), as they are linear in the coefficients. However, it would not apply to case d), as that model is non-linear in the coefficients. Note that determining whether an equation has the right functional form cannot be done by looking at the equation alone. We need to understand the "physics" of the problem to come up with the appropriate model.
Case 1 violates the requirement that the mean value of the error be zero. Case 2 satisfies the key assumptions. Case 3 violates two requirements: the error terms are correlated with the X values, and the variances are not the same across all observations.
In summary, whenever you use a regression model you should make sure that the assumptions used in building your model are satisfied, otherwise you may be drawing the wrong conclusions.