

STEPWISE MULTIPLE LINEAR REGRESSION ANALYSIS

REQUIREMENTS: Regression is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression works with deviations about the variable means, so all variables must be interval scaled. Computationally, regression analysis may be conducted using either a raw data matrix (respondents by variables) or a correlation matrix.

Regression analysis measures the degree of influence of the independent variables on a dependent variable. In the case of a single independent variable, the dependent variable could be predicted from the independent variable by the simple equation:

y = a + bx {where a is the constant (intercept) and b is the slope}

This could be extended to a multi-variable concept as follows:

y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + ... + b_n x_n

It should be noted that whether it be for a single variable or for multiple variables, the relationship predicted is always linear.

A Graphical Explanation of Bi-Variate (2 Variable) Regression Analysis
A simple way to approximate a regression equation for a single independent variable is to plot the relationship between the variables. First, plot the dependent variable against the independent variable. This type of plot is called a scatter diagram.
Next, identify the straight line that represents the trend through the middle of the data. This line must be the one with the `best fit'. The regression line identifies the trend or relationship between the independent and dependent variables. The relationship, once identified, is used to predict values of the dependent variable given specific values of the independent variable. This predicted relationship is always in the form of a linear trend.

The table below identifies a set of values for an independent (X) and dependent (Y) variable.

X   39  43  21  64  57  47  28  75  34  52

Y   68  82  56  86  97  94  77  103 59  79


The scatter plot of the variables is given below:



Regression analysis is used to develop an exact mathematical formulation of this line. The line of best fit is defined as the line that minimizes the sum of squared deviations of the data points from the line. The regression line is therefore also referred to as the least squares line.
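The least squares idea can be checked numerically. The sketch below is only an illustration: it takes the X and Y values from the table above together with the coefficients quoted later in this text (43.768553 and 0.789814), and the helper name `sse` and the size of the nudges are assumptions made for this example. It compares the error sum of squares of the fitted line with that of a few slightly shifted lines.

```python
# Data from the table above.
xs = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
ys = [68, 82, 56, 86, 97, 94, 77, 103, 59, 79]

def sse(a, b):
    """Sum of squared vertical deviations of the points from y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Coefficients quoted in the numerical example of this text.
a_fit, b_fit = 43.768553, 0.789814
best = sse(a_fit, b_fit)

# Shift the intercept or the slope slightly (arbitrary, illustrative nudges)
# and recompute the error sum of squares for each shifted line.
nudges = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.02), (0.0, -0.02)]
nudged = [sse(a_fit + da, b_fit + db) for da, db in nudges]

print("fitted line SSE:", best)
print("smallest SSE among shifted lines:", min(nudged))
```

Every shifted line should give a larger error sum of squares than the fitted line, which is what "best fit" means here.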

In case of a multi-variable regression, the analysis is a sequence of multiple linear regression equations that are developed in a stepwise manner. At each step of the sequence, one variable is added to the regression equation.

The variable added is the one that makes the greatest reduction in the error sum of squares of the sample data. Equivalently, it is the variable that, when added, provides the greatest increase in the F value. Variables that do not have a significant correlation with the dependent variable are those whose addition does not increase the F value, and they are not featured in the regression equation.
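The stepwise procedure can be sketched roughly as follows. This is a minimal forward-selection illustration, not the procedure of any particular software package: it assumes NumPy is available, the function names, the synthetic data, and the entry threshold `f_to_enter=4.0` are all hypothetical, and the stopping rule uses a partial F statistic for the candidate variable as one common way of expressing "no significant improvement".

```python
import numpy as np

def sse_of_fit(X, y, cols):
    """Error sum of squares for a least-squares fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Add one predictor per step; stop when the best candidate's partial F is below f_to_enter."""
    n, k = X.shape
    selected = []
    current_sse = sse_of_fit(X, y, selected)  # intercept-only model
    while len(selected) < k:
        remaining = [c for c in range(k) if c not in selected]
        # Candidate that gives the greatest reduction in the error sum of squares.
        trial = {c: sse_of_fit(X, y, selected + [c]) for c in remaining}
        best = min(trial, key=trial.get)
        # Residual degrees of freedom after adding the candidate (predictors plus intercept).
        df_resid = n - (len(selected) + 1) - 1
        partial_f = (current_sse - trial[best]) / (trial[best] / df_resid)
        if partial_f < f_to_enter:
            break
        selected.append(best)
        current_sse = trial[best]
    return selected

# Hypothetical data: y depends on columns 0 and 2 only; column 1 is unrelated noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected = forward_stepwise(X, y)
print("order in which variables entered:", selected)
```

In this synthetic setup the two real predictors should enter first, strongest effect first. Whether the noise column is rejected depends on the threshold chosen.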

Mathematical Computation of the Regression Coefficients

I. With one independent variable: The computation of the regression coefficients for the case of a single independent variable is given below.

The slope (regression coefficient) for the line of least squares is given by b, where

b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²   {x̄ and ȳ are the means of x and y}

The intercept of the line is given by a, where
a = ȳ - b x̄ {where ȳ and x̄ are the means of y and x}
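As a rough illustration, the slope and intercept can be computed from running sums. The sketch below uses the standard closed-form least-squares expressions in terms of Σx, Σy, Σxy and Σx². The function name `least_squares_line` and the four test points are hypothetical and chosen only so that the answer is easy to verify by eye.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope from the sums, then intercept from the two means.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points lie exactly on a line, the function should recover that line's intercept and slope.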

The mathematical formula used for this computation is as follows:

b = (n Σxy - Σx Σy) / (n Σx² - (Σx)²)   {n is the number of observations}

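The Residual: The residual is defined as the difference between the actual and predicted values of the dependent variable. The standard error of the estimate is the standard deviation of the residuals. With one independent variable, the standard error of the estimate can be calculated as follows:

s = sqrt( Σ(y - ŷ)² / (n - 2) )   {ŷ = a + bx is the predicted value and n is the number of observations}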
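A small sketch of the residual calculation, using the data table from the graphical example and the coefficients quoted in the numerical example of this text. Dividing by n - 2 is an assumption here; it matches the 8 degrees of freedom reported for 10 observations in the spreadsheet output further on.

```python
import math

# Data from the table in the graphical example.
xs = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
ys = [68, 82, 56, 86, 97, 94, 77, 103, 59, 79]

# Coefficients quoted in the numerical example of this text.
a, b = 43.768553, 0.789814

# Residual = actual value minus predicted value.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard error of the estimate, with n - 2 degrees of freedom
# (two parameters, a and b, were estimated from the data).
n = len(xs)
std_err = math.sqrt(sum(r * r for r in residuals) / (n - 2))

print("standard error of the estimate:", std_err)
```

The result can be compared with the "Std Err of Y Estimate" line of the spreadsheet output.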


A Numerical Example: One independent variable

 

 

Let us use the data which produced the above graphical representation of a regression analysis.

SL.No      y      x       xy       y²       x²
  1       68     39     2652     4624     1521
  2       82     43     3526     6724     1849
  3       56     21     1176     3136      441
  4       86     64     5504     7396     4096
  5       97     57     5529     9409     3249
  6       94     47     4418     8836     2209
  7       77     28     2156     5929      784
  8      103     75     7725    10609     5625
  9       59     34     2006     3481     1156
 10       79     52     4108     6241     2704
SUM      801    460    38800    66385    23634
AVG.    80.1     46     3880   6638.5   2363.4

                                          











Therefore, the slope is given by:

b = (10 × 38800 - 460 × 801) / (10 × 23634 - 460²) = 19540 / 24740 = 0.789814

and the intercept, computed from the means of Y and X, is given by:
a = ȳ - b x̄ = 80.1 - 0.789814 × 46 = 43.768553

Hence the line of best fit is given by :

Y = 43.768553 + 0.789814 X
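Once the line is known, predicting Y for a given X is a single evaluation of the equation. The helper name `predict` and the three X values below are hypothetical and only illustrate the use of the fitted line.

```python
def predict(x, a=43.768553, b=0.789814):
    """Predicted Y for a given X using the fitted line Y = a + b*X."""
    return a + b * x

# Illustrative X values (not taken from the data table).
for x in (30, 50, 70):
    print(x, round(predict(x), 2))
```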

As an alternative method of deriving the regression equation, a spreadsheet can be used. The line for the single variable regression was also derived using Excel. The output from Excel for the above data set is given below:

Regression Output:
Constant                  43.76855
Std Err of Y Estimate     9.230407
R Squared                 0.693647
No. of Observations       10
Degrees of Freedom        8
X Coefficient(s)          0.788348
X Coefficient(s)          0.789814
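The R Squared and Degrees of Freedom figures can be cross-checked directly from the column sums of the worked table. The sketch below assumes the usual result that, with a single independent variable, R squared equals the square of the correlation coefficient between X and Y. The variable names are chosen for this illustration only.

```python
# Column sums taken from the worked table in the numerical example.
n = 10
sum_x, sum_y = 460, 801
sum_xy, sum_yy, sum_xx = 38800, 66385, 23634

# Squared correlation coefficient between X and Y, computed from the sums.
numerator = (n * sum_xy - sum_x * sum_y) ** 2
denominator = (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
r_squared = numerator / denominator

# Degrees of freedom: observations minus the two estimated parameters.
degrees_of_freedom = n - 2

print("R squared:", r_squared)
print("degrees of freedom:", degrees_of_freedom)
```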


A Numerical Example: Multiple Regression

The Mathematical Computation of the Regression Coefficients for one or more independent variables involves matrix computations. A brief result is given below:
Let X be the data matrix of the predictor (independent) variables, Y the data vector representing the criterion (dependent) variable, and b the vector of regression coefficients, including the constant. The vector of regression coefficients is computed as

b = (X'X)^-1 X'Y   {X' is the transpose of X}
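The matrix computation can be sketched with NumPy (assumed to be available). For illustration the sketch reuses the single-variable data from the earlier example and assumes that the constant is represented by a column of ones, labelled X0 here. It solves the normal equations (X'X) b = X'Y rather than forming the inverse explicitly, which gives the same coefficients.

```python
import numpy as np

# Single-variable data from the earlier numerical example.
x = np.array([39, 43, 21, 64, 57, 47, 28, 75, 34, 52], dtype=float)
y = np.array([68, 82, 56, 86, 97, 94, 77, 103, 59, 79], dtype=float)

# Data matrix: X0 is a column of ones for the constant, X1 is the predictor.
X = np.column_stack([np.ones_like(x), x])

# Solve (X'X) b = X'Y for the coefficient vector b.
b = np.linalg.solve(X.T @ X, X.T @ y)

print("constant:", b[0])
print("X1 coefficient:", b[1])
```

With more predictors, further columns (X2, X3, ...) would be appended to the data matrix and the same line of code would return one coefficient per column.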



       

           Y        X0    X1      X2