Chapter 6 – The Multiple Regression Model
6.1 Introduction
6.2 The multiple regression model
Under the multiple regression model, the regression equation is expanded to allow for the possibility that there are [latex]k[/latex] independent variables [latex]X_1,X_2,...,X_k[/latex] that affect the level of the dependent variable [latex]Y[/latex]. Thus, a multiple regression model may be specified as:[1]
[latex]\begin{equation} \tag{6.1} E(Y\mid X_{1i},X_{2i},\ldots ,X_{ki})=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+...+\beta _kX_{ki} \end{equation}[/latex]
The dependent variable in this equation is the conditional expectation of [latex]Y[/latex] given the levels of the independent variables [latex]X_1,X_2,...,X_k.[/latex] This equation states that the “average” value of [latex]Y[/latex] is a linear function of the independent variables. As in the bivariate regression model discussed earlier, the subscript [latex]i[/latex] is used to indicate a particular observation. The units of observation may consist of cross-sectional, time-series, or longitudinal (or panel) data. Once again, we will assume that there are [latex]N[/latex] observations available for the dependent and independent variables.
The coefficient [latex]\beta _{o}[/latex] is, once again, the intercept of the regression equation. This parameter is equal to the conditional expectation of the dependent variable when each of the independent variables ([latex]X_{ji}[/latex]) equals zero. The coefficients [latex]\beta _{j}[/latex] ([latex]j=1,...,k[/latex]) are called the slope coefficients for the regression equation. Each of these coefficients provides a measure of the expected change in the dependent variable when there is a one-unit change in the corresponding independent variable, holding the levels of all of the other independent variables constant.[2]
Thus, holding constant all of the other independent variables, [latex]\beta _{j}[/latex] can be expressed as:
\begin{equation*}
\beta _{j}=\frac{\Delta Y}{\Delta X_{j}}
\end{equation*}
Anyone who has completed an introductory economics class should recall that economists make extensive use of the ceteris paribus assumption. This assumption involves an analysis of the relationship between two variables, holding constant the effect of all other variables. The slope coefficients in the multiple regression model [latex](\beta_{1},\ldots ,\beta _{k})[/latex] serve a similar role. Each of these coefficients represents the expected ceteris paribus effect of a one-unit change in a single independent variable on the level of the dependent variable.
Equation 6.1 is called the population regression function since it captures the relationship that exists between the dependent and independent variables in the entire population. The relationship between the observed [latex]Y[/latex] and the [latex]X_{j}[/latex] [latex](j=1,\ldots ,k)[/latex] is given by:
[latex]\begin{equation} \tag{6.2} Y_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+...+\beta _{k}X_{ki}+u_{i} \end{equation}[/latex]
A random error term has been introduced in equation 6.2 to account for the effect of:
- unobservable variables,
- randomness in the process generating [latex]Y_{i}[/latex], and
- errors in measuring the dependent variable.
This error term, [latex]u_{i}[/latex], is a measure of the difference between the observed value of [latex]Y[/latex] and the conditional expectation of [latex]Y[/latex] (the value predicted by the regression equation):
\begin{equation*} u_{i}=Y_{i}-E(Y | X_{1i},X_{2i},\ldots ,X_{ki}) \end{equation*}
or:
\begin{equation*} u_{i}=Y_{i}-\beta _{o}-\beta _{1}X_{1i}-\beta _{2}X_{2i}-\cdots-\beta _{k}X_{ki} \end{equation*}
Of course, the coefficients [latex]\beta _o,\beta _1,\ldots ,\beta _k[/latex] are not generally known, a priori, by the econometrician. They must be estimated from observed data. The estimated version of equation 6.2 may be written as:
[latex]\begin{equation} \tag{6.3} Y_i=\hat{\beta}_o+\hat{\beta}_1X_{1i}+\hat{\beta}_2X_{2i}+...+\hat{\beta}_kX_{ki}+\hat{u}_i \end{equation}[/latex]
Each of the estimated parameters [latex]\hat{\beta}_o,\hat{\beta}_1,\ldots ,\hat{\beta}_k[/latex] is a random variable. Different estimates of these parameters would be found when a different set of data is used. (The distribution of these estimated parameters is discussed in Chapter 7.)
Using the estimated intercept and slope parameters, the fitted value of [latex]Y_{i}[/latex] may be expressed as:
[latex]\begin{equation} \tag{6.4} \hat{Y}_{i}=\hat{\beta}_{o}+\hat{\beta}_{1}X_{1i}+\hat{\beta}_{2}X_{2i}+...+\hat{\beta}_{k}X_{ki} \end{equation}[/latex]
This fitted equation can be used to generate predicted values of [latex]Y[/latex] for each combination of the independent variables. Equation 6.4 is called a sample regression function. It serves as the estimated version of the population regression equation given in equation 6.1.
Class Attendance and Student Performance
In recent years, many college instructors have observed a decline in student attendance in economics courses. It has long been argued that students who regularly skip classes will receive lower grades. A study by Romer (1993) provides strong empirical support for this argument.
In this study, Romer uses multiple regression analysis to examine the relationship between student attendance and academic performance in his intermediate macroeconomics courses. One of the equations estimated by Romer is:
[latex]\text{Grade}_i=0.67+1.52\text{Attendance}_i+0.78\text{GPA}_i[/latex]
where:
- Grade[latex]_i[/latex] = Student’s course grade (4.0 scale)
- Attendance[latex]_i[/latex] = fraction of classes attended
- GPA[latex]_i[/latex] = Student’s prior cumulative GPA
This result suggests that a student who attends all classes will, on average, receive a final grade that is 1.52 points (approximately one and a half letter grades) higher than that received by a comparable student (one with the same prior GPA) who does not attend any classes. Missing three-fourths of the classes lowers the expected grade by 0.75 × 1.52 ≈ 1.14 points, or approximately one full letter grade.
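The arithmetic behind these statements can be checked with a short sketch that simply evaluates Romer's estimated equation. The function name predicted_grade and the prior GPA of 2.0 are hypothetical choices for illustration; because GPA is held fixed, the differences do not depend on the GPA chosen.

```python
# Coefficients from Romer's estimated equation, as reported in the text.
B0, B_ATTEND, B_GPA = 0.67, 1.52, 0.78

def predicted_grade(attendance, gpa):
    """Predicted course grade (4.0 scale); attendance is the fraction of classes attended."""
    return B0 + B_ATTEND * attendance + B_GPA * gpa

# Hypothetical student with a prior GPA of 2.0; GPA is held fixed (ceteris paribus).
full = predicted_grade(1.0, 2.0)      # attends every class
none = predicted_grade(0.0, 2.0)      # attends no classes
quarter = predicted_grade(0.25, 2.0)  # misses three-fourths of the classes

print(f"{full - none:.2f}")     # 1.52
print(f"{full - quarter:.2f}")  # 1.14
```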
The estimated residual [latex]\hat{u}_i[/latex] equals the difference between the observed value of the dependent variable and the value predicted by the regression equation. Thus:
\begin{equation*}\hat{u}_i=Y_i-\hat{Y}_i\end{equation*}
Let’s reconsider the hypothetical demand curve discussed in Chapter 1. The demand for good [latex]X[/latex] can be represented as:
[latex]\begin{equation}\tag{6.5} Q_{dt}=\beta _o+\beta _1P_{xt}+\beta _2P_{yt}+\beta _3P_{zt}+\beta _4I_t+u_t \end{equation}[/latex]
where:
- Q[latex]_{dt}[/latex] = quantity of good [latex]X[/latex] demanded at time [latex]t[/latex]
- P[latex]_{xt}[/latex] = price of good [latex]X[/latex] at time [latex]t[/latex]
- P[latex]_{yt}[/latex] = price of good [latex]Y[/latex] at time [latex]t[/latex]
- P[latex]_{zt}[/latex] = price of good [latex]Z[/latex] at time [latex]t[/latex]
- I[latex]_t[/latex] = consumer income at time [latex]t[/latex]
In equation 6.5, each of the slope coefficients provides a measure of the effect of a one-unit change in the corresponding variable, holding all other variables constant. For example, [latex]\beta_1[/latex] is a measure of the change in quantity demanded that occurs when the price of good [latex]X[/latex] changes by one unit, holding the prices of other goods ([latex]Y[/latex] and [latex]Z[/latex]) and consumer income constant. Economic theory predicts that the value of [latex]\beta_1[/latex] will be negative.
In this equation, the coefficient [latex]\beta_2[/latex] is a measure of the effect of a change in the price of good [latex]Y[/latex] on the quantity of good [latex]X[/latex] demanded, holding consumer income and the prices of goods [latex]X[/latex] and [latex]Z[/latex] constant. Similarly, [latex]\beta_3[/latex] provides a measure of the effect of a change in the price of good [latex]Z[/latex] on the quantity of good [latex]X[/latex] demanded, ceteris paribus. A positive sign for either [latex]\beta_2[/latex] or [latex]\beta_3[/latex] indicates that the corresponding good is a substitute for good [latex]X[/latex]. A negative sign for either of these coefficients indicates that the corresponding good is a complement to good [latex]X[/latex].
The coefficient [latex]\beta_4[/latex] is a measure of the effect of a change in consumer income on the quantity of good [latex]X[/latex] demanded. If this coefficient is positive, then good [latex]X[/latex] is a normal good; a negative coefficient indicates that [latex]X[/latex] is an inferior good.
Suppose that an econometrician estimates the parameters of the sample regression function as:
[latex]\begin{equation} \tag{6.6} \hat{Q}_{dt}=100-5P_{xt}+1.2P_{yt}-2P_{zt}+0.002I_{t} \end{equation}[/latex]
These estimates suggest that:
- the price of good [latex]X[/latex] and the quantity of good [latex]X[/latex] demanded are inversely related;
- goods [latex]X[/latex] and [latex]Y[/latex] are substitute goods;
- goods [latex]X[/latex] and [latex]Z[/latex] are complementary goods; and
- good [latex]X[/latex] is a normal good.
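To see how the sample regression function 6.6 is used, the sketch below evaluates the fitted quantity demanded and reads off the sign-based interpretations listed above. The prices and income passed in are hypothetical values chosen only for illustration, as are the function names.

```python
# Estimated coefficients from equation 6.6.
coef = {"const": 100.0, "P_x": -5.0, "P_y": 1.2, "P_z": -2.0, "I": 0.002}

def quantity_demanded(p_x, p_y, p_z, income):
    """Fitted quantity of good X demanded from the sample regression function 6.6."""
    return (coef["const"] + coef["P_x"] * p_x + coef["P_y"] * p_y
            + coef["P_z"] * p_z + coef["I"] * income)

def cross_price_relationship(b):
    """Interpret the sign of a cross-price coefficient."""
    if b > 0:
        return "substitute"
    if b < 0:
        return "complement"
    return "unrelated"

# Hypothetical prices and income (not taken from the text).
q0 = quantity_demanded(10.0, 20.0, 5.0, 30000.0)
q1 = quantity_demanded(11.0, 20.0, 5.0, 30000.0)  # P_x rises by one unit, all else fixed
print(q0, q1 - q0)  # 124.0 -5.0
print(cross_price_relationship(coef["P_y"]), cross_price_relationship(coef["P_z"]))
print("normal" if coef["I"] > 0 else "inferior")
```

The one-unit increase in the price of good X changes fitted quantity demanded by exactly the estimated coefficient on P_x, which is what "holding all other variables constant" means for a linear equation.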
Marriage rates and job prospects
Are marriage decisions affected by job prospects? An interesting study by Preston and Richards (1975) used 1960 census data for the 100 largest U.S. standard metropolitan statistical areas (SMSAs) to investigate this issue. A multiple regression analysis was used to examine the determinants of the proportion of women aged 22-24 who had ever married in these SMSAs. One of the equations estimated by Preston and Richards is:
\begin{equation*}Y_{i}=1.31-1.056X_{1i}-0.0650X_{2i}-0.643X_{3i}+0.153X_{4i} \end{equation*}
\begin{equation*}R^{2}=0.329,\bar{R}^{2}=0.301\end{equation*}
where:
- [latex]Y_i[/latex] = Proportion of women aged 22-24 that have ever married in SMSA [latex]i[/latex].
- [latex]X_{1i}[/latex] = estimated proportion of female workers in SMSA [latex]i[/latex].
- [latex]X_{2i}[/latex] = median 1958 earnings for females with earnings in SMSA [latex]i[/latex] (in thousands of dollars).
- [latex]X_{3i}[/latex] = total unemployment rate in SMSA [latex]i[/latex], April 1960.
- [latex]X_{4i}[/latex] = proportion of jobs in SMSA [latex]i[/latex] that are classified as clerical or kindred occupations.
Preston and Richards argue that these results suggest that women are less likely to marry when they face relatively good job prospects.
6.3 Assumptions
There are six assumptions that characterize the classical regression model. As noted in Chapter 4, these assumptions represent a set of ideal conditions that guarantee that the OLS estimators possess desirable properties. Since each of these assumptions is a generalization of the assumptions already discussed in Section 4.4, the focus of the discussion will be on how the assumptions are altered in the more general case of the multiple regression model. In later chapters, we will examine the effect of violations of each of these assumptions. In the examples considered in this chapter, however, it is assumed that each of these conditions is satisfied.
Assumption 6.1 – A linear relationship exists between [latex]Y_{i}[/latex] and [latex]k[/latex] independent variables [latex]X_{1i}, X_{2i}, ..., X_{ki}[/latex]. This relationship can be expressed as:
\begin{equation*}
Y_{i}=\beta_{o}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\cdots+\beta_{k}X_{ki}+u_{i}
\end{equation*}
where [latex]u_{i}[/latex] is a random error term.
As you will see in Chapter 8, this linearity assumption is not as stringent as it initially appears. A wide variety of alternative functional forms may be transformed into a linear form by suitably redefining variables. For example, the equation:
\begin{equation*}
Y_{i}=\beta _{o}+\beta _{1}\frac{1}{X_{1i}}+\beta _{2}X_{2i}^{2}+u_{i}
\end{equation*}
can be transformed into the linear form:
\begin{equation*}
Y_{i}=\beta _{o}+\beta _{1}\tilde{X}_{1i}+\beta _{2}\tilde{X}_{2i}+u_{i}
\end{equation*}
by defining:
\begin{equation*}
\tilde{X}_{1i}=\frac{1}{X_{1i}}
\end{equation*}
and
\begin{equation*}
\tilde{X}_{2i}=X_{2i}^{2}
\end{equation*}
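The transformation can be sketched in a few lines of Python. This example assumes NumPy is available and generates hypothetical data from [latex]Y_i=2+3(1/X_{1i})+0.5X_{2i}^2+u_i[/latex]; all of these numbers are illustrative. After the variables are redefined, an ordinary linear regression on the transformed regressors recovers the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data generated from Y = 2 + 3*(1/X1) + 0.5*X2**2 + u.
x1 = rng.uniform(1.0, 5.0, size=n)
x2 = rng.uniform(-2.0, 2.0, size=n)
u = rng.normal(0.0, 0.1, size=n)
y = 2.0 + 3.0 / x1 + 0.5 * x2**2 + u

# Redefine the variables so the model is linear in the parameters.
x1_tilde = 1.0 / x1
x2_tilde = x2**2

# OLS on the transformed regressors; the column of ones supplies the intercept.
design = np.column_stack([np.ones(n), x1_tilde, x2_tilde])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta_hat, 2))  # close to [2.0, 3.0, 0.5]
```

The model is nonlinear in [latex]X_{1i}[/latex] and [latex]X_{2i}[/latex] but linear in the [latex]\beta[/latex] parameters, which is all that assumption 6.1 requires.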
Assumption 6.2 – The mean value of the error term, [latex]u_{i}[/latex], is zero ([latex]E(u_{i})=0[/latex]).
Assumption 6.3 – The error terms are identically distributed with a constant variance equal to [latex]\sigma ^{2}[/latex] for all possible combinations of [latex]X_{1i}, X_{2i}, ..., X_{ki}[/latex].
This assumption requires that the error terms be homoskedastic. If the variance of the error terms changes as one or more of the independent variables changes, the error process is said to be heteroskedastic, and this assumption is violated.
Assumption 6.4 – The error terms are independent across observations (i.e., [latex]u_i[/latex] is independent of [latex]u_j[/latex] when [latex]i\neq j[/latex]).
Assumption 6.5 – The independent variables [latex]X_{1i},X_{2i},\ldots ,X_{ki}[/latex] are nonstochastic.
This assumption requires that the independent variables be fixed in repeated samples. It guarantees that the error terms will be independent of each of the independent variables ([latex]X_{ji}[/latex]). When the [latex]X_{ji}[/latex] are independent of the error terms ([latex]u_{i}[/latex]), it can easily be shown that:[3]
\begin{equation*}
E(X_{ji}u_{i})=0
\end{equation*}
This property is used below to formulate OLS estimators for [latex]\beta_{0},\beta _{1},...,\beta _{k}[/latex].
Assumption 6.6 – None of the independent variables can be written as an exact linear combination of the other independent variables.
Each of the previous assumptions is a fairly straightforward generalization of one of the assumptions discussed in Chapter 4. At first glance, this assumption appears to be quite different from the corresponding assumption discussed in Chapter 4. It turns out, however, that this assumption is also a more general statement of one of the assumptions discussed earlier. Because this assumption involves some new concepts, we should examine it in somewhat greater detail.
6.3.1 Perfect multicollinearity
If one of the independent variables can be written as an exact linear combination of the other independent variables, then perfect multicollinearity is said to occur. Unique estimates of the intercept and slope parameters cannot be constructed when perfect multicollinearity is present. Let’s examine why this is the case.
Suppose that the dependent variable, [latex]Y[/latex], is a linear function of two independent variables, [latex]X[/latex] and [latex]Z[/latex], and a random error term, [latex]u[/latex]. This relationship may be expressed as:
\begin{equation*}
Y_i=\beta_0+\beta _1X_i+\beta _2Z_i+u_i
\end{equation*}
To examine the effect of perfect multicollinearity, assume that perfect multicollinearity exists between [latex]X[/latex] and [latex]Z[/latex]. In particular, assume that:
\begin{equation*}
X_i=cZ_i,
\end{equation*}
where [latex]c[/latex] is a positive constant. As a result of this relationship, [latex]X[/latex] and [latex]Z[/latex] will always move together. When the level of [latex]X_i[/latex] increases by 20%, [latex]Z_i[/latex] must also increase by the same percentage. A 10% decrease in the level of [latex]X_i[/latex] will always be accompanied by a 10% decrease in the level of [latex]Z_i[/latex]. Since these two variables always move together, it is impossible to determine whether a change in the dependent variable is the result of a change in [latex]X_i[/latex] or [latex]Z_i[/latex] (or both). The slope coefficient [latex]\beta_1[/latex] is a measure of the effect on [latex]Y_i[/latex] of a one-unit change in [latex]X_i[/latex], holding the level of [latex]Z_i[/latex] constant. The coefficient [latex]\beta_2[/latex] provides a measure of the effect of a change in [latex]Z_i[/latex], holding the level of [latex]X_i[/latex] constant. Since the variables [latex]X[/latex] and [latex]Z[/latex] always move together, it is impossible to determine the separate effects of a change in either variable. Thus, the parameters [latex]\beta_1[/latex] and [latex]\beta_2[/latex] cannot be estimated.[4]
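One way to see why the separate effects cannot be identified: substituting [latex]X_i=cZ_i[/latex] gives [latex]\beta_1X_i+\beta_2Z_i=(c\beta_1+\beta_2)Z_i[/latex], so only the combination [latex]c\beta_1+\beta_2[/latex] affects [latex]Y_i[/latex]. The sketch below checks this numerically with NumPy (assumed available). The data, the value [latex]c=2[/latex], and the helper name fitted are hypothetical.

```python
import numpy as np

# Hypothetical data: Z is arbitrary and X is an exact multiple of Z (c = 2).
c = 2.0
Z = np.array([1.0, 3.0, 4.0, 6.0, 8.0])
X = c * Z

def fitted(b0, b1, b2):
    """Fitted values b0 + b1*X + b2*Z for the hypothetical data."""
    return b0 + b1 * X + b2 * Z

# Different (b1, b2) pairs with the same c*b1 + b2 = 6 give identical fitted values,
# so the data cannot distinguish between them.
fit_a = fitted(1.0, 3.0, 0.0)
fit_b = fitted(1.0, 0.0, 6.0)
fit_c = fitted(1.0, 1.5, 3.0)
print(np.allclose(fit_a, fit_b), np.allclose(fit_a, fit_c))  # True True

# The regressor matrix [1, X, Z] has only two linearly independent columns.
design = np.column_stack([np.ones_like(Z), X, Z])
print(np.linalg.matrix_rank(design))  # 2
```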
In general, whenever any independent variable can be written as an exact linear combination of one or more of the other independent variables, the parameters [latex]\beta_o,\beta_1,...,\beta_k[/latex] cannot be estimated. This suggests that each independent variable must exhibit some variation that is independent of that of the other independent variables. Let’s consider an example that illustrates this problem.
Suppose that an economist wishes to examine the determinants of travel expenditures by married couples. To examine this issue, the following multiple regression equation is specified:[5]
[latex]\begin{equation} \tag{6.7} \text{Travel}_{i}=\beta _{o}+\beta _{1}\text{Inc-H}_{i}+\beta _{2}\text{Inc-W}_{i}+\beta _{3}\text{Inc}_{i}+u_{i} \end{equation}[/latex]
where:
- Travel[latex]_i[/latex] = travel expenditures by household [latex]i[/latex]
- Inc-H[latex]_i[/latex] = Husband’s income in household [latex]i[/latex]
- Inc-W[latex]_i[/latex] = Wife’s income in household [latex]i[/latex]
- Inc[latex]_i[/latex] = combined household income in household [latex]i[/latex]
If the econometrician tries to estimate this equation, the estimation procedure will fail. Because combined household income is the sum of the two individual incomes ([latex]\text{Inc}_i=\text{Inc-H}_i+\text{Inc-W}_i[/latex]), the software will report that perfect multicollinearity was found. The specific message, and whether the software instead drops one of the collinear variables automatically, varies across econometric software packages. The problem in this case is that all of the variation in the household income variable is completely explained by variation in the husband and wife income variables. The parameter [latex]\beta_3[/latex] is supposed to represent the effect of a one-unit change in the Inc[latex]_i[/latex] variable, holding all other variables constant. Since there is no independent variation in this variable, however, the parameters of equation 6.7 cannot be estimated.
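A quick diagnostic for this problem is to compute the rank of the matrix of regressors (including the column of ones for the intercept). The sketch below uses NumPy's np.linalg.matrix_rank with hypothetical incomes for six households; the numbers are illustrative only. When combined income is included, the four columns have rank 3, so one column is redundant. Dropping Inc restores full column rank.

```python
import numpy as np

# Hypothetical incomes (thousands of dollars) for six households.
inc_h = np.array([40.0, 55.0, 30.0, 70.0, 62.0, 45.0])
inc_w = np.array([35.0, 20.0, 50.0, 42.0, 58.0, 38.0])
inc = inc_h + inc_w  # combined income: an exact linear combination of the other two

ones = np.ones_like(inc)
full = np.column_stack([ones, inc_h, inc_w, inc])  # regressors in equation 6.7
reduced = np.column_stack([ones, inc_h, inc_w])    # Inc dropped

print(full.shape[1], np.linalg.matrix_rank(full))        # 4 3
print(reduced.shape[1], np.linalg.matrix_rank(reduced))  # 3 3
```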
Assumption 6.6 essentially requires that each variable exhibit some variation that is independent of all of the other independent variables. In the bivariate regression model discussed in earlier chapters, this assumption simply requires that there be some variation in the independent variable (since there are no other independent variables in this model). Thus, assumption 4.6 is a special case of the more general condition required in assumption 6.6.
It should also be noted that assumption 6.6 requires that there be some variation in each of the independent variables. To see this, note that the regression equation can be restated as:
\begin{equation*}
Y_i=\beta _oX_{oi}+\beta _1X_{1i}+\ldots +\beta _kX_{ki}+u_i
\end{equation*}
where [latex]X_{oi}[/latex] is a variable equal to one for each observation. If any of the variables [latex]X_{1i}, X_{2i},\ldots, X_{ki}[/latex] were equal to the same constant for every observation, it would be an exact multiple of [latex]X_{oi}[/latex]. Thus, it would not be possible to estimate the effect of a variable that contains no variation.
In some cases, econometricians will find that a variable that they believe is important does not vary within their sample. Suppose, for example, that an econometrician wishes to estimate the parameters of an equation explaining a country’s net exports. The level of net exports is affected by variables such as the exchange rate, domestic and foreign price levels, and domestic and foreign income. Some countries, however, operate under fixed exchange rate regimes. During time periods in which exchange rates have not been altered by the government, the official exchange rate will remain constant. In this case, the effect of a change in the exchange rate cannot be estimated since the exchange rate has not varied.
Under Regulation Q, the Federal Reserve Board established the maximum interest rate that could be paid on savings deposits. When Regulation Q was in effect, econometric models could not be used to estimate the effect on consumption spending of changes in the interest rate on savings deposits (since this interest rate was fixed for long periods of time).
6.3.2 Near-perfect multicollinearity
In practice, however, well-specified econometric models do not generally exhibit perfect multicollinearity. A more common problem is that of near-perfect multicollinearity (often simply referred to as a multicollinearity problem). Near-perfect multicollinearity occurs when one of the independent variables is approximately equal to a linear combination of the other independent variables. When a multicollinearity problem is present, the parameters can still be estimated, but the estimates are less reliable than they would be in the absence of multicollinearity.
Recall that each slope coefficient is supposed to capture the effect on the dependent variable of a one-unit change in the level of the corresponding independent variable, holding all of the other variables constant. When multicollinearity is present, there is little independent variation in two or more of the independent variables. If there is little independent variation in some of these variables, estimates of the separate effect of a change in the corresponding variables become less accurate.
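A small Monte Carlo sketch can make this concrete. It assumes a hypothetical true model [latex]Y=1+2X_1+3X_2+u[/latex] with standard normal regressors and errors. It re-estimates [latex]\beta_1[/latex] by OLS in many simulated samples and compares the spread of the estimates when [latex]X_1[/latex] and [latex]X_2[/latex] are uncorrelated with the spread when their correlation is 0.99. The function name slope_sd, the sample size, and the number of replications are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(42)

def slope_sd(rho, n=50, reps=1000):
    """Monte Carlo standard deviation of the OLS estimate of beta_1 when
    corr(X1, X2) = rho. Hypothetical true model: Y = 1 + 2*X1 + 3*X2 + u."""
    cov = [[1.0, rho], [rho, 1.0]]
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        u = rng.normal(0.0, 1.0, size=n)
        y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1] + u
        design = np.column_stack([np.ones(n), x])
        beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[r] = beta_hat[1]  # estimated slope on X1
    return estimates.std()

sd_low = slope_sd(0.0)
sd_high = slope_sd(0.99)
print(f"SD of beta_1 estimates, rho = 0.00: {sd_low:.3f}")
print(f"SD of beta_1 estimates, rho = 0.99: {sd_high:.3f}")
```

With these settings the estimates should be several times more dispersed in the highly correlated case, even though the same model and error variance generate the data; the exact figures depend on the random seed.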
As noted by Goldberger (1991), the problem of multicollinearity is perfectly analogous to the situation that exists when one is trying to estimate a population mean using a very small sample.[6] In both situations, precise estimates are not possible when only limited information is available.
Since many time-series variables exhibit similar cyclical and trend behavior, multicollinearity problems are often present when time-series data is analyzed. Multicollinearity problems are less common, but may occur, in cross-sectional and panel data models. The problem of multicollinearity will be discussed in more detail in Chapter 11.
It is important to note, however, that multicollinearity involves a linear relationship among some of the independent variables. A strong linear relationship between the dependent and the independent variables does not represent a multicollinearity problem. (In fact, a strong linear relationship indicates that the regression model explains the dependent variable quite well.)
6.4 OLS estimation
As noted in Chapter 4, ordinary least squares (OLS) estimators are derived by finding the values of the estimated intercept and slope parameters that minimize the sum of squared residuals. In mathematical terms, this is equivalent to finding the values of [latex]\hat{\beta}_o,\hat{\beta}_1,\ldots ,\hat{\beta}_k[/latex] that solve the following problem:[7]
\begin{equation*}
\underset{\hat{\beta}_{o},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k}}{\text{minimize:}}\sum \hat{u}_{i}^{2}=\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots-\hat{\beta}_{k}X_{ki})^{2}
\end{equation*}
A general solution to this problem requires the use of matrix algebra and multivariate calculus. Fortunately, a wide variety of computer software packages can quickly compute these OLS estimators. In fact, even most popular spreadsheet packages contain procedures for OLS regression.
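As an illustration of what such software does, the following sketch uses NumPy's np.linalg.lstsq (assuming NumPy is available) to compute OLS estimates for hypothetical simulated data with an intercept and three regressors. It then checks that moving any one coefficient away from the solution increases the sum of squared residuals. The data-generating values and the helper name ssr are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3

# Hypothetical data: an intercept plus k = 3 regressors.
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -2.0, 1.5]) + rng.normal(size=n)
design = np.column_stack([np.ones(n), X])

def ssr(beta):
    """Sum of squared residuals for a candidate coefficient vector."""
    resid = y - design @ beta
    return resid @ resid

# Least-squares solution: the coefficients that minimize the SSR.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Nudging any coefficient away from the OLS solution raises the SSR.
for j in range(k + 1):
    nudged = beta_hat.copy()
    nudged[j] += 0.01
    assert ssr(nudged) > ssr(beta_hat)

print(np.round(beta_hat, 3))
```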
The intercept and slope parameters can also be estimated using a generalization of the method that we used in Chapter 4 to derive estimated regression parameters in the bivariate regression model. Under this procedure, the assumptions of the classical regression model are used to generate conditions that can be imposed upon our sample estimators.[8]
Community College Enrollments and the Business Cycle
What happens to community college enrollments during a recession? Economic theory does not provide a clear prediction. During a cyclical downturn, declining household income makes it more difficult to afford additional education. Rising unemployment rates, however, lower the opportunity cost of time spent in college.
Betts and McFarland (1995) use multiple regression analysis to examine the determinants of community college enrollments. After controlling for a variety of other factors, they find that enrollment in community colleges increases when the regional unemployment rate rises. As Betts and McFarland note, however, state aid to community colleges often declines during recessions (due to a decline in state tax revenue). Thus, community colleges must provide services to more students while experiencing a declining resource base during recessions.
6.4.1 Derivation of estimated intercept and slope parameters
Assumption 6.2 requires that:
\begin{equation*}
E(u_i)=0
\end{equation*}
while assumption 6.5 guarantees that:
\begin{equation*}
E(X_{1i}u_i)=0
\end{equation*}
\begin{equation*}
E(X_{2i}u_i)=0
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
E(X_{ki}u_i)=0
\end{equation*}
If these conditions hold in the entire population, it seems reasonable to impose the corresponding conditions on the sample residuals. In other words, we will require that:
\begin{equation} \tag{6.8}
\frac 1N\sum_{i=1}^N\hat{u}_i=0
\end{equation}
\begin{equation*}
\frac 1N\sum_{i=1}^NX_{1i}\hat{u}_i=0
\end{equation*}
\begin{equation*}
\frac 1N\sum_{i=1}^NX_{2i}\hat{u}_i=0
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\frac 1N\sum_{i=1}^NX_{ki}\hat{u}_i=0
\end{equation*}
The [latex]k+1[/latex] equations appearing in 6.8 are called normal equations. These equations may be expressed in terms of the estimated parameters by using the definition of the residual:
\begin{equation} \tag{6.9}
\hat{u}_{i}=Y_{i}-\hat{Y}_{i}=Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\hat{\beta}_{2}X_{2i}-\cdots -\hat{\beta}_{k}X_{ki}
\end{equation}
By substituting the value of [latex]\hat{u}_i[/latex] from equation 6.9 into each of the normal equations in the set of equations 6.8, we obtain:
\begin{equation*}
\frac{1}{N}\sum_{i=1}^{N}\left( Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\hat{\beta}_{2}X_{2i}-\cdots -\hat{\beta}_{k}X_{ki}\right) =0
\end{equation*}
\begin{equation*}
\frac{1}{N}\sum_{i=1}^{N}X_{1i}\left( Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\hat{\beta}_{2}X_{2i}-\cdots -\hat{\beta}_{k}X_{ki}\right) =0
\end{equation*}
\begin{equation*}
\frac{1}{N}\sum_{i=1}^{N}X_{2i}\left( Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\hat{\beta}_{2}X_{2i}-\cdots -\hat{\beta}_{k}X_{ki}\right) =0
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\frac{1}{N}\sum_{i=1}^{N}X_{ki}\left( Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\hat{\beta}_{2}X_{2i}-\cdots -\hat{\beta}_{k}X_{ki}\right) =0
\end{equation*}
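Expanding the sums, using the definition of the sample mean in the first equation, and multiplying each of the remaining equations by [latex]N[/latex], these equations can be restated as: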
\begin{equation} \tag{6.10}
\overline{Y}-\hat{\beta}_{o}-\hat{\beta}_{1}\overline{X}_{1}-\hat{\beta}_{2}\overline{X}_{2}-\cdots -\hat{\beta}_{k}\overline{X}_{k}=0
\end{equation}
\begin{equation*}
\sum_{i=1}^{N}X_{1i}Y_{i}-\hat{\beta}_{o}\sum_{i=1}^{N}X_{1i}-\hat{\beta}_{1}\sum_{i=1}^{N}X_{1i}^{2}-\hat{\beta}_{2}\sum_{i=1}^{N}X_{1i}X_{2i}-\cdots -\hat{\beta}_{k}\sum_{i=1}^{N}X_{1i}X_{ki}=0
\end{equation*}
\begin{equation*}
\sum_{i=1}^{N}X_{2i}Y_{i}-\hat{\beta}_{o}\sum_{i=1}^{N}X_{2i}-\hat{\beta}_{1}\sum_{i=1}^{N}X_{2i}X_{1i}-\hat{\beta}_{2}\sum_{i=1}^{N}X_{2i}^{2}-\cdots -\hat{\beta}_{k}\sum_{i=1}^{N}X_{2i}X_{ki}=0
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\sum_{i=1}^{N}X_{ki}Y_{i}-\hat{\beta}_{o}\sum_{i=1}^{N}X_{ki}-\hat{\beta}_{1}\sum_{i=1}^{N}X_{ki}X_{1i}-\hat{\beta}_{2}\sum_{i=1}^{N}X_{ki}X_{2i}-\cdots -\hat{\beta}_{k}\sum_{i=1}^{N}X_{ki}^{2}=0
\end{equation*}
In the equations appearing in system 6.10, each summation involves only observed quantities (or products of observed quantities). Thus, each summation can be treated as a known constant that characterizes the data, and the equation system constitutes a set of [latex]k+1[/latex] linear equations in the [latex]k+1[/latex] unknown parameters [latex]\hat{\beta}_{o},\hat{\beta}_1,\hat{\beta}_2,\ldots,\hat{\beta}_k[/latex]. Under the assumptions of the classical regression model (in particular, the absence of perfect multicollinearity required by assumption 6.6), this equation system has a unique solution for these unknown parameters. As in the special case of the bivariate regression model, the solutions derived in this manner are identical to the estimators that minimize the sum of squared residuals. (A proof of this proposition is contained in the mathematical appendix at the end of this chapter.)
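The normal equations can also be solved directly. The sketch below uses hypothetical simulated data with two regressors and NumPy (assumed to be available). It builds the sums appearing in system 6.10 as the matrix products X'X and X'Y, where the first row corresponds to N times the first equation of 6.10. It solves the resulting k+1 linear equations, then confirms that the solution matches a least-squares routine and satisfies the sample conditions in 6.8. All data-generating values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60

# Hypothetical data with k = 2 regressors.
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
y = 4.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)
design = np.column_stack([np.ones(n), x1, x2])  # columns: X_o (=1), X_1, X_2

# Entry (j, m) of lhs is sum_i X_ji * X_mi; entry j of rhs is sum_i X_ji * Y_i.
# Solving lhs @ b = rhs solves the k + 1 normal equations.
lhs = design.T @ design
rhs = design.T @ y
beta_normal = np.linalg.solve(lhs, rhs)

# Compare with a routine that minimizes the sum of squared residuals directly.
beta_lstsq, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.allclose(beta_normal, beta_lstsq))  # True

# Conditions 6.8: residuals average to zero and are orthogonal to each regressor.
resid = y - design @ beta_normal
print(np.allclose(design.T @ resid / n, 0.0))  # True
```

np.linalg.solve succeeds here only because the regressor columns are linearly independent. Under perfect multicollinearity the matrix lhs would be singular, which is the algebraic counterpart of assumption 6.6.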
6.4.2 Example I: Determinants of student success on a microeconomics final exam
Numerous studies have indicated that there is a strong relationship between a student’s mathematical and verbal ability and his or her performance in economics courses. It is also likely that a student’s performance in high school will be useful in predicting his or her success in college courses. These hypotheses can be examined by estimating the parameters of a multiple regression equation such as the one appearing in equation 6.11:
\begin{equation} \tag{6.11}
\text{Final}_i=\beta _o+\beta _1\text{SAT-V}_i+\beta _2\text{SAT-M}_i+\beta_3\text{HSGPA}_i+u_i
\end{equation}
where:
- Final[latex]_i[/latex] = score received by student [latex]i[/latex] on final exam in an introductory microeconomics course (80 possible points)
- SAT-V[latex]_i[/latex] = SAT verbal score for student [latex]i[/latex]
- SAT-M[latex]_i[/latex] = SAT math score for student [latex]i[/latex]
- HSGPA[latex]_i[/latex] = cumulative high school grade point average for student [latex]i[/latex] (expressed as a percentage)
A survey of students was conducted to gather information for a study of the determinants of student performance in an introductory microeconomics course at SUNY-Oswego. In this course, the final exam consisted of 80 multiple choice questions. A sample consisting of 99 students was used to estimate the parameters of equation 6.11.[9]
\begin{equation} \tag{6.12}
\widehat{\text{Final}}_i=-35.366+0.029\text{SAT-V}_i+0.058\text{SAT-M}_{i}+0.372\text{HSGPA}_{i}
\end{equation}
Since the estimated coefficients associated with the variables SAT-V, SAT-M and HSGPA are all positive, this equation suggests that an increase in any one of these variables is associated with an increase in the expected value of the final exam grade.
The estimated slope coefficient on the SAT-V variable indicates that a one-point increase in a student’s SAT verbal score is associated with an expected 0.029-point increase in the final exam score, holding the SAT math score and high school GPA constant. This suggests that a 100-point increase in a student’s SAT verbal score would be expected to increase his or her score on this final exam by 2.9 points. Using a similar argument, a 100-point increase in a student’s SAT math score would be expected to result in a 5.8-point increase in his or her final exam score. This result may indicate that math ability is more important than verbal ability in predicting a student’s performance in microeconomics. Remember, however, that these estimators are themselves random variables: in other samples, SAT math scores may appear to be less important than SAT verbal scores. It is possible to construct a formal statistical test to determine whether the coefficient on SAT math scores is significantly greater than the coefficient on SAT verbal scores. Hypothesis tests of this sort will be discussed in Chapter 7.
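The sketch below evaluates equation 6.12 for a hypothetical student. The SAT scores and high school GPA used here are illustrative values, not observations from the SUNY-Oswego sample, and the function name is likewise hypothetical. It shows that a 100-point change in one score, with the other variables held fixed, shifts the fitted exam score by 100 times the corresponding coefficient.

```python
# Estimated coefficients from equation 6.12.
B0, B_SATV, B_SATM, B_HSGPA = -35.366, 0.029, 0.058, 0.372

def predicted_final(sat_v, sat_m, hsgpa):
    """Fitted final exam score (out of 80 points) from equation 6.12."""
    return B0 + B_SATV * sat_v + B_SATM * sat_m + B_HSGPA * hsgpa

# Hypothetical student (illustrative values only).
base = predicted_final(500, 550, 88)
d_verbal = predicted_final(600, 550, 88) - base  # +100 SAT verbal, all else fixed
d_math = predicted_final(500, 650, 88) - base    # +100 SAT math, all else fixed

print(round(base, 2))      # 43.77
print(round(d_verbal, 2))  # 2.9
print(round(d_math, 2))    # 5.8
```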
6.4.3 Example II: A consumption function
In Chapter 4, a simple Keynesian consumption function was specified. In this model, consumption expenditures were assumed to be a linear function of disposable personal income. Economists generally argue, however, that consumption expenditures are also affected by the level of real household wealth and the real market interest rate. In particular, it is assumed that consumption spending is directly related to the level of household wealth and inversely related to the interest rate.
This discussion suggests that the consumption function may be specified as:
\begin{equation} \tag{6.13}
\text{C}_{t}=\beta _{o}+\beta _{1}\text{YD}_{t}+\beta _{2}\text{Wealth}_{t}+\beta _{3}\text{Interest}_{t}+u_{t}
\end{equation}
where:
- C[latex]_t[/latex] = real consumption expenditures in year [latex]t[/latex]
- YD[latex]_t[/latex] = real disposable personal income in year [latex]t[/latex]
- Wealth[latex]_t[/latex] = real value of private wealth in year [latex]t[/latex]
- Interest[latex]_t[/latex] = real yield on 3-month Treasury securities in year [latex]t[/latex]
- [latex]u_t[/latex] = random error term in year [latex]t[/latex]
Each of these variables (other than the interest rate) is measured in billions of 1987 dollars. The real interest rate is measured as the nominal yield on 3-month Treasury bills minus the inflation rate.[10]
In equation 6.13, [latex]\beta_1[/latex] serves as a measure of the additional consumption expenditures resulting from a one-unit increase in disposable income (holding real wealth and the real interest rate constant). Thus, [latex]\beta_1[/latex] equals the marginal propensity to consume (MPC). The slope coefficient [latex]\beta_2[/latex] is a measure of the additional consumption that results from a one-unit increase in real wealth, ceteris paribus. [latex]\beta_3[/latex] is a measure of the change in consumption spending resulting from a one percentage-point increase in the real interest rate.
When the parameters of equation 6.13 are estimated using U.S. time series data for the period 1946-1993, the estimated version of this equation is:[11]
\begin{equation} \tag{6.14}
\hat{\text{C}}_{t}=-14.476+0.7468\text{YD}_{t}+0.03267\text{Wealth}_{t}-5.40\text{Interest}_{t}
\end{equation}
Note that each of the estimated parameters has the sign predicted by economic theory. In particular, these results indicate that:
- a one-dollar increase in real disposable personal income results in approximately 75 cents of additional consumption spending;
- a one-dollar increase in real wealth causes consumption spending to increase by approximately 3.3 cents; and
- a one percentage-point increase in the real interest rate reduces consumption spending by approximately $5.40 billion (in 1987 dollars).
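To make the ceteris paribus interpretation concrete, the short Python sketch below evaluates equation 6.14 at a set of baseline values and then changes one independent variable at a time. The baseline values are hypothetical and chosen only for illustration; they are not taken from the 1946-1993 data. Because the estimated equation is linear, each change in the fitted value equals the corresponding coefficient, whatever baseline is used.

```python
# Coefficients from estimated equation 6.14 (billions of 1987 dollars;
# the real interest rate is measured in percentage points).
B0, B_YD, B_WEALTH, B_INTEREST = -14.476, 0.7468, 0.03267, -5.40

def predicted_consumption(yd, wealth, interest):
    """Fitted value of C from equation 6.14."""
    return B0 + B_YD * yd + B_WEALTH * wealth + B_INTEREST * interest

# Hypothetical baseline values, used only for illustration.
base = {"yd": 3000.0, "wealth": 15000.0, "interest": 1.0}
c_base = predicted_consumption(**base)

# Raise one variable by one unit while holding the other two constant.
d_yd = predicted_consumption(**{**base, "yd": base["yd"] + 1.0}) - c_base
d_wealth = predicted_consumption(**{**base, "wealth": base["wealth"] + 1.0}) - c_base
d_interest = predicted_consumption(**{**base, "interest": base["interest"] + 1.0}) - c_base

print(f"+1 YD: {d_yd:.4f}   +1 Wealth: {d_wealth:.5f}   +1 Interest: {d_interest:.2f}")
```

Changing the baseline values changes the level of predicted consumption but not these one-unit changes.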
6.4.4 Variances and standard errors of OLS estimators
While the estimation of slope and intercept parameters is extremely important, these estimates are not very useful if we have no measure of their reliability. As noted above, the OLS estimators are random variables. To determine how reliable our point estimates are, we need to have an estimate of the variance of these estimators. The equation for the variance of each estimator is given by:
\begin{equation*}
var(\hat{\beta}_{j})=\sigma _{\hat{\beta}_{j}}^{2}=E\left[ \hat{\beta}_{j}-E(\hat{\beta}_{j})\right] ^{2}
\end{equation*}
While it is possible to compute the formulas for these variances using algebraic techniques, these formulas become quite complex when three or more slope parameters are to be estimated. The general formula for the variances (and covariances) of the OLS estimators can be computed only through the use of matrix algebra. Thus, a more complete discussion of the variances and covariances of the OLS estimators is reserved for more advanced texts that utilize matrix algebra.[12] It should be noted, however, that the variances of the estimated parameters will tend to be smaller when:
- the variance of the error term ([latex]\sigma^2[/latex]) is relatively small,
- the number of observations is relatively large, and
- there is a relatively large amount of independent variation in each of the variables included on the right-hand side of the regression equation.
If the variance of the error terms is relatively small, then the regression equation fits the observed data relatively well. In this case, more precise estimates of the intercept and slope parameters become possible. As in the bivariate regression model discussed in earlier chapters, an increase in the sample size lowers the variance of the estimated intercept and slope parameters. To get relatively precise estimates of each of the intercept and slope parameters, however, it is also important that there be a substantial amount of independent variation in each of the independent variables included in the regression equation. It was noted above that intercept and slope parameters cannot be estimated when one of the independent variables is an exact linear combination of the other independent variables. As will be discussed in much more detail in Chapter 11, one cannot get very precise estimates of the effects of each separate variable if an approximate linear relationship exists among the independent variables in a regression equation.
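The effect of an approximate linear relationship among the independent variables can be illustrated numerically. The Python sketch below uses the matrix formula for the OLS variances, [latex]\sigma^2(X'X)^{-1}[/latex], which this chapter does not derive; treat it as an assumption borrowed from the more advanced treatment mentioned above. It holds [latex]X_1[/latex] and [latex]\sigma^2[/latex] fixed and makes [latex]X_2[/latex] increasingly correlated with [latex]X_1[/latex]. The parameter rho and the simulated data are hypothetical.

```python
import numpy as np

def slope_variances(x1, x2, sigma2=1.0):
    """Variances of the two OLS slope estimators, read from the diagonal of
    sigma^2 (X'X)^(-1) (matrix formula assumed, not derived in this chapter)."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return cov[1, 1], cov[2, 2]

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

var_b1 = {}
for rho in (0.0, 0.9, 0.99):
    # A larger rho leaves x2 with less variation that is independent of x1.
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * noise
    var_b1[rho], var_b2 = slope_variances(x1, x2)
    print(f"rho = {rho:.2f}: var(b1) = {var_b1[rho]:.5f}, var(b2) = {var_b2:.5f}")
```

The sample size and [latex]\sigma^2[/latex] are the same in every row, so the growth in the variances comes only from the shrinking amount of independent variation.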
If you were to compute the formula for the variances of the intercept and slope parameters you would see that each involves the variance of the error term [latex]u_i[/latex]. Since this variance ([latex]\sigma^2[/latex]) is generally unknown, estimated variances for the slope and intercept parameters are derived by replacing the unknown parameter [latex]\sigma^2[/latex] with its estimated value:
\begin{equation} \tag{6.15}
\hat{\sigma}^2=\frac{\sum \hat{u}_i^2}{N-(k+1)}
\end{equation}
The denominator in this expression is equal to the degrees of freedom for this estimator. As noted in Chapter 3, the degrees of freedom for any estimator is equal to the number of observations ([latex]N[/latex]) minus the number of parameters that must be estimated to construct the estimator. In this case, the estimation of [latex]\hat{u}_i^2[/latex] requires the estimation of the [latex]k+1[/latex] intercept and slope parameters, since:
\begin{equation*}
\hat{u}_i=Y_i-\hat{\beta}_o-\hat{\beta}_1X_{1i}-\hat{\beta}_2X_{2i}-\ldots -\hat{\beta}_kX_{ki}
\end{equation*}
Thus, the degrees of freedom for this estimator is [latex]N-(k+1)[/latex].
Since the estimated variances for the OLS coefficients are formed by replacing the actual value of [latex]\sigma^2[/latex] with the estimated value (as specified in equation 6.15), we denote these variances as:
\begin{equation*}
\widehat{var}(\hat{\beta}_{j})
\end{equation*}
The standard error of each OLS parameter is simply the positive square root of the corresponding variance:
\begin{equation*}
s.e.(\hat{\beta}_{j})=\hat{\sigma}_{\hat{\beta}_{j}}=\sqrt{\widehat{var}(\hat{\beta}_{j})}
\end{equation*}
Virtually all computer programs that estimate OLS intercept and slope parameters provide estimated values for the variances and standard errors for these estimates.
As you will see in Chapter 7, these estimated standard errors are used in:
- hypothesis testing, and
- the construction of confidence intervals for the estimated intercept and slope parameters.
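The steps described above (estimate the coefficients, compute the residuals, form [latex]\hat{\sigma}^2[/latex] using [latex]N-(k+1)[/latex] degrees of freedom, and take the square roots of the estimated variances) can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not the routine used by any particular regression package. The variance step relies on the matrix formula [latex]\hat{\sigma}^2(X'X)^{-1}[/latex], which is assumed here rather than derived.

```python
import numpy as np

def ols_with_se(y, X_vars):
    """OLS estimates, sigma^2-hat (equation 6.15), and standard errors.

    X_vars has shape (N, k) and holds the k independent variables;
    a column of ones is added for the intercept.
    """
    N, k = X_vars.shape
    X = np.column_stack([np.ones(N), X_vars])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_hat                         # residuals
    sigma2_hat = (u_hat @ u_hat) / (N - (k + 1))     # N - (k+1) degrees of freedom
    var_hat = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated variances (diagonal) and covariances
    se = np.sqrt(np.diag(var_hat))                   # positive square roots of the variances
    return beta_hat, sigma2_hat, se

# Simulated (hypothetical) data: true coefficients 1.0, 2.0, -0.5 and sigma^2 = 1.
rng = np.random.default_rng(42)
N = 500
X_vars = rng.normal(size=(N, 2))
y = 1.0 + 2.0 * X_vars[:, 0] - 0.5 * X_vars[:, 1] + rng.normal(size=N)

beta_hat, sigma2_hat, se = ols_with_se(y, X_vars)
print("estimates:", beta_hat.round(3))
print("sigma^2-hat:", round(float(sigma2_hat), 3))
print("standard errors:", se.round(3))
```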
6.5 Properties of estimators
Under the conditions of the classical regression model, the OLS estimators are:
- consistent,
- linear,
- unbiased, and
- best linear unbiased estimators (BLUE).
If we also assume that the error terms are normally distributed, then the OLS estimators are fully efficient. Let’s examine each of these properties.[13]
6.5.1 Consistency
The consistency property requires that the OLS intercept and slope estimators converge to the corresponding population parameters as the size of the sample tends towards infinity.[14] As noted in Chapter 3, the consistency property is called a “large-sample property” since it indicates that the estimators tend to become more precise as the size of the sample increases. Note that this property does not, however, provide any information about the variance of the estimator in a small sample.
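A small simulation can illustrate what consistency means in practice. The Python sketch below draws many samples of increasing size from a hypothetical bivariate model (the population values BETA0 = 2 and BETA1 = 0.5 are arbitrary) and reports the share of slope estimates that miss the true slope by more than 0.1. For a consistent estimator, this share should shrink toward zero as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
BETA0, BETA1 = 2.0, 0.5   # hypothetical population parameters
REPS = 500                # number of independent samples for each sample size

def slope_estimates(n):
    """OLS slope estimates from REPS independent samples of size n."""
    x = rng.normal(size=(REPS, n))
    y = BETA0 + BETA1 * x + rng.normal(size=(REPS, n))
    x_dev = x - x.mean(axis=1, keepdims=True)
    y_dev = y - y.mean(axis=1, keepdims=True)
    return (x_dev * y_dev).sum(axis=1) / (x_dev ** 2).sum(axis=1)

miss_rate = {}
for n in (10, 100, 1000):
    b1 = slope_estimates(n)
    # share of samples whose estimate lies more than 0.1 away from BETA1
    miss_rate[n] = float(np.mean(np.abs(b1 - BETA1) > 0.1))
    print(f"N = {n:4d}: share of estimates off by more than 0.1 = {miss_rate[n]:.3f}")
```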
6.5.2 Linearity
The linearity property indicates that each estimated slope or intercept coefficient can be written as a linear function of the random variables [latex]Y_1,Y_2,\ldots,Y_N[/latex]. As noted earlier, one advantage of a linear estimator is that computational costs are generally lower for linear than for nonlinear estimators. Another advantage of linear estimators is that a large body of mathematical tools developed for the analysis of linear equation systems can be applied to these estimators.
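Linearity can be checked directly. Assuming the matrix form of the OLS estimator, [latex]\hat{\beta}=(X'X)^{-1}X'Y[/latex] (not derived in this chapter), each coefficient is a weighted sum of [latex]Y_1,\ldots,Y_N[/latex] with weights that depend only on the independent variables. The Python sketch below computes these weights for simulated data and confirms that they reproduce a standard least-squares solution.

```python
import numpy as np

def ols_weights(X):
    """Weight matrix W = (X'X)^(-1) X', so that beta_hat = W @ y.
    W depends only on the independent variables, not on y."""
    return np.linalg.inv(X.T @ X) @ X.T

rng = np.random.default_rng(7)
N = 20
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # intercept and two regressors
W = ols_weights(X)

y_a = rng.normal(size=N)   # two arbitrary (hypothetical) dependent-variable vectors
y_b = rng.normal(size=N)

# Each coefficient is a weighted sum of the Y values; row j of W holds the weights.
beta_a = W @ y_a
beta_b = W @ y_b

# Linear in Y: the estimates for y_a + y_b equal the sum of the separate estimates.
additive = np.allclose(W @ (y_a + y_b), beta_a + beta_b)
# The weights reproduce a standard least-squares solver.
matches_solver = np.allclose(beta_a, np.linalg.lstsq(X, y_a, rcond=None)[0])
print(additive, matches_solver)
```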
6.5.3 Unbiasedness
Since the OLS estimators are unbiased:
\begin{equation*}
E(\hat{\beta}_{j})=\beta _{j}
\end{equation*}
for each of the estimators [latex]\hat{\beta}_{o},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k}[/latex]. This condition requires that the “average” estimate provided by each of the OLS estimators equals the corresponding population parameter. If the same equation were estimated using each of an infinite number of independent random samples of the same size, then the average of all of these estimates would equal the actual population parameter. Roughly speaking, the unbiasedness property means that OLS estimators will not systematically tend to either underestimate or overestimate population parameters.
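The “repeated samples” thought experiment can be approximated by a Monte Carlo simulation. In the Python sketch below, the independent variables are held fixed, new error terms are drawn 20,000 times, and the OLS estimates from each sample are averaged. The population parameters and error variance are hypothetical, and the estimator is written in the assumed matrix form [latex](X'X)^{-1}X'Y[/latex]. With many replications, the averages should be close to the true values.

```python
import numpy as np

rng = np.random.default_rng(2024)
beta_true = np.array([1.0, 0.8, -1.5])  # hypothetical beta_0, beta_1, beta_2
N, REPS = 30, 20_000

# Hold the independent variables fixed across samples; redraw only the errors.
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, x2])
W = np.linalg.inv(X.T @ X) @ X.T        # beta_hat = W @ y for any sample y

errors = rng.normal(scale=2.0, size=(REPS, N))
Y = X @ beta_true + errors              # each row is one simulated sample
estimates = Y @ W.T                     # each row holds (b0, b1, b2) for one sample

print("true values:      ", beta_true)
print("average estimates:", estimates.mean(axis=0).round(3))
```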
6.5.4 Best linear unbiased estimators
The Gauss-Markov theorem indicates that under the conditions of the classical regression model, the OLS estimators are best linear unbiased estimators (BLUE). This means that no other linear unbiased estimator of the population parameters will have a sampling variance that is less than that of the OLS estimators.[15] This is an extremely important result: it guarantees that no other linear unbiased estimator can provide more precise estimates (in the sense of a smaller sampling variance) than the OLS estimators.
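The Gauss-Markov result can be illustrated by comparing OLS with another estimator that is also linear and unbiased. A simple (hypothetical) competitor in the bivariate model is the slope of the line through the first and last observations, [latex](Y_N-Y_1)/(X_N-X_1)[/latex]. The Python sketch below simulates both estimators with a fixed set of [latex]X[/latex] values; both averages should be close to the true slope, but the OLS estimates should show a smaller variance.

```python
import numpy as np

rng = np.random.default_rng(3)
BETA0, BETA1 = 5.0, 2.0            # hypothetical population parameters
x = np.linspace(0.0, 10.0, 20)     # fixed values of the independent variable
REPS = 20_000

Y = BETA0 + BETA1 * x + rng.normal(scale=3.0, size=(REPS, x.size))

# OLS slope: linear in Y with weights (x_i - xbar) / sum (x_i - xbar)^2
w = (x - x.mean()) / ((x - x.mean()) ** 2).sum()
b_ols = Y @ w

# Alternative linear unbiased estimator: slope through the first and last points
b_end = (Y[:, -1] - Y[:, 0]) / (x[-1] - x[0])

for name, b in (("OLS", b_ols), ("endpoints", b_end)):
    print(f"{name:9s} mean = {b.mean():.3f}   variance = {b.var():.4f}")
```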
6.5.5 Efficiency
If the error terms are normally distributed, the OLS estimators are efficient. This property indicates that no other unbiased estimator, linear or nonlinear, can have a sampling variance that is less than that of the OLS estimators. Thus, under the normality assumption, the least-squares estimators are minimum variance unbiased estimators. In other words, when the error terms are normally distributed, no other unbiased estimator will perform better than the least-squares estimator.[16]
6.6 The multiple coefficient of determination — R[latex]^2[/latex]
As in the bivariate regression model, it is desirable to have a measure of the overall “fit” of the regression model. This is provided by the multiple coefficient of determination, also known as R[latex]^2[/latex]. This is computed in essentially the same manner as the coefficient of determination discussed in Chapter 4. To define R[latex]^2[/latex] it will be helpful if we recall equation 4.41:
\begin{equation} \tag{6.16}
\text{TSS = RSS + ESS}
\end{equation}
where:
TSS = total sum of squares (= [latex]\sum (Y_i-\overline{Y})^2[/latex])
RSS = regression sum of squares (=[latex]\sum (\hat Y_i-\overline{Y})^2[/latex])
ESS = error sum of squares (=[latex]\sum \hat u_i^2[/latex])
Using these definitions, the multiple coefficient of determination is defined as:
\begin{equation*}
\text{R}^2=\frac{\text{RSS}}{\text{TSS}}
\end{equation*}
Using equation 6.16, this may be restated as:
\begin{equation} \tag{6.17}
\text{R}^2=1-\frac{\text{ESS}}{\text{TSS}}
\end{equation}
As noted in Chapter 4, the definition of R[latex]^2[/latex] guarantees that:
\begin{equation*}
0\leq \text{R}^2\leq 1.
\end{equation*}
Once again, R[latex]^2[/latex] serves as a measure of the proportion of the total variation in the dependent variable that can be explained by the variation in all of the independent variables [latex]X_1,X_2,\ldots,X_k[/latex]. R[latex]^2[/latex] equals zero if the explanatory variables do not contribute any explanatory power (RSS = 0); R[latex]^2[/latex] equals one only if the estimated regression equation fits the data exactly (RSS = TSS and ESS = 0).
The above discussion is based on the assumption that an intercept term is included in the regression. If there is no intercept term, R[latex]^2[/latex] no longer serves as a measure of the proportion of the variation explained by the regression. In practice, however, an intercept term should always be included in a regression unless there is a strong a priori reason to believe that the true value of the intercept is zero.
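The decomposition in equation 6.16 and the two equivalent forms of R[latex]^2[/latex] can be verified numerically for a regression that includes an intercept. The Python sketch below uses simulated data. Note that it follows this chapter's notation, in which RSS is the regression sum of squares and ESS is the error sum of squares; some other texts and software packages reverse these labels.

```python
import numpy as np

rng = np.random.default_rng(11)
N = 100
X_vars = rng.normal(size=(N, 3))
y = 4.0 + X_vars @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=2.0, size=N)  # hypothetical model

X = np.column_stack([np.ones(N), X_vars])   # intercept included
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ beta_hat
u_hat = y - y_fit

TSS = np.sum((y - y.mean()) ** 2)        # total sum of squares
RSS = np.sum((y_fit - y.mean()) ** 2)    # regression sum of squares (this chapter's notation)
ESS = np.sum(u_hat ** 2)                 # error sum of squares (this chapter's notation)

print("TSS = RSS + ESS:", np.isclose(TSS, RSS + ESS))     # equation 6.16
print("R^2 =", RSS / TSS, "=", 1 - ESS / TSS)             # the two forms agree
```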
6.7 Adjusted R[latex]^2[/latex]
When additional variables are added to a regression model, R[latex]^2[/latex] will virtually always increase. This may be because a more elaborate model results in an improved ability to explain the dependent variable. If the additional variables are not causally related to the dependent variable, however, R[latex]^2[/latex] will also generally increase. When additional variables are added to a regression model, the increase in R[latex]^2[/latex] is also partially the result of a reduction in the degrees of freedom for the estimators. Let’s examine this effect.
If you were to estimate a bivariate regression model using only two data points, you would always be able to fit a line that passes exactly through both of them. The R[latex]^2[/latex] for this equation will always equal one (unless the two observations have the same value of [latex]Y[/latex], in which case the fitted line has a zero slope, TSS = 0, and R[latex]^2[/latex] is undefined). If there are three observations but only two estimated parameters, a line can always be drawn through any two of the data points, but generally not through all three. In this case, there is only one degree of freedom. If a third parameter is introduced, the regression equation will pass through each of the three data points.
More generally, as the number of estimated parameters increases, R[latex]^{2}[/latex] will virtually always rise (and will never decline). In the extreme case, R[latex]^{2}[/latex] equals one when the number of estimated parameters equals the number of observations. The portion of the increase in R[latex]^{2}[/latex] that is due to this reduction in the degrees of freedom does not represent an improvement in the explanatory power of the regression. For this reason, an alternative measure of R[latex]^{2}[/latex] exists that takes the degrees of freedom into account. This measure, called the adjusted R[latex]^{2}[/latex] (often expressed as [latex]\overline{\text{R}}^{2}[/latex]), is defined as:[17]
\begin{equation} \tag{6.18}
\overline{\text{R}}^{2}=1-\frac{\text{ESS }/\left[ N-(k+1)\right] }{\text{TSS / }\left( N-1\right) }
\end{equation}
With a little bit of algebraic manipulation, it can be shown that the relationship between [latex]\overline{\text{R}}^{2}[/latex] and R[latex]^2[/latex] is given by:
\begin{equation} \tag{6.19}
\overline{\text{R}}^{2}=1-\frac{N-1}{N-\left( k+1\right) }(1-\text{R}^{2})
\end{equation}
A comparison of equations 6.17 and 6.18 indicates that [latex]\overline{\text{R}}^{2}[/latex] will be less than R[latex]^2[/latex] when [latex]k[/latex] is greater than 0.[18] For a given value of R[latex]^2[/latex], [latex]\overline{\text{R}}^{2}[/latex] declines as the number of estimated slope parameters ([latex]k[/latex]) increases. This adjustment serves to correct for the loss of degrees of freedom as additional variables are added to a regression model. For this reason, econometricians generally argue that [latex]\overline{\text{R}}^{2}[/latex] provides a better comparison of the goodness of fit of alternative regression models. Virtually all regression packages will provide measures of both R[latex]^{2}[/latex] and [latex]\overline{\text{R}}^{2}[/latex].
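The behavior described above can be seen in a short simulation. The Python sketch below starts with one relevant independent variable and then adds, one at a time, independent variables that are pure noise (hypothetical and unrelated to [latex]Y[/latex]). It computes R[latex]^2[/latex] from equation 6.17 and [latex]\overline{\text{R}}^{2}[/latex] from equation 6.18, and checks that equation 6.19 gives the same value. R[latex]^2[/latex] should never fall as variables are added, while [latex]\overline{\text{R}}^{2}[/latex] stays below R[latex]^2[/latex] and may fall.

```python
import numpy as np

def fit_statistics(y, X_vars):
    """Return R^2 (eq. 6.17), adjusted R^2 via eq. 6.18, and adjusted R^2 via eq. 6.19."""
    N, k = X_vars.shape
    X = np.column_stack([np.ones(N), X_vars])   # intercept included
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ESS = np.sum((y - X @ beta_hat) ** 2)       # error sum of squares
    TSS = np.sum((y - y.mean()) ** 2)           # total sum of squares
    r2 = 1 - ESS / TSS
    r2_adj_618 = 1 - (ESS / (N - (k + 1))) / (TSS / (N - 1))
    r2_adj_619 = 1 - (N - 1) / (N - (k + 1)) * (1 - r2)
    return r2, r2_adj_618, r2_adj_619

rng = np.random.default_rng(5)
N = 40
x1 = rng.normal(size=N)
y = 1.0 + 2.0 * x1 + rng.normal(scale=3.0, size=N)   # hypothetical model

X_vars = x1[:, None]
history = []
for _ in range(6):
    history.append(fit_statistics(y, X_vars))
    # add another regressor that is pure noise, unrelated to y
    X_vars = np.column_stack([X_vars, rng.normal(size=N)])

for k, (r2, adj_618, adj_619) in enumerate(history, start=1):
    print(f"k = {k}: R2 = {r2:.4f}, adjusted R2 = {adj_618:.4f} (eq. 6.19: {adj_619:.4f})")
```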
Once again, however, it should be noted that neither R[latex]^2[/latex] nor [latex]\overline{\text{R}}^2[/latex] serves as a test statistic that can be used for hypothesis testing. As noted in section 4.11.5, a high R[latex]^2[/latex] does not necessarily imply that an appropriate model has been specified; nor does a low R[latex]^2[/latex] necessarily imply that a model is poorly specified. Alternative methods of comparing the fit of different regression models will be discussed in Chapter 10.
6.7.1 Example III: 1992 election model
In Chapter 5 (section 5.3.3), a simple model of voting behavior in the 1992 U.S. Presidential election was specified. In this model, the proportion of votes for the Democratic candidate was assumed to be a function of the unemployment rate. It is likely, however, that other factors influence voting decisions. Democrats are often perceived as being in favor of a larger role for government in society. Republicans, on the other hand, tend to advocate a reduction in the size of the federal government. Thus, it might be argued that residents in those states that receive more benefits from the federal government might favor the Democratic candidate. To account for these possibilities, an expanded voting model can be specified that includes federal expenditures as an additional independent variable. When this expanded model is estimated using data from the 50 states and the District of Columbia,[19] the results are:
\begin{equation*}
\widehat{\text{DVOTE}}_{i}=25.094+1.464\text{un}_{i}+0.00135\text{fedfunds}_{i}
\end{equation*}
where:
DVOTE[latex]_i[/latex] = proportion of voters selecting the Democratic candidate in state [latex]i[/latex]
un[latex]_i[/latex] = unemployment rate in state [latex]i[/latex]
fedfunds[latex]_i[/latex] = per capita federal government spending in state [latex]i[/latex]
The R[latex]^2[/latex] for this model is 0.573, and the adjusted R[latex]^2[/latex] is 0.555. Both of these statistics indicate that these two independent variables jointly account for a large proportion of the variation in the dependent variable. The estimated values of [latex]\beta_1[/latex] and [latex]\beta_2[/latex] both exhibit the anticipated signs.
Suppose, however, that an economic analyst believes that this model should be further expanded to account for the possibility that voting behavior may also be affected by differences in defense spending and crime rates across states. Let’s define two new variables:
defense[latex]_i[/latex] = per capita defense spending in state [latex]i[/latex]
crime[latex]_i[/latex] = total crime rate in state [latex]i[/latex]
The estimated equation becomes:
\begin{equation*}
\widehat{\text{DVOTE}}_{i}=24.272+1.497\text{un}_{i}+0.00145\text{fedfunds}_{i}-1.69\text{defense}_{i}+0.00015\text{crime}_{i}
\end{equation*}
When these two variables are included in the equation, the R[latex]^2[/latex] increases to 0.576 while the adjusted R[latex]^2[/latex] declines to 0.539. As noted above, adding other variables to an equation always tends to increase R[latex]^2[/latex]. The adjusted R[latex]^2[/latex] will increase, however, only if the additional explanatory power contributed by these variables is large enough to compensate for the reduction in the degrees of freedom that occurs when additional independent variables are added to the regression equation. In this case, it appears that adding these additional variables does not result in a substantial improvement in the explanatory power of this regression.[20]
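Equation 6.19 can be used to check the reported figures. With [latex]N=51[/latex] (50 states and the District of Columbia), the Python sketch below computes the adjusted R[latex]^2[/latex] for the two-variable model ([latex]k=2[/latex]) and the four-variable model ([latex]k=4[/latex]) from the reported R[latex]^2[/latex] values. Rounded to three decimal places, it reproduces the reported values of 0.555 and 0.539.

```python
def adjusted_r2(r2, n, k):
    """Equation 6.19: adjusted R^2 from R^2, n observations, and k slope coefficients."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

N = 51  # 50 states plus the District of Columbia

two_var = adjusted_r2(0.573, N, k=2)    # un and fedfunds
four_var = adjusted_r2(0.576, N, k=4)   # adds defense and crime
print(round(two_var, 3), round(four_var, 3))
```

The small gain in R[latex]^2[/latex] (0.003) is outweighed by the rise in the factor [latex](N-1)/(N-(k+1))[/latex] from 50/48 to 50/46, which is why [latex]\overline{\text{R}}^{2}[/latex] falls.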
The Econometrics of Presidential Elections
Economists and political scientists have long observed a relationship between the state of the economy and the outcome of elections. Ray Fair (1978) provides one of the first econometric studies of the effect of the economic environment on the outcome of U.S. Presidential elections.
Fair used data from 1889-1976 to examine the effect of economic factors on voting outcomes using regression analysis. His results suggest that changes in real economic activity (as measured by either the growth rate of real per capita GNP or by the change in the unemployment rate) are an important determinant of U.S. Presidential election outcomes.
6.8 Forecasting
Multiple regression analysis is often used for forecasting purposes. In many industries, firms hire econometricians to estimate the demand for their products. Suppose, for example, that a utility company producing electricity wishes to estimate the long-run demand function for its product. One of the staff econometricians estimates this demand function as:
\begin{equation} \tag{6.20}
\hat{\text{Q}}_{dt}=1010-0.025\text{P}_{t}^{elec}+0.014\text{P}_{t}^{ng}+0.005\text{P}_{t}^{coal}+0.012\text{P}_{t}^{oil}+0.0002\text{Income}_{t}
\end{equation}
where:
- Q[latex]_{dt}[/latex] = (predicted) quantity of electricity demanded at time [latex]t[/latex]
- P[latex]_t^{elec}[/latex] = price of electricity at time [latex]t[/latex]
- P[latex]_t^{ng}[/latex] = price of natural gas at time [latex]t[/latex]
- P[latex]_t^{coal}[/latex] = price of coal at time [latex]t[/latex]
- P[latex]_t^{oil}[/latex] = price of oil at time [latex]t[/latex]
- Income[latex]_t[/latex] = consumer income at time [latex]t[/latex]
When the utility company is formulating its rate requests, it can use this equation to predict the effect of alternative electricity prices on the quantity of electricity demanded. Of course, it must also have some estimates of the future level of consumer income and the future prices of natural gas, coal, and oil. The quantity of electricity demanded can be predicted by substituting the projected price and income information into equation 6.20.
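As a sketch of this procedure, the Python function below evaluates equation 6.20 for several alternative electricity prices. The projected prices of natural gas, coal, and oil and the projected income level are hypothetical placeholders, since the text does not report units or actual projections; a real forecast would substitute the company's own projections.

```python
def predicted_electricity_demand(p_elec, p_ng, p_coal, p_oil, income):
    """Fitted quantity of electricity demanded from equation 6.20."""
    return (1010 - 0.025 * p_elec + 0.014 * p_ng + 0.005 * p_coal
            + 0.012 * p_oil + 0.0002 * income)

# Hypothetical projections of the other prices and of income (units unspecified).
projections = {"p_ng": 300.0, "p_coal": 150.0, "p_oil": 500.0, "income": 40000.0}

for p_elec in (800.0, 900.0, 1000.0):   # alternative electricity prices under consideration
    q = predicted_electricity_demand(p_elec, **projections)
    print(f"P_elec = {p_elec:.0f}: predicted quantity demanded = {q:.2f}")
```

Because equation 6.20 is linear, each 100-unit increase in [latex]\text{P}^{elec}[/latex] lowers predicted quantity demanded by 2.5 units (100 × 0.025), whatever values are projected for the other variables.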
In a similar manner, government and private economists use multiple regression analysis (and related techniques) to estimate equations (or systems of equations)[21] that may be used to forecast future levels of GDP, the unemployment rate, the inflation rate, and other important macroeconomic variables.
To estimate confidence intervals for forecasts, the variance of the forecast must be determined. The derivation of this forecast variance, however, requires the use of matrix algebra. A formal discussion of the forecast variance may be found in a more advanced text.[22] In general, however, we can note that, ceteris paribus, the variance of the forecast will tend to be smaller when:
- the variance of the error term ([latex]\sigma^2[/latex]) is smaller, and
- there are more observations.
The intuition behind these results is relatively straightforward. If the variance of the error terms is smaller, then the outcomes tend to be more tightly bunched around the regression equation. In this case, the regression equation generates predictions that are, on average, more precise. An increase in the number of observations improves the accuracy of the estimated intercept and slope parameters, thereby improving forecast accuracy (and reducing the variance of the forecast).
6.8.1 Evaluating Forecasts
The accuracy of forecasts is frequently assessed using the root mean square error defined as:
\begin{equation*}
\text{root mean square error}=\sqrt{\frac{1}{T}\sum_{t=1}^{T}(Y_{t}-\hat{Y}_{t})^{2}}
\end{equation*}
where:
[latex]Y_t[/latex] = observed level of the dependent variable at time [latex]t[/latex]
[latex]\hat{Y}_t[/latex] = predicted level of the dependent variable at time [latex]t[/latex]
[latex]T[/latex] = number of time periods
The root mean square error is frequently used to compare the accuracy of alternative forecasting methods. In most situations, the method that results in the lowest root mean square error is preferred for forecasting purposes.
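A minimal Python implementation of the root mean square error is shown below, applied to hypothetical observed values and two hypothetical sets of forecasts. Following the rule described above, method A would be preferred because it has the lower root mean square error.

```python
import math

def rmse(actual, predicted):
    """Root mean square error over T periods."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    T = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / T)

# Hypothetical observed values and forecasts from two alternative methods
observed   = [100.0, 102.0, 105.0, 103.0, 108.0]
forecast_a = [ 99.0, 103.0, 104.0, 105.0, 107.0]
forecast_b = [101.0, 100.0, 108.0, 101.0, 110.0]

rmse_a = rmse(observed, forecast_a)
rmse_b = rmse(observed, forecast_b)
print(f"method A: {rmse_a:.4f}   method B: {rmse_b:.4f}")
```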
6.9 Caution: The importance of model specification
As in the case of the bivariate regression model, multiple regression analysis provides a measure of the degree of correlation that exists between a dependent variable and a collection of independent variables. A high degree of correlation, however, does not necessarily imply that a causal relationship exists.
The “specification” of a regression model refers to the process of selecting both the functional form for the regression equation and the mix of variables that are to be included on the right-hand side of the equation. Poorly specified models may result in a high [latex]R^2[/latex], however, even if no causal relationship exists among the variables. If, for example, an important independent variable is omitted from a regression equation, then the estimated coefficients on the included independent variables may be partly capturing the effect of the excluded variable.
The effects of various types of specification error are discussed in detail in Chapter 10. For now, it can be noted that economic theory should be used, whenever possible, to determine the mix of variables that are included as independent variables in a regression equation. Independent variables should be included in models because there are theoretical justifications for their inclusion, and not simply because they are correlated with the dependent variable.
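The consequences of omitting a relevant variable can be illustrated by simulation. In the hypothetical data-generating process below, [latex]Y[/latex] depends on both [latex]X_1[/latex] and [latex]X_2[/latex], and the two independent variables are correlated. When [latex]X_2[/latex] is omitted, the estimated coefficient on [latex]X_1[/latex] also picks up part of the effect of [latex]X_2[/latex]: in this setup it should be close to [latex]1+2(0.7)=2.4[/latex] rather than the true value of 1.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 20_000
# Hypothetical data-generating process: X2 affects Y and is correlated with X1.
x1 = rng.normal(size=N)
x2 = 0.7 * x1 + rng.normal(size=N)
y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=N)

def ols(y, *regressors):
    """OLS coefficients (intercept first) from a least-squares solve."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(y, x1, x2)   # correctly specified model
short = ols(y, x1)      # X2 omitted from the regression

print(f"coefficient on x1, x2 included: {full[1]:.3f}")
print(f"coefficient on x1, x2 omitted:  {short[1]:.3f}")
```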
Cigarette smoking, suicide, and murder?
Smith, Phillips, and Neaton (1992) use data from a large epidemiological study to examine the relationship between the number of cigarettes smoked each day and the probability of either committing suicide or being murdered. After controlling for other risk factors, they find that individuals who smoke 60 or more cigarettes per day are approximately 3.4 times more likely to commit suicide than individuals who do not smoke. Individuals who smoke 40 or more cigarettes per day have a probability of being murdered that is approximately twice that of nonsmokers.
Is it likely that smoking cigarettes causes individuals to commit suicide or provokes others to murder smokers? Smith, Phillips, and Neaton argue that the observed correlation between cigarette smoking and suicide (or murder) outcomes is more likely the result of the omission of some other variable that is correlated with these variables. This particular study, for example, does not include data on individual income levels. It is quite possible that low-income individuals are more likely to smoke, commit suicide, and become the victims of violent crime. It is also possible that some personality traits that cause an individual to be more prone to suicide may also lead the individual to indulge in cigarette smoking.
This study provides a good example of the problem associated with confusing correlation with causation.
Summary
In this chapter, the multiple regression model has been introduced and examined. We have seen that, under the assumptions of the classical regression model, the OLS estimators are BLUE. If the error terms are normally distributed, the OLS estimators are also fully efficient (i.e., the estimator for each parameter achieves the lowest variance possible for any unbiased estimator).
In the next chapter, you will examine how hypothesis tests may be performed in the multiple regression model.
Key concepts
- multiple regression
- intercept term
- slope coefficients
- population regression function
- sample regression function
- autocorrelation
- perfect multicollinearity
- near-perfect multicollinearity
- variance of estimators
- standard error of estimators
- degrees of freedom
- consistency
- linearity
- unbiasedness
- BLUE
- efficiency
- TSS
- RSS
- ESS
- multiple coefficient of determination (R[latex]^2[/latex])
- adjusted R[latex]^2[/latex] ([latex]\overline{R}^2[/latex])
- root mean square error
Exercises and problems
- Suppose that you wished to examine the determinants of a student’s GPA. What variables might you include in a regression equation?
- Consider the model of electricity demand appearing in equation 6.20.
- Are the reported signs of the coefficients consistent with the predictions of economic analysis? Explain.
- Interpret the meaning of each of the parameters. What, for example, does a coefficient of -0.025 on the P[latex]^{elec}[/latex] variable indicate?
- Consider the estimated regression equation appearing in equation 6.12.
- An individual has scores of 720 and 700 on the math and verbal sections of the SAT exam and graduated in the 90th percentile of their high school class. What is his or her predicted score on the final exam?
- What is the predicted effect of a 10 percentage-point increase in high school class rank on the final exam score (holding SAT scores constant)?
- An econometrician in the market research department of a widget manufacturing firm estimates the relationship:\begin{equation*}\hat{Q}_{i}=37+20\text{TV}_{i}+10\text{NEWS}_{i}\end{equation*}\begin{equation*}(\text{standard errors in parentheses})\end{equation*}
where:
[latex]\hat{Q}_{i}[/latex] = predicted widget sales in week [latex]i[/latex]
TV[latex]_{i}[/latex] = number of television ads placed by the firm in week [latex]i[/latex]
NEWS[latex]_{i}[/latex] = number of newspaper ads placed by the firm in week [latex]i[/latex]
- What does this equation tell you about the additional sales resulting from an additional television ad?
- What does this equation tell you about the additional sales resulting from an additional newspaper ad?
- Consider the boxed text example concerning the Preston and Richards study of the relationship between marriage rates and job prospects.
- Discuss the implications of the estimated sign (positive or negative) of each of the slope coefficients. Are these results consistent with the argument that better job prospects for women lower marriage rates?
- Do these results necessarily suggest that better job prospects lower marriage rates? Is it also possible that higher marriage rates reduced labor market activity for women? Explain.
- (More difficult question) In a classic study of the effect of social security on the level of savings, Martin Feldstein (1974) used annual data for the years 1929 – 1971 to estimate several versions of a consumption function. One of these estimated equations is given by:
[latex]\hat{C}_{t}[/latex] = 228 + 0.530YD[latex]_{t}[/latex] + 0.120YD[latex]_{t-1}[/latex] + 0.356RE[latex]_{t}[/latex] + 0.014W[latex]_{t-1}[/latex] + 0.021SSW[latex]_{t}[/latex]
where:
- [latex]\hat{C}_{t}[/latex] = predicted level of real per capita consumption expenditures in year [latex]t[/latex]
- YD[latex]_{t}[/latex] = real per capita disposable personal income in year [latex]t[/latex]
- YD[latex]_{t-1}[/latex] = real per capita disposable personal income in year [latex]t-1[/latex]
- RE[latex]_{t}[/latex] = real retained earnings (undistributed corporate profit) in year [latex]t[/latex]
- W[latex]_{t-1}[/latex] = real per capita household wealth in year [latex]t-1[/latex]
- SSW[latex]_{t}[/latex] = a measure of the present value of real per capita social security wealth in year [latex]t[/latex]
- What is the estimated marginal propensity to consume from current income?
- Why is YD[latex]_{t-1}[/latex] included in addition to YD[latex]_{t}[/latex]? What is the interpretation of the coefficient on this variable? The estimated value of this coefficient is less than the estimated coefficient on YD[latex]_{t}[/latex]. Is this a reasonable result?
- Why are RE[latex]_{t}[/latex] and W[latex]_{t-1}[/latex] included in this equation?
- What are the implications of the positive coefficient associated with the social security wealth variable? What are the implications for the level of saving in the economy?
- Rios (1991) estimates the following equation:
[latex]\widehat{\text{Fert}}_{i}=5.933-0.095\text{GNP}_{i}-0.038\text{HS\%}_{i}[/latex]
where:
- Fert[latex]_{i}[/latex] = total fertility rate in country [latex]i[/latex]
- GNP[latex]_{i}[/latex] = per capita GNP in country [latex]i[/latex] (in U.S. dollars)
- HS%[latex]_{i}[/latex] = percentage of women in high school in country [latex]i[/latex]
(Note: the total fertility rate is an estimate of the average number of children born to a woman over the course of her lifetime.)
- What does this estimated equation suggest about whether the quantity of children is a normal or inferior good?
- Why would the percentage of women in high school be included in this equation? Is a negative sign for this coefficient reasonable?
- Use the data in Table 6.1 to:
- Estimate the parameters of the equation: [latex]Y_{i}=\beta _{o}+\beta _{1}X_{1i}+u_{i}[/latex]
- Estimate the parameters of the equation: Y[latex]_{i}=\beta _{o}X_{oi}+\beta _{1}X_{1i}+u_{i}[/latex] (Be sure to omit the constant term when you estimate this equation.)
- Compare the results in (a) and (b). Will this always occur?
| [latex]Y[/latex] | [latex]X_o[/latex] | [latex]X_1[/latex] |
|---|---|---|
| 20 | 1 | 15 |
| 10 | 1 | 20 |
| 18 | 1 | 30 |
| 25 | 1 | 10 |
| 5 | 1 | 11 |
| 15 | 1 | 25 |
| 60 | 1 | 75 |
| 33 | 1 | 21 |
| 70 | 1 | 50 |
| 100 | 1 | 70 |
- (More involved problem) Labor economists have observed that female labor force participation rates increased quite substantially in the years following World War II (this is particularly true for married females).
- What factors might account for these changes in the labor force participation rate for married females? (The January 1985 issue of the Journal of Labor Economics contains an interesting set of articles investigating this topic. You may wish to examine some of the articles in this volume for suggestions.)
- Locate and download time-series data on the female labor force participation rate and the variables that you selected in (a). If you cannot find the exact variables that you described, try to find a close substitute that can serve as a proxy variable.
- Estimate a multiple regression model in which the female labor force participation rate is the dependent variable.
- In a small town, there are two restaurants: a fast-food hamburger restaurant and a pizzeria. The fast-food hamburger restaurant hires a market researcher to estimate the demand for fast-food hamburgers. The estimated demand equation is:
\begin{equation*}
\hat{\text{Q}}_{dt}=500-20\text{P}_{t}^{hamb}+5\text{P}_{t}^{pizza}+0.05\text{I}_{t}
\end{equation*}
where:
- [latex]\hat{Q}_{dt}[/latex] = predicted quantity of hamburgers demanded (per day)
- P[latex]_{t}^{hamb}[/latex] = price ($) of hamburgers at time [latex]t[/latex]
- P[latex]_{t}^{pizza}[/latex] = price ($) of pizza at time [latex]t[/latex]
- [latex]I_{t}[/latex] = the level of per capita monthly income ($) in the community at time [latex]t[/latex]
Use this estimated demand equation to answer each of the following questions.
- Suppose that the price of pizza and income are held constant at P[latex]^{pizza}[/latex] = 10 and I = 1000. Draw a graph of the demand curve relating the quantity of hamburgers demanded to the price of hamburgers. In this graph, place quantity demanded on the vertical axis and price on the horizontal axis. (This is the reverse of the standard textbook treatment, but follows the mathematical convention of placing the dependent variable on the vertical axis.) Is the resulting demand curve upward or downward sloping? Is this result consistent with microeconomic theory? What is the slope of this curve?
- Suppose that the price of pizza rises to P[latex]^{pizza}[/latex] = 12. Show how the demand for hamburgers shifts when the price of pizza changes (use the diagram constructed in part (a)). Has the slope of the demand curve changed? Has the intercept changed? Ceteris paribus, what relationship exists between the price of pizza and the quantity of fast-food hamburgers demanded? Does this suggest that pizza and fast-food hamburgers are substitutes or complements?
- An Engel curve is a curve that illustrates the relationship between the quantity of a good demanded and the level of consumer income, holding other factors constant. Draw such a curve for fast-food hamburgers, holding the price of pizza and the price of hamburgers constant at P[latex]^{pizza}[/latex] = 10 and P[latex]^{hamb}[/latex] = 2. Is the Engel curve upward or downward sloping? What does this tell us about the effect of a change in consumer income on the consumption of fast-food hamburgers? Do fast-food hamburgers appear to be a normal good or an inferior good? What is the slope of this Engel curve?
- What happens to this Engel curve if the price of hamburgers rises to P[latex]^{hamb}[/latex] =2.50? Draw this new curve. Has the slope of the Engel curve changed?
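The intercepts and slopes of the curves in parts (a) through (d) can be checked by evaluating the estimated demand equation at two points along each curve. This is a minimal Python sketch; the helpers `q_hamb` and `line_through` are hypothetical names introduced here, and the code only covers the arithmetic, not the graphs or their interpretation.

```python
def q_hamb(p_hamb, p_pizza, income):
    """Predicted daily quantity of hamburgers from the estimated demand equation."""
    return 500 - 20 * p_hamb + 5 * p_pizza + 0.05 * income

def line_through(f, x0, x1):
    """Return (intercept, slope) of a linear function f using two evaluation points."""
    y0, y1 = f(x0), f(x1)
    slope = (y1 - y0) / (x1 - x0)
    return y0 - slope * x0, slope

# Demand curves: quantity as a function of the hamburger price
demand_a = line_through(lambda p: q_hamb(p, 10, 1000), 0, 1)   # P_pizza = 10, I = 1000
demand_b = line_through(lambda p: q_hamb(p, 12, 1000), 0, 1)   # P_pizza rises to 12

# Engel curves: quantity as a function of income
engel_c = line_through(lambda inc: q_hamb(2, 10, inc), 0, 1000)    # P_hamb = 2
engel_d = line_through(lambda inc: q_hamb(2.5, 10, inc), 0, 1000)  # P_hamb = 2.50

for label, (intercept, slope) in [("(a) demand", demand_a), ("(b) demand", demand_b),
                                  ("(c) Engel", engel_c), ("(d) Engel", engel_d)]:
    print(f"{label}: intercept = {intercept:.2f}, slope = {slope:.3f}")
```

Because the estimated equation is linear, two points are enough to pin down each curve; a change in a variable held constant moves the intercept but leaves the slope unchanged.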
- Provide an intuitive explanation of the problem of multicollinearity.
- Suppose that you are hired by a utility company to estimate the parameters of a demand curve for electricity. The estimated equation is to take the form:
\begin{equation*}
\text{Q}_{dt}=\beta _{o}+\beta _{1}\text{P}_{t}^{elec}+\beta _{2}\text{P}_{t}^{ng}+\beta _{3}\text{P}_{t}^{coal}+\beta _{4}\text{P}_{t}^{oil}+\beta _{5}\text{Income}_{t}+u_{t}
\end{equation*}
where each variable is defined as in equation 6.20. The company provides you with 5 years of monthly data on each variable. Would there be any problem in estimating this model if:
- the public service commission regulating rates has maintained a fixed ratio between the prices of electricity and natural gas during this period (i.e., [latex]P_{t}^{elec}=kP_{t}^{ng}[/latex]), or
- the price of electricity has remained constant during this period.
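One way to see what goes wrong in each case is to build the matrix of regressors and check its rank. The sketch below uses simulated, hypothetical values for all prices and income (not the utility's data), a hypothetical ratio [latex]k=2[/latex], and a hypothetical constant electricity price of 0.125; it relies on NumPy's `numpy.linalg.matrix_rank`. A rank below the number of columns means one column is an exact linear combination of the others, so the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60  # 5 years of monthly observations

# Hypothetical regressors (simulated, not real data)
p_ng = rng.uniform(5, 10, n)
p_coal = rng.uniform(2, 4, n)
p_oil = rng.uniform(20, 40, n)
income = rng.uniform(30, 40, n)  # in thousands of dollars

def design(p_elec):
    """Columns: constant, P_elec, P_ng, P_coal, P_oil, Income."""
    return np.column_stack([np.ones(n), p_elec, p_ng, p_coal, p_oil, income])

cases = {
    "P_elec = k * P_ng (k = 2)": design(2.0 * p_ng),        # fixed price ratio
    "P_elec constant": design(np.full(n, 0.125)),            # collinear with the constant
    "P_elec varies freely": design(rng.uniform(0.08, 0.15, n)),
}
ranks = {label: np.linalg.matrix_rank(X) for label, X in cases.items()}
for label, r in ranks.items():
    print(f"{label}: rank {r} of 6 columns")
```

In the first case the electricity-price column is proportional to the natural-gas column; in the second it is proportional to the column of ones that carries the intercept.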
- A student attempts to estimate the following variation of equation 6.12.
\begin{equation*}
\text{Grade}_{i}=\beta _{o}+\beta _{1}\text{SAT-M}_{i}+\beta _{2}\text{SAT-V}_{i}+\beta _{3}\text{SAT-Total}_{i}+u_{i}
\end{equation*}
where:
- SAT-M = SAT math score
- SAT-V = SAT verbal score, and
- SAT-Total = combined SAT score = SAT-M + SAT-V.
- Can this model be estimated? Explain.
- Will a similar problem occur when equation 6.12 is estimated? Explain.
- (More difficult question) Consider the regression model given by:
\begin{equation*}
Y_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+u_{i}
\end{equation*}
- State the normal equations for the OLS estimators.
- Solve these normal equations for the estimated parameters [latex]\hat{\beta}_{o},\hat{\beta}_{1}[/latex], and [latex]\hat{\beta}_{2}[/latex].
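To check the algebra in this problem, the sketch below codes one standard form of the solution: the normal equations rewritten in deviations from sample means, solved for the two slopes by Cramer's rule, with the intercept recovered from the sample means. It uses simulated data with hypothetical parameter values and compares the result with NumPy's general least-squares solver `numpy.linalg.lstsq`. The helper name `ols_two_regressors` is illustrative.

```python
import numpy as np

def ols_two_regressors(y, x1, x2):
    """Closed-form OLS estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 + u."""
    y_d, x1_d, x2_d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()
    s11 = np.sum(x1_d * x1_d)
    s22 = np.sum(x2_d * x2_d)
    s12 = np.sum(x1_d * x2_d)
    s1y = np.sum(x1_d * y_d)
    s2y = np.sum(x2_d * y_d)
    det = s11 * s22 - s12 ** 2          # zero under perfect multicollinearity
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()
    return np.array([b0, b1, b2])

# Hypothetical simulated data (not from the text)
rng = np.random.default_rng(1)
x1 = rng.normal(10, 2, 50)
x2 = rng.normal(5, 1, 50)
y = 3 + 2 * x1 - 1.5 * x2 + rng.normal(0, 1, 50)

closed_form = ols_two_regressors(y, x1, x2)
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print("closed form:", closed_form)
print("lstsq:      ", beta_lstsq)
```

The denominator `det` shows directly why a unique solution requires that [latex]X_1[/latex] and [latex]X_2[/latex] not be perfectly correlated.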
- Use the data in the file “final.dat” to verify the sample regression function appearing in equation 6.12.
- Use the data in the file “cons2.dat” to verify the estimated consumption function appearing in equation 6.14.
- Provide a brief intuitive definition of each of the following properties of estimators:
- unbiasedness.
- linearity.
- consistency.
- efficiency.
- B.L.U.E.
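A small Monte Carlo experiment can make unbiasedness and consistency concrete. The sketch below assumes a hypothetical data-generating process with true parameters (1, 2, -0.5) and independent, normally distributed errors; it is an illustration, not a proof. Averaging the OLS estimates over many samples approximates their expected value, and the spread of the slope estimates shrinks as the sample size grows. It does not illustrate efficiency or the B.L.U.E. property.

```python
import numpy as np

rng = np.random.default_rng(42)
BETA = np.array([1.0, 2.0, -0.5])  # hypothetical true parameters (beta_o, beta_1, beta_2)

def simulate_ols(n, reps=2000):
    """OLS estimates from `reps` independent samples of size n."""
    estimates = np.empty((reps, BETA.size))
    for r in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
        y = X @ BETA + rng.normal(size=n)      # errors: mean zero, independent of X
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return estimates

small = simulate_ols(n=20)
large = simulate_ols(n=500)

# Average across samples: close to BETA (what unbiasedness predicts)
print("average estimates, N = 20:  ", small.mean(axis=0))
# Spread of the slope estimate: smaller for larger N (what consistency predicts)
print("std. dev. of slope estimate:", small[:, 1].std(), "(N = 20) vs",
      large[:, 1].std(), "(N = 500)")
```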
- Consider the following multiple regression equation designed to explain differences in crime rates across U.S. states:
Crime[latex]_{i}[/latex] = [latex]\beta _{o}+\beta _{1}[/latex]Pov[latex]_{i}+\beta _{2}[/latex]Popdens[latex]_{i}+\beta _{3}[/latex]Metro[latex]_{i}+u_{i}[/latex]
where:
- Crime[latex]_{i}[/latex] = total crime rate in state [latex]i[/latex]
- Pov[latex]_{i}[/latex] = poverty rate in state [latex]i[/latex]
- Popdens[latex]_{i}[/latex] = population per square mile in state [latex]i[/latex]
- Metro[latex]_{i}[/latex] = metropolitan population as % of state population in state [latex]i[/latex]
- What predictions can you make about the sign of the coefficient on each of the independent variables in this equation?
- Estimate the parameters of this equation using the data appearing in the file “crime.dat.”
- Do the estimated signs of the parameters agree with your predictions in (a)?
- What is the value of R[latex]^{2}[/latex] for this equation? Explain the meaning of this statistic.
- What is the value of [latex]\overline{\text{R}}^{2}[/latex]? Is this larger or smaller than R[latex]^{2}[/latex]? Will this always occur?
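The sketch below computes both statistics from an OLS fit. It assumes [latex]R^2 = 1 - \text{RSS}/\text{TSS}[/latex], with RSS the residual sum of squares and [latex]k[/latex] the number of slope coefficients, so the adjusted version divides RSS by [latex]N-k-1[/latex] and TSS by [latex]N-1[/latex]. The data are simulated and hypothetical, and the extra regressor `noise` is unrelated to Y by construction.

```python
import numpy as np

def r2_and_adjusted(y, X):
    """R^2 and adjusted R^2 for an OLS fit of y on X (X includes a column of ones)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    n, p = X.shape
    k = p - 1                               # number of slope coefficients
    rss = resid @ resid                     # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    return r2, adj_r2

rng = np.random.default_rng(5)
n = 25
x = rng.normal(size=n)
noise = rng.normal(size=n)                  # irrelevant regressor
y = 1 + 0.8 * x + rng.normal(size=n)

r2_small, adj_small = r2_and_adjusted(y, np.column_stack([np.ones(n), x]))
r2_big, adj_big = r2_and_adjusted(y, np.column_stack([np.ones(n), x, noise]))
print("one regressor:  R^2 =", r2_small, " adjusted =", adj_small)
print("two regressors: R^2 =", r2_big, " adjusted =", adj_big)
```

Adding `noise` cannot lower R[latex]^{2}[/latex], but because of the degrees-of-freedom correction it can lower the adjusted R[latex]^{2}[/latex].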
- Economists use hedonic pricing models to explain price variations in markets in which there is product differentiation. As an example of a hedonic pricing model, consider the following equation describing the determination of new car prices in 2002:
\begin{equation*}
\text{MSRP}_{i}=\beta _{o}+\beta _{1}\text{Horse}_{i}+\beta _{2}\text{Length}_{i}+\beta _{3}\text{Width}_{i}+\beta_{4}\text{Height}_{i}+\beta _{5}\text{Weight}_{i}
\end{equation*}
\begin{equation*}
+\beta _{6}\text{Disp}_{i}+\beta _{7}\text{City}_{i}+\beta _{8}\text{High}_{i}+u_{i}
\end{equation*}
where:
- MSRP[latex]_{i}[/latex] = manufacturer’s suggested retail price for car model [latex]i[/latex] in 2002
- Horse[latex]_{i}[/latex] = horsepower of car model [latex]i[/latex]
- Length[latex]_{i}[/latex] = length of car model [latex]i[/latex] (inches)
- Width[latex]_{i}[/latex] = width of car model [latex]i[/latex] (inches)
- Height[latex]_{i}[/latex] = height of car model [latex]i[/latex] (inches)
- Weight[latex]_{i}[/latex] = weight of car model [latex]i[/latex] (pounds)
- Disp[latex]_{i}[/latex] = engine displacement for car model [latex]i[/latex] (liters)
- City[latex]_{i}[/latex]= EPA city miles per gallon for car model [latex]i[/latex]
- High[latex]_{i}[/latex] = EPA highway miles per gallon for car model [latex]i[/latex]
- [latex]u_{i}[/latex] = random error term for observation [latex]i[/latex]
- Explain what the slope coefficients in this equation measure.
- Can you predict the signs of any of these coefficients? State your predictions.
- Use the data in the file “cars.dat” to estimate the parameters of this equation.
- Do the estimated parameters match your predictions?
- What is the R[latex]^{2}[/latex] and adjusted R[latex]^{2}[/latex] for this equation? What does the value of R[latex]^{2}[/latex] indicate about the fit of this equation?
- The data file “lemonade2.dat” provides hypothetical data on the quantity of lemonade sold, the price of lemonade, the average daily temperature, and the price of soda at a concession stand. Consider the equation:
\begin{equation*}
\text{Q}_{d}=\beta _{o}+\beta _{1}\text{P}_{lemonade}+\beta_{2}\text{Temp}+\beta _{3}\text{P}_{soda}+u_{i}
\end{equation*}
- Estimate the parameters of this demand equation.
- What is the R[latex]^{2}[/latex] and adjusted R[latex]^{2}[/latex] for this equation? Which is larger? Why?
- Are the values of these estimated coefficients consistent with those predicted by economic theory? Explain.
- Allison (1972) estimates the following equation designed to explain crime rates in the communities within a 40-mile radius of Chicago (only communities with 25,000 or more residents were included):
\begin{equation*}
\widehat{\text{Crime}}_{i}=-5895.0-14.71\text{Distance}_{i}+155.4\text{Pop15-24}_{i}+846.3\text{Un}_{i}
\end{equation*}
\begin{equation*}
+455.3\text{Ed}_{i}-208.9\text{GenderDiff}_{i}-26.0\text{Rec}_{i}
\end{equation*}
\begin{equation*}
\text{R}^{2}=0.78
\end{equation*}
where:
- Crime[latex]_{i}[/latex] = crime rate in community [latex]i[/latex]
- Distance[latex]_{i}[/latex] = community [latex]i[/latex]’s distance from the core of the city (in miles)
- Pop15-24[latex]_{i}[/latex] = percent of the population aged 15-24 in community [latex]i[/latex]
- Un[latex]_{i}[/latex] = male unemployment rate in community [latex]i[/latex]
- Ed[latex]_{i}[/latex] = mean education of males aged 25+ in community [latex]i[/latex]
- GenderDiff[latex]_{i}[/latex] = % female – % male in community [latex]i[/latex]
- Rec[latex]_{i}[/latex] = expenditures on parks and recreation (per 1000 population) in community [latex]i[/latex]
- What do the signs of each of the slope coefficients suggest about the relationship between the corresponding independent variable and the crime rate? Are these signs reasonable?
- Interpret the meaning of each coefficient (e.g., what does a coefficient of -14.71 imply about the relationship between the crime rate and community [latex]i[/latex]’s distance in miles from the core of the city?).
- What does an R[latex]^{2}[/latex] equal to 0.78 indicate?
- This article does not report either an adjusted R[latex]^{2}[/latex] or the number of observations. Why might a reader wish to know the number of observations and the adjusted R[latex]^{2}[/latex] for this equation?
- A student estimates the parameters of the following regression model:
\begin{equation*}
YD_{t}=\beta _{o}+\beta _{1}C_{t}+\beta _{2}S_{t}+u_{t}
\end{equation*}
where:
- YD[latex]_{t}[/latex] = disposable personal income in year [latex]t[/latex]
- C[latex]_{t}[/latex] = consumption expenditures in year [latex]t[/latex]
- S[latex]_{t}[/latex] = savings in year [latex]t[/latex]
and finds that the R[latex]^{2}[/latex] for this equation is equal to 1.0. Does this indicate that this is a very useful model? Explain.
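A quick simulation shows why an R[latex]^{2}[/latex] of 1.0 is unsurprising here. Disposable income is, by definition, divided between consumption and saving, so in the sketch's hypothetical data (simulated values, not actual national accounts) [latex]YD_{t}=C_{t}+S_{t}[/latex] holds exactly, and OLS simply reproduces that accounting identity.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 30
C = rng.uniform(800, 1200, T)   # hypothetical consumption
S = rng.uniform(50, 150, T)     # hypothetical saving
YD = C + S                      # disposable income: an accounting identity

X = np.column_stack([np.ones(T), C, S])
beta_hat = np.linalg.lstsq(X, YD, rcond=None)[0]
resid = YD - X @ beta_hat
r2 = 1 - (resid @ resid) / np.sum((YD - YD.mean()) ** 2)

print("estimates (beta_o, beta_1, beta_2):", beta_hat)
print("R^2:", r2)
```

The estimates come out at (essentially) 0, 1, and 1 with a perfect fit, which says nothing about behavior: the regression has only rediscovered a definition.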
- Consider the following model designed to explain the level of life expectancy at birth in different countries:
\begin{equation*}
\text{LifeEx}_{i}=\beta _{o}+\beta _{1}\text{TV}_{i}+\beta _{2}\left( \frac{\text{Pop}}{\text{Doctor}}\right) _{i}+\beta _{3}\text{GDP}_{i}+u_{i}
\end{equation*}
where:
- LifeEx[latex]_{i}[/latex] = Life expectancy at birth in country [latex]i[/latex]
- TV[latex]_{i}[/latex]= TV sets per 100 people in country [latex]i[/latex]
- [latex]\left( \frac{\text{Pop}}{\text{Doctor}}\right)_{i}[/latex] = population per doctor in country [latex]i[/latex]
- GDP[latex]_{i}[/latex] = real per capita GDP in country [latex]i[/latex] (in U.S. dollars)
- What is the expected sign of each of the variables above?
- Estimate the parameters of this equation using the data in the data file “life.dat.” Do the signs agree with expectations?
- Is it really likely that lifespans are affected by owning a TV? What might this variable be capturing?
6.13 Mathematical Appendix
6.13.1 Derivation of OLS estimators
The OLS estimators are derived by finding the values of [latex]\hat{\beta}_{o},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k}[/latex] that solve the following problem:
\begin{equation*}
\underset{\hat{\beta}_{o},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k}}{\text{minimize}}\text{: }\sum \hat{u}_{i}^{2}=\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots -\hat{\beta}_{k}X_{ki})^{2}
\end{equation*}
The first-order conditions for this minimization problem require that the following partial derivatives all equal zero:
\begin{equation*}
\frac{\partial \left( \sum \hat{u}_{i}^{2}\right) }{\partial \hat{\beta}_{o}}=-2\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots -\hat{\beta}_{k}X_{ki})=0
\end{equation*}
\begin{equation*}
\frac{\partial \left( \sum \hat{u}_{i}^{2}\right) }{\partial \hat{\beta}_{1}}=-2\sum X_{1i}(Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots -\hat{\beta}_{k}X_{ki})=0
\end{equation*}
\begin{equation*}
\frac{\partial \left( \sum \hat{u}_{i}^{2}\right) }{\partial \hat{\beta}_{2}}=-2\sum X_{2i}(Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots -\hat{\beta}_{k}X_{ki})=0
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\frac{\partial \left( \sum \hat{u}_{i}^{2}\right) }{\partial \hat{\beta}_{k}}=-2\sum X_{ki}(Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots -\hat{\beta}_{k}X_{ki})=0
\end{equation*}
Since the sample residual, [latex]\hat{u}_{i}[/latex], can be expressed as:
\begin{equation*}
\hat{u}_{i}=Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{1i}-\cdots -\hat{\beta}_{k}X_{ki}
\end{equation*}
these first-order conditions can be expressed as:
\begin{equation*}
\sum \hat{u}_{i}=0
\end{equation*}
\begin{equation*}
\sum X_{1i}\hat{u}_{i}=0
\end{equation*}
\begin{equation*}
\sum X_{2i}\hat{u}_{i}=0
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\sum X_{ki}\hat{u}_{i}=0
\end{equation*}
These [latex]k+1[/latex] equations constitute a set of normal equations that can be solved for the [latex]k+1[/latex] estimated parameters [latex]\hat{\beta}_{o}, \hat{\beta}_{1},\ldots ,\hat{\beta}_{k}[/latex].[23]
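In matrix form, these normal equations can be written as [latex](X'X)\hat{\beta}=X'Y[/latex], where [latex]X[/latex] is the [latex]N\times(k+1)[/latex] matrix whose first column is all ones and whose remaining columns hold [latex]X_{1},\ldots,X_{k}[/latex]. The sketch below (Python with NumPy, using simulated, hypothetical data) solves that system and confirms that the resulting residuals satisfy the first-order conditions above up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 40, 3
# First column of ones for the intercept, then k hypothetical regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, k))])
Y = X @ np.array([0.5, 1.0, -2.0, 0.3]) + rng.normal(size=N)

# Solve the k + 1 normal equations (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat

# First-order conditions: sum(u_hat) = 0 and sum(X_ji * u_hat_i) = 0 for j = 1..k
foc = X.T @ u_hat
print("beta_hat:", beta_hat)
print("first-order conditions:", foc)   # zero up to floating-point rounding
```

The first entry of `foc` corresponds to [latex]\sum \hat{u}_{i}=0[/latex]; the remaining entries correspond to [latex]\sum X_{ji}\hat{u}_{i}=0[/latex] for each regressor.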
- The mathematically sophisticated reader will note that when there is a single independent variable ([latex]k=1[/latex]), the regression equation can be represented by a line in two-dimensional space. When there are two independent variables ([latex]k=2[/latex]), the regression equation is the equation of a plane in three-dimensional space. If three or more independent variables are present, the regression equation provides the equation for a hyperplane (a generalization of a plane to more than three dimensions). ↵
- Those who have studied multivariate calculus should note that each slope coefficient, [latex]\beta _{j}[/latex], is equal to the partial derivative of the dependent variable with respect to the [latex]j[/latex]th independent variable. In mathematical terms: [latex] \begin{equation} \beta _{j}=\frac{\partial Y_{i}}{\partial X_{ji}} \end{equation} [/latex] ↵
- Recall that when two variables are independent, the covariance between these variables equals zero. (Be sure to remember, though, that zero covariance between two variables does not necessarily imply independence.) ↵
- If [latex]c[/latex] is negative, the effect is essentially the same. The only difference is that an inverse relationship would exist between [latex]X[/latex] and [latex]Z[/latex]. ↵
- This model is specified in a simple form to focus on the issue of multicollinearity. A more complete specification of travel expenditures would include other variables such as: the number of children, the age of household members, prices of alternative modes of travel, and other variables that affect the cost or benefits associated with travel-related activities. ↵
- See Goldberger (1991), pp. 245-250, for an entertaining discussion of this argument. ↵
- This minimization problem is examined (using calculus) in the mathematical appendix at the end of this chapter. ↵
- This approach involves the use of method-of-moments estimators. Under this approach, estimates of population parameters are derived by setting sample moments equal to their expected values. In this case, the resulting estimators are equivalent to the OLS estimators. A proof of this (requiring matrix algebra) may be found in Johnston (1984), pp. 171-2. ↵
- The data used to estimate this model may be found in the file “final.dat.” ↵
- The real interest rate equals the nominal interest rate minus the inflation rate. The interest rate on 3-month Treasury bills is often used as a measure of the nominal interest rate since these financial instruments are essentially risk-free financial assets. ↵
- The data used to estimate this equation appears in the file “cons2.dat.” ↵
- For a more complete discussion, see: Greene (2000), pp. 247-8; or Johnston (1984), pp. 171-173. ↵
- The proof of each of these properties requires the use of mathematical tools that are beyond the scope of this text. Proofs may be found in Greene (2000), Johnston (1984), or other more advanced texts. ↵
- More formally, the consistency property requires that: \begin{equation*} \underset{N\rightarrow \infty }{\lim }\text{Prob}\left( \left| \hat{\beta}_{j}^{N}-\beta _{j}\right| >\epsilon \right) =0\text{, for any }\epsilon >0. \end{equation*} where [latex]\hat{\beta}_j^N[/latex] is the OLS estimator of the parameter [latex]\beta_j[/latex] when a sample of size [latex]N[/latex] is used. ↵
- Once again, it should be noted that nonlinear estimators may have a variance that is lower than that of the OLS estimators. ↵
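- Biased estimators, however, may have a lower mean-squared error (where the mean-squared error of an estimator [latex]\hat{\beta}_{j}[/latex] is defined as [latex]E\left[ (\hat{\beta}_{j}-\beta _{j})^{2}\right] =\text{Var}(\hat{\beta}_{j})+\left[ \text{Bias}(\hat{\beta}_{j})\right] ^{2}[/latex]). ↵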
- The adjusted [latex]R^2[/latex] is formed from the earlier definition by dividing both RSS and TSS by their respective degrees of freedom. The degrees of freedom for these sums of squares are discussed in more detail in Chapter 7. ↵
- In fact, [latex]\bar{R}^2[/latex] may be negative in small samples when [latex]R^2[/latex] is relatively low. ↵
- A copy of this data appears in the election.dat file on the website accompanying this text. This data is described in Table B.13 in Appendix B. ↵
- Of course, it would also be desirable to perform a hypothesis test to determine whether these additional variables have a significant effect on voting outcomes. This topic will be addressed in Chapter 7. ↵
- Procedures for estimating systems of equations are discussed in Chapter 15. ↵
- See, for example, the discussion in Johnston (1984), pp. 195-198. ↵
- A unique solution for this system of equations exists only if none of these equations can be written as a linear combination of the other equations. This will be true as long as there is no perfect multicollinearity among the [latex]X_{j}[/latex]'s. If perfect multicollinearity is present, however, it is not possible to derive unique solutions for the estimated intercept and slope parameters from these normal equations. ↵