Chapter 12 – Autocorrelation

johnkane

In previous chapters it was assumed that the residuals in the regression model were uncorrelated across observations. In mathematical terms, this condition can be stated as:
\begin{equation*}
E(u_t u_s)=0\text{ (for }t\neq s\text{)}
\end{equation*}
In time series models, however, residuals are often correlated across observations. When this occurs, autocorrelation (also known as serial correlation) is said to be present.

This chapter begins with a discussion of the effects of autocorrelation. Several methods for detecting the presence of autocorrelation are discussed and analyzed. Estimation procedures are then presented that are appropriate for models exhibiting autocorrelated error terms.

12.1 First-order autocorrelation

Consider the regression model:
\begin{equation}
Y_t=\beta _0+\beta _1X_{1t}+\beta _2X_{2t}+\ldots +\beta _kX_{kt}+u_t \tag{12.1}
\end{equation}
A simple form of autocorrelation occurs when the residual term in this equation can be expressed as:
\begin{equation}
u_t=\rho u_{t-1}+\epsilon _t \tag{12.2}
\end{equation}
\begin{equation*}
\text{where: }E(\epsilon _t\epsilon _s)=0\text{ (for }t\neq s\text{)} \end{equation*}
\begin{equation*}
E(\epsilon _t)=0
\end{equation*}
\begin{equation*}
E(\epsilon _t^2)=\sigma _\epsilon ^2
\end{equation*}
\begin{equation*}
\left| \rho \right| <1
\end{equation*}
Residuals satisfying these conditions are said to follow a first-order autoregressive process. The term “autoregressive” indicates that the current value of a variable is assumed to be a linear function of one or more lagged values of itself. This process is said to be a “first-order” autoregressive process because the current value of the error term is assumed to be directly affected by only the previous period’s error term.^[1] An error process of this type is often simply referred to as an AR(1) error process. In this specification, [latex]\epsilon _t[/latex] is an error term possessing a constant variance that is uncorrelated with past values of itself. When an error process possessing a constant mean and variance is uncorrelated across time periods, it is said to be a white noise error process. Thus, in the specification above, [latex]\epsilon _t[/latex] is assumed to be a white noise error process.

The coefficient [latex]\rho[/latex] is a measure of the degree of serial correlation. If [latex]\rho[/latex] is zero, autocorrelation is not present. If the absolute value of [latex]\rho[/latex] is close to one, then the previous period’s error term has a substantial effect on the magnitude of the current error term. Positive first-order autocorrelation occurs when [latex]\rho[/latex] is greater than zero. A negative value for [latex]\rho[/latex] indicates that negative first-order autocorrelation is present. Figures 12.1 and 12.2 illustrate possible patterns of residuals under positive and negative first-order autocorrelation. When positive first-order autocorrelation occurs, positive residuals tend to be followed by other positive residuals, and negative residuals tend to be followed by other negative residuals. If negative first-order autocorrelation is present, the residuals tend to alternate in sign: positive residuals tend to be followed by negative residuals, and negative residuals tend to be followed by positive residuals.
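The sign patterns described above can be checked with a small simulation. The following is only a sketch: the function names, the starting value [latex]u_0=0[/latex], the seed, and the sample size are illustrative assumptions, not part of the text. It generates errors from equation 12.2 for three values of [latex]\rho[/latex] and reports the sample correlation between successive errors and the share of successive pairs that have the same sign.

```python
import random

def simulate_ar1(rho, n, sigma_eps=1.0, seed=0):
    """Generate n values of u_t = rho * u_{t-1} + eps_t with white-noise eps_t."""
    rng = random.Random(seed)
    u = []
    prev = 0.0  # assumed starting value u_0 = 0 (illustrative choice)
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, sigma_eps)
        u.append(prev)
    return u

def lag1_correlation(u):
    """Sample correlation between u_t and u_{t-1}."""
    x, y = u[:-1], u[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def share_same_sign(u):
    """Fraction of successive pairs (u_{t-1}, u_t) that share the same sign."""
    same = sum(1 for a, b in zip(u[:-1], u[1:]) if a * b > 0)
    return same / (len(u) - 1)

for rho in (0.8, 0.0, -0.8):
    u = simulate_ar1(rho, n=5000, seed=42)
    print(rho, round(lag1_correlation(u), 2), round(share_same_sign(u), 2))
```

With a positive [latex]\rho[/latex] the share of same-sign pairs should be well above one half, and with a negative [latex]\rho[/latex] it should be well below one half.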

INSERT FIGURE 12.1

INSERT FIGURE 12.2

In time-series applications, positive autocorrelation is much more common than negative autocorrelation. Positive autocorrelation is often the result of unobservable shocks that generate an effect that persists for more than one time period. It may also result from changes in an unobservable variable that persist for more than one period. Suppose, for example, that there is an unobservable change in expectations that occurs at the end of December and persists through the first quarter of the following year. If annual time series data is used to estimate an investment demand equation, it is quite likely that the effect of shocks of this sort would result in error terms that exhibit positive first-order autocorrelation.

Negative first-order autocorrelation occurs when “above average” outcomes tend to be followed by “below average” outcomes.^[2] This may occur, for example, in the estimation of auto sales equations in which exceptionally high sales in one year may result in a smaller potential market in the succeeding year. In a similar manner, an exceptionally low level of auto sales in one year may result in a higher than usual sales volume in the succeeding year (since more cars have worn out).

It should also be noted that model misspecification may result in a pattern of sample residuals that resembles a first-order autoregressive process. For example, suppose that the true relationship is given by: \begin{equation}
Y_{t}=\beta _{0}+\beta _{1}X_{1t}+\beta _{2}X_{2t}+u_{t} \tag{12.3} \end{equation}
but an econometrician incorrectly estimates a model of the form: \begin{equation}
Y_{t}=\gamma _{0}+\gamma _{1}X_{1t}+v_{t} \tag{12.4} \end{equation}
The residual in equation 12.4, [latex]v_{t}[/latex], can be expressed as: \begin{equation*}
v_{t}=\beta _{2}X_{2t}+u_{t}
\end{equation*}
If the variable [latex]X_{2t}[/latex] exhibits cyclical fluctuations, the error process, [latex]v_{t}[/latex], will exhibit a pattern that resembles that in Figure 12.1 even if the error term in the correctly specified equation ([latex]u_{t}[/latex]) is uncorrelated across time periods.

An incorrect choice of functional form may also result in a residual pattern that resembles a first-order autoregressive process. For example, suppose that the true relationship is given by a quadratic relationship of the form: \begin{equation}
Y_{t}=\beta _{0}+\beta _{1}X_{t}+\beta _{2}X_{t}^{2}+u_{t} \tag{12.5} \end{equation}
An analyst, however, mistakenly specifies a linear model as: \begin{equation}
Y_{t}=\alpha _{0}+\alpha _{1}X_{t}+v_{t} \tag{12.6} \end{equation}
Figure 12.3 contains a scattergram and an estimated form of equation 12.6. As this diagram illustrates, successive residuals most often share the same sign. In this case, tests for first-order autocorrelation would suggest that a positive first-order autocorrelation process is present.

INSERT FIGURE 12.3
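The effect of an incorrect functional form can be illustrated numerically. The sketch below rests on assumed parameter values (a quadratic truth with coefficients 2, 1 and 0.5, and 100 observations) and on helper functions written only for this illustration. It generates data from a quadratic relationship with serially uncorrelated errors, fits a straight line, and computes the ratio of the sum of squared successive differences to the sum of squares (the Durbin-Watson statistic introduced in Section 12.1.2) for both the true errors and the residuals of the linear fit.

```python
import random

def ols_line(x, y):
    """Bivariate OLS: return (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e ** 2 for e in resid)

rng = random.Random(1)
x = [t / 10 for t in range(1, 101)]        # X_t rises steadily from 0.1 to 10
u = [rng.gauss(0.0, 1.0) for _ in x]       # serially uncorrelated true errors
y = [2.0 + 1.0 * a + 0.5 * a ** 2 + e for a, e in zip(x, u)]  # quadratic truth

a0, a1 = ols_line(x, y)                    # misspecified linear model
v_hat = [b - a0 - a1 * a for a, b in zip(x, y)]

print("statistic for the true errors:      ", round(durbin_watson(u), 2))
print("statistic for the linear residuals: ", round(durbin_watson(v_hat), 2))
```

The statistic for the linear-fit residuals should fall far below 2 even though the true errors are uncorrelated, which is the spurious finding of positive autocorrelation described above.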

Thus, when an autocorrelation problem is suspected, an econometrician should always first carefully examine the model specification. An incorrectly specified model may result in a finding of apparent autocorrelation even though no autocorrelation exists in the correctly specified model. In the discussion below, it is assumed that autocorrelation is entirely the result of correlation in the error process, and is not due to specification error. Initially, it is also assumed that the regression does not contain a lagged dependent variable as a regressor. The consequences of including a lagged dependent variable are addressed in Section 12.3.

Capital punishment and the murder rate

While much of the discussion concerning capital punishment focuses on the moral and ethical issues associated with its use, there is also substantial debate concerning the deterrent effect associated with this penalty. Advocates of the death penalty argue that the existence of this penalty deters others from committing capital crimes; opponents suggest that such an effect is relatively small.

In a classic econometric study, Ehrlich (1975) uses time-series evidence to address this issue. Ehrlich suggests that the decision to commit a crime is affected by the magnitude of the expected penalty. In the case of capital crimes, the expected penalty is affected by the probability of being arrested, the probability of a conviction given the arrest, and the probability of receiving the death penalty given that a conviction has occurred. Ehrlich estimates a murder supply function as a function of these factors. His results suggest that an additional execution each year results in approximately 7 or 8 fewer murders each year.

The error terms in the murder supply function capture the effects of unobserved factors that influence the murder rate. Unobserved factors that might affect the murder rate include the subjective estimates of the probabilities of detection, arrest, and conviction held by those who are contemplating criminal activity. These perceptions are likely to generate error terms that are correlated over time. The occurrence of “copycat” crimes is also likely to result in errors that are correlated across time. For these reasons, the error terms in Ehrlich’s model were assumed to follow a first-order autoregressive process.

12.1.1 Consequences of first-order autocorrelation

The Gauss-Markov Theorem, discussed in Chapter 6, states that the OLS estimators are BLUE (Best Linear Unbiased Estimators) when all of the assumptions of the classical regression model are satisfied. When first-order autocorrelation is present, however, one of these assumptions is violated. In particular, recall assumption 6.4:

Assumption 6.4: The error terms are independent across observations (i.e., [latex]u_i[/latex] is independent of [latex]u_j[/latex] when [latex]i\neq j[/latex]).

This condition is violated when first-order autocorrelation is present. Under a first-order autoregressive process, the error term, [latex]u_t[/latex], can be expressed as:
\begin{equation} \tag{12.7}
u_t=\rho u_{t-1}+\epsilon _t
\end{equation}
where [latex]\epsilon _t[/latex] is a white-noise error process. The covariance between [latex]u_t[/latex] and [latex]u_{t-1}[/latex] is given by:
\begin{equation*}
E(u_tu_{t-1})=E\left( \left( \rho u_{t-1}+\epsilon _t\right) \left( u_{t-1}\right) \right)
\end{equation*}
\begin{equation*}
=E(\rho u_{t-1}^2+u_{t-1}\epsilon _t)
\end{equation*}
Since [latex]\epsilon _t[/latex] is independent of past error terms, this reduces to: \begin{equation*}
E(u_tu_{t-1})=E(\rho u_{t-1}^2)
\end{equation*}
\begin{equation} \tag{12.8}
=\rho \sigma _u^2
\end{equation}
\begin{equation*}
\text{where }\sigma _u^2\text{ is the variance of }u_t \end{equation*}
Thus, the presence of first-order autocorrelation results in a nonzero covariance between [latex]u_t[/latex] and [latex]u_{t-1}[/latex]. Using the result in equation 12.8, the correlation between [latex]u_t[/latex] and [latex]u_{t-1}[/latex] can be expressed as: \begin{equation}
\frac{E(u_tu_{t-1})}{\sqrt{E(u_t^2)E(u_{t-1}^2)}}=\frac{\rho \sigma _u^2}{\sigma _u^2}=\rho \tag{12.9}
\end{equation}
Thus, the first-order autocorrelation coefficient, [latex]\rho[/latex], is equal to the correlation coefficient between [latex]u_t[/latex] and [latex]u_{t-1}[/latex]. When first-order autocorrelation is present, the correlation between [latex]u_t[/latex] and [latex]u_{t-1}[/latex] is nonzero and the Gauss-Markov theorem no longer holds.

OLS estimators, however, are still linear, unbiased and consistent when autocorrelation is present.^[3] Unfortunately, OLS estimators are no longer the best linear unbiased estimators in this case. In other words, these estimators are not efficient when the residuals are subject to autocorrelation. Other unbiased estimators exist that have a lower variance than the OLS estimators. Furthermore, the estimated variance of the residuals under the OLS procedure is subject to a bias. Under most circumstances, estimated OLS standard errors tend to understate the true standard errors for the estimators. Let’s examine the reasons for this bias.

Consider a simple bivariate regression relationship in which the error terms follow a first-order autoregressive error process:
\begin{equation}
Y_{t}=\beta _{0}+\beta _{1}X_{t}+u_{t} \tag{12.10} \end{equation}
\begin{equation*}
u_{t}=\rho u_{t-1}+\epsilon _{t}
\end{equation*}
Positive first-order autocorrelation ([latex]\rho >0[/latex]) is much more common than negative first-order autocorrelation in empirical studies. Most economic time-series variables tend to exhibit a significant trend component (many aggregate macroeconomic variables, for example, tend to grow over time). Thus, it will be initially assumed that [latex]\rho[/latex] is greater than zero and that [latex]X_{t}[/latex] grows over time.

Figure 12.4 illustrates a set of possible outcomes from the process described in equation 12.10. In this example, it is assumed that the first error term is negative. Under a positive first-order autoregressive process, negative errors will tend to be followed by other negative error terms. Once a positive error term occurs, however, other positive error terms tend to follow. In the example appearing in Figure 12.4, a cluster of negative error terms occurs for low values of [latex]X_{t}[/latex] and a cluster of positive error terms occurs for high values of [latex]X_{t}[/latex]. When an OLS estimation procedure is performed using this data, the fitted equation will minimize the sum of squared sample error terms. The sample error terms are equal to the vertical distance between the observed data points and the estimated relationship (the sample regression function) while the population error terms are equal to the vertical distance between the observed data points and the true relationship (the population regression function). As this diagram indicates, the sum of squared sample error terms will be less than or equal to the sum of squared population error terms. In mathematical terms, this is expressed as:
\begin{equation}
\sum_{t=1}^{T}\hat{u}_{t}^{2}\leq \sum_{t=1}^{T}u_{t}^{2} \tag{12.11}
\end{equation}
This means that, under the conditions stated above, the sample variance of the error terms will tend to understate the true variance of the error terms.

INSERT FIGURE 12.4

As noted in Chapter 7, the standard errors of the estimated intercept and slope parameters are smaller when the variance of the error term is smaller. Since the estimated variance of the error terms is typically understated, so will the standard errors of the intercept and slope parameters. Thus, [latex]t[/latex]-statistics based on these understated standard errors will tend to be overstated (since the denominator is biased downward). For the same reason, [latex]F[/latex]-statistics testing the joint significance of sets of independent variables and the [latex]R^{2}[/latex] for the regression will generally be overstated.
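A Monte Carlo experiment offers one way to see the size of this understatement. The sketch below is illustrative only: the sample size (30), the value [latex]\rho =0.8[/latex], the trending regressor [latex]X_t=t[/latex], the number of replications, and the function names are all assumptions chosen for the example. It repeatedly generates data from equation 12.10, estimates the slope by OLS, and compares the average conventional standard error with the actual standard deviation of the slope estimates across replications.

```python
import random
import statistics

def simulate_once(rng, n=30, rho=0.8, beta0=1.0, beta1=1.0):
    """One sample from Y_t = beta0 + beta1*X_t + u_t with AR(1) errors and X_t = t."""
    x = list(range(1, n + 1))
    # Draw the initial error from the stationary distribution of the AR(1) process.
    u_prev = rng.gauss(0.0, 1.0 / (1.0 - rho ** 2) ** 0.5)
    y = []
    for xt in x:
        u_prev = rho * u_prev + rng.gauss(0.0, 1.0)
        y.append(beta0 + beta1 * xt + u_prev)
    return x, y

def ols_slope_and_se(x, y):
    """Bivariate OLS slope and its conventional standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    s2 = ssr / (n - 2)                 # estimated error variance
    return slope, (s2 / sxx) ** 0.5

rng = random.Random(123)
results = [ols_slope_and_se(*simulate_once(rng)) for _ in range(2000)]
slopes = [s for s, _ in results]
reported_se = [se for _, se in results]

mean_slope = statistics.mean(slopes)             # should be close to beta1 = 1
true_sd = statistics.stdev(slopes)               # actual sampling variability
mean_reported_se = statistics.mean(reported_se)  # what OLS output would report

print("mean slope estimate:      ", round(mean_slope, 3))
print("std. dev. of the estimates:", round(true_sd, 3))
print("average reported std. err.:", round(mean_reported_se, 3))
```

Under these assumptions the average slope estimate should be close to its true value, in line with unbiasedness, while the average reported standard error should be noticeably smaller than the actual standard deviation of the estimates.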

If the independent variables are not positively correlated across time or the error term exhibits negative first-order autocorrelation, the direction of the bias in the [latex]t[/latex]-ratios cannot be as easily determined. When this occurs, however, the [latex]t[/latex]-ratios and [latex]F[/latex]-statistics remain biased.^[4] Since the OLS estimators are unbiased and consistent, forecasts based upon these estimates are also unbiased and consistent. Since the OLS estimators are not efficient, however, it is possible to construct forecasts that have a lower prediction variance than that achieved by forecasts based upon OLS estimators.

In summary, when first-order autocorrelation is present, an application of the OLS estimation procedure results in:

unbiased and consistent estimates of intercept and slope parameters,
inefficient parameter estimates,
biased estimates of the standard errors (and [latex]t[/latex]-ratios) for estimated intercept and slope coefficients (this bias generally tends to inflate [latex]t[/latex]-ratios), and
forecasts that are unbiased and consistent, but inefficient.

Since autocorrelation has fairly serious consequences, it is important that econometricians be able to diagnose and correct for the presence of autocorrelated errors.

12.1.2 Detection: Durbin-Watson test

Of course, it is possible to get a rough feel for the presence of autocorrelation by examining a scatterplot of regression residuals. This procedure, however, provides no formal test for the presence of autocorrelation. Nor does it provide an estimate of the magnitude of the coefficient [latex]\rho[/latex].

The primary tool used by econometricians to detect the presence of first-order autocorrelation is the Durbin-Watson test.^[5] This test not only provides a formal test for the presence of first-order autocorrelation, but also provides an initial estimate of the first-order autocorrelation coefficient ([latex]\rho[/latex]). It should be noted, however, that the Durbin-Watson test described below is appropriate only when the regression equation does not contain a lagged dependent variable.

The Durbin-Watson test relies on the use of the Durbin-Watson statistic, [latex]d[/latex], defined as:
\begin{equation}
d=\frac{\sum\limits_{t=2}^{N}(\hat{u}_{t}-\hat{u}_{t-1})^{2}}{\sum\limits_{t=1}^{N}\hat{u}_{t}^{2}} \tag{12.12} \end{equation}
By expanding the numerator term, the Durbin-Watson statistic can be restated as:
\begin{equation*}
d=\frac{\sum\limits_{t=2}^{N}\hat{u}_{t}^{2}}{\sum\limits_{t=1}^{N}\hat{u}_{t}^{2}}+\frac{\sum\limits_{t=2}^{N}\hat{u}_{t-1}^{2}}{\sum\limits_{t=1}^{N}\hat{u}_{t}^{2}}-2\frac{\sum\limits_{t=2}^{N}\hat{u}_{t}\hat{u}_{t-1}}{\sum\limits_{t=1}^{N}\hat{u}_{t}^{2}} \end{equation*}
For large values of [latex]N[/latex], each of the first two terms in this expression is approximately equal to one. Thus, the Durbin-Watson statistic can be approximated as:
\begin{equation}
d\approx 2(1-\hat{\rho}) \tag{12.13}
\end{equation}
where [latex]\hat{\rho}[/latex] is an estimator for the first-order autocorrelation coefficient ([latex]\rho[/latex]) defined as:
\begin{equation*}
\hat{\rho}=\frac{\sum\limits_{t=2}^{N}\hat{u}_{t}\hat{u}_{t-1}}{ \sum\limits_{t=1}^{N}\hat{u}_{t}^{2}}
\end{equation*}
An inspection of equation 12.13 indicates that the Durbin-Watson statistic is approximately equal to 2 when no first-order autocorrelation is present ([latex]\rho =0[/latex]). The Durbin-Watson statistic will tend to be less than 2 when there is positive first-order autocorrelation ([latex]\rho >0[/latex]), and is expected to be greater than 2 when negative first-order autocorrelation is present ([latex]\rho <0[/latex]). Since [latex]\hat \rho[/latex] is the sample correlation between [latex]\hat u_t[/latex] and [latex]\hat u_{t-1}[/latex], the value of [latex]\hat \rho[/latex] lies between -1 and 1. Equation 12.13 therefore implies that the value of the Durbin-Watson statistic must lie between 0 and 4. In particular:

a Durbin-Watson statistic that is close to zero generally indicates that positive first-order autocorrelation is present and the value of [latex]\rho[/latex] is close to 1; and
a Durbin-Watson statistic that is close to four generally indicates that negative first-order autocorrelation is present and the value of [latex]\rho[/latex] is close to -1.
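The calculations in equations 12.12 and 12.13 can be sketched in a few lines. The residual values below are hypothetical numbers invented for this illustration, and the function names are likewise assumptions.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic d as defined in equation 12.12."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

def rho_hat(resid):
    """Estimator of rho defined below equation 12.13."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical residuals in which neighbouring values tend to share the same sign.
resid = [0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.4]

d = durbin_watson(resid)
r = rho_hat(resid)
print("d               =", round(d, 3))
print("rho-hat         =", round(r, 3))
print("2 (1 - rho-hat) =", round(2 * (1 - r), 3))
```

With only eight observations the approximation in equation 12.13 is rough. The gap between [latex]d[/latex] and [latex]2(1-\hat{\rho})[/latex] equals the share of the first and last squared residuals in the total sum of squares, which becomes negligible as [latex]N[/latex] grows.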

Suppose that you wished to test for the presence of first-order autocorrelation. The appropriate hypotheses are:
\begin{equation*}
\text{H}_{0}\text{: }\rho =0
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\rho \neq 0
\end{equation*}
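Unfortunately, the exact distribution of the Durbin-Watson statistic depends on the values of the independent variables, so a single critical value cannot be tabulated for each sample size. Instead, a lower bound ([latex]d_L[/latex]) and an upper bound ([latex]d_U[/latex]) on the critical value are used. This leaves an uncertainty region in the distribution of this statistic in which it is not clear whether the null hypothesis should be accepted or rejected. Figure 12.5 illustrates the acceptance and rejection regions for the Durbin-Watson statistic.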

INSERT FIGURE 12.5

Durbin and Watson tabulated critical values of this statistic for numerous values of [latex]N[/latex] and [latex]k[/latex], where [latex]N[/latex] is the sample size and [latex]k[/latex] is the number of slope coefficients included in the regression. Savin and White extended the usefulness of the Durbin-Watson test by computing the critical values of the Durbin-Watson statistic for a wider range of [latex]k[/latex] and [latex]N[/latex]. A copy of critical values of the Durbin-Watson statistic (as compiled by Savin and White) appears in Appendix A located at the end of this text.

Most regression packages automatically generate a Durbin-Watson statistic whenever a regression is performed. If the Durbin-Watson statistic lies between [latex]d_U[/latex] and [latex]4-d_U[/latex] then no significant autocorrelation is detected by this test. In this case, OLS estimation is appropriate as long as all of the other assumptions of the classical regression model are satisfied. If the value of the Durbin-Watson statistic falls either below [latex]d_L[/latex] or above [latex]4-d_L[/latex] then it can be concluded that first-order autocorrelation is present. When this occurs, a correction for autocorrelation is appropriate. Possible corrective techniques are discussed below.

When the value of the Durbin-Watson statistic falls in either of the uncertainty regions, the Durbin-Watson test provides an ambiguous outcome. In practice, most econometricians argue that if there is doubt about the presence of autocorrelation, it is safest to assume that autocorrelation is present and to apply corrective techniques.
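The decision rule described above can be written as a short function. This is a sketch rather than a standard routine: the function name and the returned labels are assumptions, and the bounds used in the example (1.2 and 1.5) are made-up placeholders. In practice [latex]d_L[/latex] and [latex]d_U[/latex] would be looked up in Appendix A for the relevant [latex]N[/latex] and [latex]k[/latex].

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the bounds d_L and d_U."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("the Durbin-Watson statistic must lie between 0 and 4")
    if d < d_lower:
        return "positive autocorrelation"
    if d > 4.0 - d_lower:
        return "negative autocorrelation"
    if d_upper <= d <= 4.0 - d_upper:
        return "no autocorrelation detected"
    return "inconclusive"  # d falls in one of the two uncertainty regions

# Placeholder bounds for illustration only; they are not taken from Appendix A.
D_LOWER, D_UPPER = 1.2, 1.5

for d in (0.66, 1.30, 2.00, 2.60, 3.10):
    print(d, "->", durbin_watson_decision(d, D_LOWER, D_UPPER))
```

Following the practice described above, a cautious analyst might treat an "inconclusive" outcome in the same way as a rejection and apply a corrective technique.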

12.1.3 Caution: Durbin-Watson statistics in cross-section analyses

The Durbin-Watson statistic is used to test for first-order autocorrelation in data that is measured at different points in time. While most regression packages report the Durbin-Watson statistic whenever a regression model is estimated, this statistic is not generally relevant when cross-sectional data is analyzed.

In cross-sectional analyses, the Durbin-Watson statistic should be close to 2 since consecutive error terms are usually expected to be uncorrelated. Suppose, however, that you have used cross-sectional data to estimate the parameters of a regression model and notice that the Durbin-Watson statistic is relatively small. This suggests that there may be a positive correlation between error terms in consecutive observations. While this is obviously not the result of autocorrelation, it may be due to some form of model misspecification. For example, consider the possibility of regional effects that occur when cross-sectional data on countries, states, counties or individuals are used as the unit of analysis. It is quite common for such data to be sorted by geographical region. Positive correlation between successive error terms in cases such as this suggests that there may be some factor that has not been fully taken into account in the regression model. In this case, the regression model should be reformulated (perhaps by including regional dummy variables).

When working with time-series data, it is standard practice to report the Durbin-Watson statistic. Do not, however, report this statistic if you are working with cross-sectional data.

12.1.4 AR(1) correction: known [latex]\rho[/latex]

As noted above, the general form for a regression model in which the residual follows a first-order autoregressive process is given by: \begin{equation}
Y_t=\beta _0+\beta _1X_{1t}+\beta _2X_{2t}+\ldots +\beta _kX_{kt}+u_t \tag{12.14}
\end{equation}
\begin{equation*}
\text{where: }u_t=\rho u_{t-1}+\epsilon _t
\end{equation*}
Suppose that the value of [latex]\rho[/latex] is known for this regression model. In this case, it is possible to use a simple procedure to transform equation 12.14 into a specification that satisfies the conditions of the classical regression model. Let’s examine this procedure.

Since equation 12.14 is assumed to hold for all observations, it will also hold in period [latex]t-1[/latex]:
\begin{equation}
Y_{t-1}=\beta _0+\beta _1X_{1t-1}+\beta _2X_{2t-1}+\ldots +\beta _kX_{kt-1}+u_{t-1} \tag{12.15}
\end{equation}
Multiplying equation 12.15 by [latex]-\rho[/latex] results in: \begin{equation}
-\rho Y_{t-1}=-\rho \beta _0-\beta _1\rho X_{1t-1}-\beta _2\rho X_{2t-1}-\ldots -\beta _k\rho X_{kt-1}-\rho u_{t-1} \tag{12.16} \end{equation}
Adding equations 12.14 and 12.16 results in: \begin{equation}
Y_t-\rho Y_{t-1}=\beta _0(1-\rho )+\beta _1(X_{1t}-\rho X_{1t-1})+\beta _2(X_{2t}-\rho X_{2t-1}) \tag{12.17}
\end{equation}
\begin{equation*}
+\ldots +\beta _k(X_{kt}-\rho X_{kt-1})+(u_t-\rho u_{t-1}) \end{equation*}
An inspection of equation 12.17 reveals that the error term for this equation ([latex]u_t-\rho u_{t-1}[/latex]) is equal to [latex]\epsilon _t[/latex]. Using the transformations:

\begin{equation*} \widetilde{Y}_t=Y_t-\rho Y_{t-1}\end{equation*}
\begin{equation*} \widetilde{\beta }_0=\beta _0(1-\rho )\end{equation*}
\begin{equation*} \widetilde{X}_{1t}=X_{1t}-\rho X_{1t-1}\end{equation*}
\begin{equation*} \widetilde{X}_{2t}=X_{2t}-\rho X_{2t-1}\end{equation*}
\begin{equation*} \vdots \end{equation*}
\begin{equation*} \widetilde{X}_{kt}=X_{kt}-\rho X_{kt-1}\end{equation*}

equation 12.17 may be restated as:
\begin{equation}
\widetilde{Y}_t=\widetilde{\beta }_0+\beta _1\widetilde{X}_{1t}+\beta _2 \widetilde{X}_{2t}+\ldots +\beta _k\widetilde{X}_{kt}+\epsilon _t \tag{12.18}
\end{equation}
The dependent variable and each of the independent variables in equation 12.18 has been created through a process called “quasi-differencing.”^[6] Note that one observation is lost when the variables are quasi-differenced (since the quasi-differenced variables can only be defined for observations 2 through [latex]N[/latex]).

Since [latex]\epsilon _t[/latex] is uncorrelated across observations, the assumptions of the classical regression model are satisfied for the transformed model appearing in equation 12.18. Therefore, an OLS estimation procedure may be used to estimate the parameters [latex]\tilde \beta _0,\beta _1,\beta _2\ldots ,\beta _k[/latex]. The OLS estimators for these parameters are unbiased and consistent. Since this equation satisfies the conditions of the classical regression model, the standard errors generated as part of the regression procedure may be used to construct hypothesis tests involving the intercept and slope parameters.
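For a model with a single independent variable, the transformation can be sketched as follows. The parameter values ([latex]\rho =0.7[/latex], [latex]\beta _0=2[/latex], [latex]\beta _1=0.5[/latex]), the simulated data, and the helper names are assumptions made for this illustration. Note that the intercept of the transformed equation estimates [latex]\beta _0(1-\rho )[/latex], so it is divided by [latex]1-\rho[/latex] to recover an estimate of [latex]\beta _0[/latex].

```python
import random

def quasi_difference(series, rho):
    """Return z_t - rho * z_{t-1} for t = 2, ..., N (one observation is lost)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

def ols_line(x, y):
    """Bivariate OLS: return (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

rho = 0.7                    # treated as known in this section
beta0, beta1 = 2.0, 0.5      # assumed true parameters for the simulation

rng = random.Random(7)
x = [float(t) for t in range(1, 201)]
u, prev = [], 0.0
for _ in x:
    prev = rho * prev + rng.gauss(0.0, 1.0)   # AR(1) error process
    u.append(prev)
y = [beta0 + beta1 * a + e for a, e in zip(x, u)]

# Quasi-difference both variables, then apply OLS to the transformed data.
y_tilde = quasi_difference(y, rho)
x_tilde = quasi_difference(x, rho)
b0_tilde, b1_hat = ols_line(x_tilde, y_tilde)
b0_hat = b0_tilde / (1.0 - rho)   # undo the rescaling of the intercept

print("observations used:", len(y_tilde))
print("estimated beta_0: ", round(b0_hat, 3))
print("estimated beta_1: ", round(b1_hat, 3))
```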

One common special case of this procedure occurs when the value of [latex]\rho[/latex] equals one. When this occurs, the error process is said to possess a unit root. An error process of this sort is called a random walk. (This topic is discussed in more detail in Chapter 17.) In this special case, equation 12.18 reduces to: \begin{equation}
\Delta Y_t=\beta _1\Delta X_{1t}+\beta _2\Delta X_{2t}+\ldots +\beta _k\Delta X_{kt}+\epsilon _t \tag{12.19}
\end{equation}
The variables in equation 12.19 are said to be transformed through the use of differencing. In this equation, the dependent and independent variables are defined as:

\begin{equation*}
\Delta Y_t = Y_t - Y_{t-1}
\end{equation*}

\begin{equation*}
\widetilde{\beta}_0 = \beta_0(1 - \rho) = 0 \text{ (since }\rho =1\text{)}
\end{equation*}

\begin{equation*}
\Delta X_{1t} = X_{1t} - X_{1t-1}
\end{equation*}

\begin{equation*}
\Delta X_{2t} = X_{2t} - X_{2t-1}
\end{equation*}

\begin{equation*}
\vdots
\end{equation*}

\begin{equation*}
\Delta X_{kt} = X_{kt} - X_{kt-1}
\end{equation*}

This process is an example of a generalized least squares (or GLS) estimation procedure. A GLS estimation procedure is often used when a regression model that violates an assumption of the classical regression model can be transformed so that all of the assumptions are satisfied for the transformed model. Under a GLS estimation process:

the dependent and independent variables are transformed so that the conditions of the classical regression model are satisfied for the transformed model, and
an OLS estimation procedure is applied to the transformed variables.

In this case, the GLS estimation procedure involves the use of an OLS estimation procedure applied to quasi-differenced (or differenced) dependent and independent variables. As will be discussed in Chapter 13, a GLS estimation technique is also commonly used to correct for the presence of heteroskedasticity.

Unfortunately, however, the value of [latex]\rho[/latex] is not generally known by the researcher when econometric models are estimated. Thus, it is necessary to derive an estimator for [latex]\rho[/latex] before the GLS procedure described above can be applied. In the discussion below, it is assumed that [latex]\left| \rho \right| <1[/latex]. If the use of one of the following estimation procedures results in an estimated value of [latex]\rho[/latex] that is close to one, it may be desirable to test for the presence of a unit root. The Dickey-Fuller test discussed in the appendix at the end of this chapter may be used for this purpose.

12.1.5 AR(1) correction: unknown [latex]\rho[/latex]

Cochrane-Orcutt procedure

One of the first popular methods of correcting for first-order autocorrelation was developed by Cochrane and Orcutt.^[7] The Cochrane-Orcutt procedure involves the use of an iterated GLS estimator that relies, in part, on the quasi-differencing procedure described in the preceding section. Let’s examine the Cochrane-Orcutt procedure.

Once again, the regression model is given by:
\begin{equation}
Y_{t}=\beta _{0}+\beta _{1}X_{1t}+\beta _{2}X_{2t}+\ldots +\beta _{k}X_{kt}+u_{t} \tag{12.20}
\end{equation}
\begin{equation*}
\text{where: }u_{t}=\rho u_{t-1}+\epsilon _{t}
\end{equation*}
\begin{equation*}
E(\epsilon _{t})=0
\end{equation*}
\begin{equation*}
E(\epsilon _{t}^{2})=\sigma _{\epsilon }^{2}
\end{equation*}
\begin{equation*}
E(\epsilon _{t}\epsilon _{s})=0\text{ (for }t\neq s) \end{equation*}
The Cochrane-Orcutt estimation procedure consists of the following steps:

Step 1: Estimate equation 12.20 by OLS. This results in unbiased and consistent estimates of the parameters [latex]\beta _0,\beta _1,\ldots ,\beta _k[/latex]. Use these estimated parameters to construct estimates of the sample residuals, [latex]\hat{u}_t[/latex], according to the relationship: \begin{equation}
\hat{u}_t=Y_t-\hat{\beta}_0-\hat{\beta}_1X_{1t}-\hat{\beta}_2X_{2t}-\ldots -\hat{\beta}_kX_{kt} \tag{12.21}
\end{equation}

Step 2: Lag the estimated error term [latex]\hat{u}_t[/latex] to create a new variable equal to [latex]\hat{u}_{t-1}[/latex]. Use an OLS procedure to estimate the parameters of a regression equation of the form:
\begin{equation}
\hat{u}_t=\rho \hat{u}_{t-1}+\epsilon _t \tag{12.22} \end{equation}
(Note that this equation does not include a constant term.) The estimated coefficient, [latex]\hat{\rho}[/latex], serves as a preliminary estimate of the first-order autocorrelation coefficient [latex]\rho[/latex]. Since the lagged error term is not defined for the first observation, the sample used to estimate equation 12.22 consists of observations 2 through [latex]N[/latex].^[8]

Step 3: Use the estimated value of [latex]\rho[/latex] from the previous step to quasi-difference the dependent and independent variables, and estimate the quasi-differenced equation:
\begin{equation}
\widetilde{Y}_t=\widetilde{\beta }_0+\beta _1\widetilde{X}_{1t}+\beta _2 \widetilde{X}_{2t}+\ldots +\beta_k\widetilde{X}_{kt}+\epsilon _t \tag{12.23}
\end{equation}
where the variables in this equation are defined as:

\begin{equation*}\widetilde{Y}_t=Y_t-\hat{\rho}Y_{t-1} \end{equation*}
\begin{equation*}\widetilde{\beta }_0=\beta _0(1-\hat{\rho}) \end{equation*}
\begin{equation*}\widetilde{X}_{1t}=X_{1t}-\hat{\rho}X_{1t-1} \end{equation*}
\begin{equation*}\widetilde{X}_{2t}=X_{2t}-\hat{\rho}X_{2t-1} \end{equation*}
\begin{equation*}\vdots \end{equation*}
\begin{equation*}\widetilde{X}_{kt}=X_{kt}-\hat{\rho}X_{kt-1}\end{equation*}

Once again, this equation is estimated using only [latex]N-1[/latex] observations (since the first observation is lost due to the quasi-differencing). The estimated parameters derived at this stage are consistent estimates of the population parameters. Furthermore, the estimated standard errors computed as part of this regression procedure are appropriate for testing hypotheses involving intercept or slope parameters. While the estimation process may stop at this stage, this estimator is often iterated (as described in Steps 4 and 5 below).

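Step 4: Use the new estimates of the intercept and slope parameters [latex]\hat{\beta}_0, \hat{\beta}_1,\ldots ,\hat{\beta}_k[/latex] to construct estimated residuals from the original (untransformed) regression equation (as in Step 1). The intercept estimate is recovered from the quasi-differenced equation by dividing the estimate of [latex]\widetilde{\beta }_0[/latex] by [latex](1-\hat{\rho})[/latex]. Then estimate the parameters of the equation:
\begin{equation*}
\hat{u}_t=\rho \hat{u}_{t-1}+\epsilon _t
\end{equation*}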

Step 5: Use the new estimate of [latex]\rho[/latex] to quasi-difference the dependent and independent variables to form a new version of equation 12.23. Use an OLS estimation procedure to estimate the parameters [latex]\beta _0,\beta _1,\ldots ,\beta _k[/latex]. If these new estimates differ from the estimates at the previous stage by less than the desired level of accuracy, then stop the estimation procedure at this stage. If the change in the parameter values exceeds the desired level of accuracy, return to Step 4 and continue this process until the estimates converge.
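The iteration described in Steps 1 through 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: it assumes NumPy is available, and the function names (`ols`, `cochrane_orcutt`), the tolerance `tol`, and the iteration cap `max_iter` are hypothetical choices that do not come from the text. Because the column of ones is quasi-differenced along with the other regressors, the coefficient on that column estimates [latex]\beta _0[/latex] directly rather than [latex]\widetilde{\beta }_0[/latex].

```python
import numpy as np

def ols(Z, y):
    """OLS coefficients for y = Z b + e (Z already contains any constant column)."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def cochrane_orcutt(X, y, tol=1e-8, max_iter=100):
    """Iterated Cochrane-Orcutt sketch for AR(1) errors.

    X : (N, k) array of regressors WITHOUT a constant column.
    y : (N,) array.
    Returns (beta, rho); beta[0] is the intercept of the original model.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    Z = np.column_stack([np.ones(N), X])
    beta = ols(Z, y)                      # Step 1: OLS on the untransformed data
    rho = 0.0
    for _ in range(max_iter):
        u = y - Z @ beta                  # residuals from the ORIGINAL equation
        # Steps 2 and 4: regress u_t on u_{t-1}, no constant, observations 2..N
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
        # Steps 3 and 5: quasi-difference; the first observation is lost
        y_t = y[1:] - rho * y[:-1]
        Z_t = Z[1:] - rho * Z[:-1]        # the constant column becomes (1 - rho)
        new_beta = ols(Z_t, y_t)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, rho
```

Stopping after the first pass through the loop corresponds to the two-step version of the estimator described in Step 3.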

While the Cochrane-Orcutt procedure is still used by many econometricians, it suffers from a serious shortcoming: the estimation process excludes the information contained in the first observation. Roughly speaking, some of the information contained in the sample is ignored when this procedure is used. Since many time-series models are estimated using a relatively small number of observations, this loss of data is somewhat troubling. For this reason, a number of other estimators have been devised that use all of the observations in the sample. One such estimator is the GLS estimator developed by Prais and Winsten. Let’s examine this estimator.

Prais-Winsten estimator

The Prais-Winsten estimation procedure is similar to the Cochrane-Orcutt procedure.^[9] The only difference is that the Prais-Winsten procedure includes all [latex]N[/latex] observations. Since it is not possible to quasi-difference the first observation, the Prais-Winsten procedure rescales the first observation so that the variance of the transformed residual for this observation is equal to the variance of the white-noise error process [latex]\epsilon _t[/latex]. Let’s examine how this is accomplished.
Since the error term [latex]u_{t}[/latex] is defined as:
\begin{equation*}
u_{t}=\rho u_{t-1}+\epsilon _{t}
\end{equation*}
the variance of [latex]u_{t}[/latex] can be expressed as:
\begin{equation*}
\sigma _{u}^{2}=E(u_{t}^{2})
\end{equation*}
\begin{equation*}
=E\left[ (\rho u_{t-1}+\epsilon _{t})^{2}\right]
\end{equation*}
\begin{equation*}
=E(\rho ^{2}u_{t-1}^{2}+\epsilon _{t}^{2}+2\rho u_{t-1}\epsilon _{t}) \end{equation*}
\begin{equation*}
=\rho ^{2}E(u_{t-1}^{2})+E(\epsilon _{t}^{2})+2\rho E(u_{t-1}\epsilon _{t}) \end{equation*}
Since [latex]\epsilon _{t}[/latex] is assumed to be independent of [latex]u_{t-1}[/latex], this relationship can be expressed as:
\begin{equation*}
\sigma _{u}^{2}=\rho ^{2}\sigma _{u}^{2}+\sigma _{\epsilon }^{2} \end{equation*}
Solving for the variance of [latex]\epsilon _{t}[/latex], this becomes: \begin{equation}
\sigma _{\epsilon }^{2}=(1-\rho ^{2})\sigma _{u}^{2} \tag{12.24} \end{equation}
Under the Prais-Winsten estimator, the following transformation is applied to the first observation:
\begin{equation*}
\tilde{Y}_1=\left( \sqrt{1-\hat{\rho}^2}\right) Y_1
\end{equation*}
\begin{equation*}
\tilde{X}_{11}=\left( \sqrt{1-\hat{\rho}^2}\right) X_{11} \end{equation*}
\begin{equation*}
\tilde{X}_{21}=\left( \sqrt{1-\hat{\rho}^2}\right) X_{21} \end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\tilde{X}_{k1}=\left( \sqrt{1-\hat{\rho}^2}\right) X_{k1} \end{equation*}
Thus, the transformed equation corresponding to the first observation can be expressed as:
For observation 1:
\begin{equation}
\left( \sqrt{1-\hat{\rho}^2}\right) Y_1=\beta _0\left( \sqrt{1-\hat{\rho}^2} \right) +\beta _1\left( \sqrt{1-\hat{\rho}^2}\right) X_{11}+\cdots +\beta _k\left( \sqrt{1-\hat{\rho}^2}\right) X_{k1} \tag{12.25} \end{equation}
\begin{equation*}
+\left( \sqrt{1-\hat{\rho}^2}\right) u_1
\end{equation*}
The variance of the error term for the first observation (equation 12.25), treating [latex]\hat{\rho}[/latex] as if it were equal to [latex]\rho[/latex], equals:
\begin{equation*}
var\left( \left( \sqrt{1-\hat{\rho}^2}\right) u_1\right) =(1-\rho ^2)var(u_1) \end{equation*}
\begin{equation*}
=(1-\rho ^2)\sigma _u^2
\end{equation*}
Using the result from equation 12.24, the variance of this transformed error term reduces to:
\begin{equation*}
var\left( \left( \sqrt{1-\hat{\rho}^2}\right) u_1\right) =\sigma _\epsilon ^2 \end{equation*}
The transformations for observations 2 through [latex]N[/latex] are the quasi-differences used in the Cochrane-Orcutt procedure. Thus, the transformed equations can be expressed as:
For observations 2 through [latex]N[/latex]:
\begin{equation}
\left( Y_t-\hat{\rho}Y_{t-1}\right) =\beta _0(1-\hat{\rho})+\beta _1(X_{1t}-\hat{\rho}X_{1t-1})+\ldots +\beta _k(X_{kt}-\hat{\rho}X_{kt-1}) \tag{12.26}
\end{equation}
\begin{equation*}
+\left( u_t-\hat{\rho}u_{t-1}\right)
\end{equation*}
Since the residuals for observations 2 through [latex]N[/latex] converge to [latex]\epsilon _t[/latex] as the size of the sample approaches infinity, the variance of these terms tends to [latex]\sigma _\epsilon ^2[/latex]. Thus, the Prais-Winsten estimator transforms the residuals so that the variance is constant (as the size of the sample approaches infinity).

Since the Prais-Winsten transformation results in transformed residuals that (asymptotically) satisfy the assumptions of the classical regression model, OLS estimation techniques may be applied to the transformed observations defined above. Under this procedure, the estimated intercept and slope parameters serve as consistent estimates of the population parameters. The standard errors generated by the Prais-Winsten estimator may be used to formulate [latex]t[/latex]-tests involving the individual parameter estimates.
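The transformation in equations 12.25 and 12.26 can be illustrated with a short Python sketch. It assumes NumPy and takes [latex]\hat{\rho}[/latex] as given; the function name `prais_winsten_transform` and its argument layout are hypothetical and are used only for illustration.

```python
import numpy as np

def prais_winsten_transform(X, y, rho):
    """Apply the Prais-Winsten transformation for a given rho (|rho| < 1).

    X : (N, k) regressors WITHOUT a constant column; y : (N,) array.
    Returns (Z_t, y_t) with all N rows; column 0 of Z_t is the transformed constant.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    Z = np.column_stack([np.ones(N), X])
    scale = np.sqrt(1.0 - rho ** 2)
    y_t = np.empty(N)
    Z_t = np.empty_like(Z)
    # Observation 1: rescale by sqrt(1 - rho^2), as in equation 12.25
    y_t[0] = scale * y[0]
    Z_t[0] = scale * Z[0]
    # Observations 2..N: quasi-differences, as in equation 12.26
    y_t[1:] = y[1:] - rho * y[:-1]
    Z_t[1:] = Z[1:] - rho * Z[:-1]
    return Z_t, y_t
```

Applying OLS to the transformed data, for example with `np.linalg.lstsq(Z_t, y_t, rcond=None)[0]`, then gives the Prais-Winsten estimates for that value of [latex]\hat{\rho}[/latex]; the coefficient on column 0 estimates [latex]\beta _0[/latex] directly.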

As this discussion suggests, the Prais-Winsten estimator is an alternative GLS estimator that is quite similar to the Cochrane-Orcutt estimator. The difference between these two estimators is that the Prais-Winsten estimator includes information from all sample observations. In consequence, the Prais-Winsten estimator is demonstrably more efficient than the Cochrane-Orcutt estimator.^[10] For this reason, most econometricians prefer the Prais-Winsten estimator to the Cochrane-Orcutt estimator. The Prais-Winsten estimator is available in many modern econometrics software packages. (In some cases, it is simply referred to as a GLS correction procedure.)

Hildreth-Lu estimation procedure

The Hildreth-Lu estimation procedure uses a grid search procedure to estimate the value of [latex]\rho[/latex].^[11] This procedure is implemented by searching over alternative values of [latex]\rho[/latex] to find the value that minimizes the sum of squared residuals. The Hildreth-Lu procedure involves the following steps:

Step 1: Select an initial value of [latex]\rho[/latex] (for example, [latex]\rho =0.999[/latex]).

Step 2: Use the current value of [latex]\rho[/latex] to quasi-difference the regression equation. The Prais-Winsten transformation described above may be used to transform the first observation. Estimate the parameters of the transformed equation using an OLS procedure. Record the residual sum of squares (RSS) for this regression.

Step 3: Reduce the value of [latex]\rho[/latex] by a fixed amount (such as .01) and go to Step 2. Continue this process until [latex]\rho[/latex] approaches -1.

Step 4: Select the value of [latex]\rho[/latex] that minimizes the residual sum of squares.

A common way of implementing the Hildreth-Lu procedure is to initially use a broad step size (such as .01 or .005) and then to redo the procedure in the region around the initial minimum using a smaller step size (such as .001 or .0005). This procedure can be continued with progressively smaller step sizes until the value of [latex]\rho[/latex] is estimated to the desired level of accuracy. Another common practice is to use the Hildreth-Lu procedure to select a starting value of [latex]\rho[/latex] for the Prais-Winsten or Cochrane-Orcutt procedures.

The main advantage of the Hildreth-Lu procedure over the Cochrane-Orcutt and Prais-Winsten estimators is that the Hildreth-Lu procedure helps to ensure that the procedure has converged to a global rather than a local minimum sum of squared residuals. The main disadvantage of the Hildreth-Lu procedure is that it requires a large amount of computation to arrive at a final value of [latex]\rho[/latex]. In general, this procedure requires the use of substantially more computer time than either the Cochrane-Orcutt or Prais-Winsten procedures.

For this reason, many econometricians prefer to use the Hildreth-Lu procedure to determine the approximate value of [latex]\rho[/latex] and then rely on the Prais-Winsten procedure to achieve final convergence.
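A two-pass version of the Hildreth-Lu grid search might look like the following Python sketch. It assumes NumPy, uses the Prais-Winsten transformation for the first observation, and searches upward over a grid from -0.99 to 0.99 rather than downward from 0.999 as in the steps above (the order of the search does not affect the minimum). The function names and the default step sizes `coarse` and `fine` are hypothetical.

```python
import numpy as np

def transformed_rss(X, y, rho):
    """RSS from OLS on the Prais-Winsten-transformed data for a given rho."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    Z = np.column_stack([np.ones(N), X])
    scale = np.sqrt(1.0 - rho ** 2)
    Z_t = np.vstack([scale * Z[:1], Z[1:] - rho * Z[:-1]])
    y_t = np.concatenate([scale * y[:1], y[1:] - rho * y[:-1]])
    beta = np.linalg.lstsq(Z_t, y_t, rcond=None)[0]
    resid = y_t - Z_t @ beta
    return resid @ resid

def hildreth_lu(X, y, coarse=0.01, fine=0.001):
    """Coarse grid search over rho, then a finer search around the coarse minimum."""
    grid = np.arange(-0.99, 0.99 + coarse / 2, coarse)
    best = min(grid, key=lambda r: transformed_rss(X, y, r))
    # Refine around the coarse minimum while staying strictly inside (-1, 1)
    lo = max(best - coarse, -0.999)
    hi = min(best + coarse, 0.999)
    grid = np.arange(lo, hi + fine / 2, fine)
    return float(min(grid, key=lambda r: transformed_rss(X, y, r)))
```

The value returned by `hildreth_lu` could then serve as a starting value of [latex]\rho[/latex] for an iterative procedure, in the spirit of the practice described above.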

Maximum likelihood estimator^[12]

Most econometric software packages allow the user to correct for the presence of first-order autocorrelation using one of the above procedures. Some of these software packages also provide a maximum likelihood estimator for AR(1) error processes. A full discussion of maximum likelihood estimators requires the use of mathematical tools that are beyond the scope of this text.^[13] For now, it can simply be noted that the likelihood function provides the probability (or likelihood) of observing a given combination of the dependent and independent variables as a function of the unknown model parameters ([latex]\rho ,\beta _0, \beta _1, \ldots ,\beta _k[/latex]). A maximum likelihood estimation procedure involves finding the values of the model parameters that maximize the probability of observing the particular sample values for the dependent and independent variables.

One desirable feature of maximum likelihood estimators is that they are asymptotically efficient. This property indicates that, as the sample size tends toward infinity, no other consistent and asymptotically normal estimator achieves a lower asymptotic variance than that attained by the maximum likelihood estimator.

12.1.6 Example: Consumption function

Let’s reconsider the consumption function defined as:^[14]
\begin{equation}
\text{C}_{t}=\beta _{0}+\beta _{1}\text{YD}_{t}+\beta _{2}\text{W}_{t}+\beta _{3}\text{Int}_{t}+u_{t} \tag{12.27}
\end{equation}
where:
C[latex]_{t}[/latex] = real consumption expenditures in year [latex]t[/latex]
YD[latex]_{t}[/latex] = real personal disposable income in year [latex]t[/latex]
W[latex]_{t}[/latex] = real value of private wealth in year [latex]t[/latex]
Int[latex]_{t}[/latex] = real yield on 3-month Treasury securities in year [latex]t[/latex]

The parameters of equation 12.27 were estimated by an OLS regression technique using annual data for the years 1947-2000. The estimated equation is:^[15]

\begin{equation}
\text{\^{C}}_{t}=\underset{(-1.61)}{-20.63}+\underset{(53.38)}{0.73}\text{YD} _{t}+\underset{(14.49)}{0.036}\text{W}_{t}\underset{(-2.39)}{-5.52}\text{Int} _{t} \tag{12.28}
\end{equation}

([latex]t[/latex]-statistics in parentheses)

Durbin-Watson statistic = 1.31

Figure 12.6 contains a plot of the sample residuals from this equation. An examination of this graph suggests that positive first-order autocorrelation may be present. (Notice that positive errors tend to be followed by other positive errors while negative errors tend to be followed by other negative errors.) The Durbin-Watson test can be used to provide a formal test of this hypothesis.

Insert Figure 12.6

An examination of the Durbin-Watson table appearing in Table 1.7 (in Appendix A at the end of this text) indicates that the critical values for the Durbin-Watson statistic are approximately [latex]d_{L}=1.444[/latex] and [latex]d_{U}=1.678[/latex] at a 5% significance level (since [latex]N=54[/latex] and [latex]k=3[/latex]).^[16] As noted above, the hypothesis of no autocorrelation should be rejected if the estimated Durbin-Watson statistic is less than 1.444. If the estimated Durbin-Watson statistic exceeds 1.678, then this hypothesis would not be rejected at a 5% significance level. A Durbin-Watson statistic between 1.444 and 1.678 falls in the uncertainty region and this test is inconclusive. Since the estimated Durbin-Watson statistic (1.31) is less than [latex]d_{L}[/latex], the null hypothesis of no first-order autocorrelation is rejected. Since there is evidence indicating that first-order autocorrelation is present, the [latex]t[/latex]-statistics appearing in equation 12.28 may be biased. OLS estimators are also inefficient in this case. Thus, some form of correction procedure is appropriate.
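The decision rule used in this paragraph can be written as a small Python sketch. It assumes NumPy and the usual definition of the Durbin-Watson statistic, the sum of squared first differences of the residuals divided by the sum of squared residuals. The function names are hypothetical, and the rule covers only the test against positive first-order autocorrelation that is applied here.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic computed from a vector of regression residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def durbin_watson_decision(d, d_lower, d_upper):
    """Classify d when testing the null of no positive first-order autocorrelation."""
    if d < d_lower:
        return "reject"          # evidence of positive autocorrelation
    if d > d_upper:
        return "do not reject"
    return "inconclusive"        # d lies between the lower and upper bounds

# Values reported for the consumption function (N = 54, k = 3, 5% level)
decision = durbin_watson_decision(1.31, 1.444, 1.678)
```

With the values reported above, `decision` is `"reject"`, which matches the conclusion drawn in the text.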

To alleviate these problems, each of the corrective procedures discussed above is used to estimate the parameters of equation 12.27. The resulting equations are:

Estimation Method	Estimated Equation (t-ratios in parentheses)	Estimated value of [latex]\rho[/latex]
Cochrane-Orcutt	[latex]\text{\^{C}}_{t}=\underset{(-1.29)}{-30.49}+\underset{(33.73)}{0.75}\text{YD}_{t}+\underset{(8.37)}{0.033}\text{W}_{t}- \underset{(-0.21)}{0.72}\text{Int}_{t}[/latex]	[latex]\hat \rho = 0.49[/latex]
Prais-Winsten	[latex]\text{\^{C}}_{t}=\underset{(-1.17)}{-22.86}+\underset{(37.67)}{0.74}\text{YD}_{t}+\underset{(9.83)}{0.034}\text{W}_{t}-\underset{(-1.27)}{3.32}\text{Int}_{t}[/latex]	[latex]\hat{\rho}=0.40[/latex]
Hildreth-Lu	[latex]\hat{C}_{t}=\underset{(-1.18)}{-22.62}+\underset{(38.30) }{0.74}\text{YD}_{t}+\underset{(10.04)}{0.035}\text{W}_{t}-\underset{(-1.31)} {3.42}\text{Int}_{t}[/latex]	[latex]\hat{\rho}=0.38[/latex]
Max. likelihood	[latex]\text{\^{C}}_{t}=\underset{(-1.20)}{-22.40}+ \underset{(38.96)}{0.74}\text{YD}_{t}+\underset{(10.25)}{0.035}\text{W}_{t}- \underset{(-1.36)}{3.52}\text{Int}_{t}[/latex]	[latex]\hat{\rho}=0.37[/latex]

An examination of these estimated equations suggests that, in this application, all four of the estimation procedures provide quite similar results. Note that the [latex]t[/latex]-ratios for all three explanatory variables are smaller in absolute value than those appearing under the OLS estimator (equation 12.28). In particular, the real interest rate coefficient becomes insignificant at all conventional significance levels when autocorrelation is taken into account. Since the OLS estimator tends to overstate [latex]t[/latex]-ratios when positive autocorrelation is present, these lower [latex]t[/latex]-ratios are not surprising.

12.2 Higher-order autocorrelation

While a first-order autoregressive model seems to fit a wide variety of econometric models quite well, it is not necessarily the best specification in a given application. It is possible, for example, that the error terms may follow a more complicated autoregressive process. Suppose, for example, that the current error term is defined as:
\begin{equation*}
u_t=\rho _1u_{t-1}+\rho _2u_{t-2}+\epsilon _t
\end{equation*}
\begin{equation*}
\begin{array}{ll}
\text{where:} & E(\epsilon _t)=0 \\
& E(\epsilon _t^2)=\sigma _\epsilon ^2 \\
& E(\epsilon _t\epsilon _s)=0\text{ for }t\neq s
\end{array}
\end{equation*}
An error process satisfying these conditions is called a \textbf{second-order autoregressive process}. (This is also commonly expressed as an AR(2) error process.)

More generally, an error process may follow a $p$th-order autoregressive process (AR($p$)). In this case, the error term is defined as:
\begin{equation}
u_{t}=\rho _{1}u_{t-1}+\rho _{2}u_{t-2}+\cdots +\rho _{p}u_{t-p}+\epsilon _{t} \label{AR(p).ac}
\end{equation}
The consequences of higher-order autocorrelation are equivalent to those of first-order autocorrelation:
\begin{itemize}
\item OLS estimates of the intercept and slope parameters are unbiased and consistent, but not efficient, and
\item the usual OLS standard errors and $t$-ratios are biased and cannot be used for hypothesis tests.
\end{itemize}
\subsection{Detection of higher-order autocorrelation\label{B_G_test_sec}}
There are two commonly used tests for the presence of higher-order autocorrelation: the Breusch-Godfrey and the Box-Pierce tests. Each of these tests is based upon the correlations that exist between current and past sample residuals. Let’s discuss each of these tests.
\subsubsection{Breusch-Godfrey test}
Suppose the regression model is given by: \begin{equation}
Y_t=\beta _o+\beta _1X_{1t}+\beta _2X_{2t}+\cdots +\beta _kX_{kt}+u_t \label{orig.reg.ac}
\end{equation}
It is suspected that the error terms may follow a $p$th-order autoregressive process. The Breusch-Godfrey test involves an application of the Lagrange multiplier test introduced in Chapter \ref{spec.chap}:\footnote{See the discussion in Breusch (1978) and Godfrey (1978a). For a good summary of this procedure, see Greene (2000), pp. 540-2.}
\begin{enumerate}
\item[Step 1:] Estimate the parameters of the original regression model (equation \ref{orig.reg.ac}) by OLS and save the estimated residuals $\hat{u}_{t}$.
\item[Step 2:] Use an OLS\ estimation procedure to estimate the parameters of the equation:
\begin{equation}
\hat{u}_t=\alpha _0+\alpha _1X_{1t}+\alpha _2X_{2t}+\ldots +\alpha _kX_{kt}+\gamma _1\hat{u}_{t-1} \label{b-g.test.ac}
\end{equation}
\begin{equation*}
+\gamma _2\hat{u}_{t-2}+\ldots +\gamma _p\hat{u}_{t-p}+v_t \end{equation*}
where $v_t$ is a random error term. Missing observations on the lagged error terms should be replaced with zero (the expected value of the lagged error term). This makes it possible to include more sample information in the estimation process.
\item[Step 3:] Use the estimated $R^{2}$ from the regression in Step 2 to formulate the statistic:
\begin{equation}
\text{Breusch-Godfrey statistic = }NR^{2} \label{B-G.stat} \end{equation}%
Under a null hypothesis of no autocorrelation, this statistic is distributed as a $\chi ^{2}$ statistic with $p$ degrees of freedom. If the value of this statistic exceeds the critical value at the predetermined significance level, then it can be concluded that the error terms follow an autoregressive process with a length that is less than or equal to $p$.
\end{enumerate}
An inspection of equation \ref{b-g.test.ac} indicates how the Breusch-Godfrey test operates. The sample covariance between the sample error terms and each of the original independent variables ($X_1,X_2,\ldots ,X_k$) equals zero in equation \ref{b-g.test.ac}. (This is required as part of the OLS estimation procedure). Thus, the $X_i$ variables contribute no explanatory power to this regression. A large $R^2$ indicates that past error terms exert a substantial influence over current errors (controlling for the effect of the independent variables $X_i$). If current errors are not affected by the magnitude of past errors, $R^2$ should be close to zero.
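The three steps of the Breusch-Godfrey test can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes NumPy, includes a constant term in the auxiliary regression, and sets the missing initial lagged residuals to zero as described in Step 2. The function name `breusch_godfrey` is hypothetical.

```python
import numpy as np

def breusch_godfrey(X, y, p):
    """Return the statistic N * R^2 for a test against AR(p) errors.

    X : (N, k) regressors WITHOUT a constant column; y : (N,) array.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    Z = np.column_stack([np.ones(N), X])
    # Step 1: OLS on the original model, save the residuals
    u = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    # Step 2: regress u_t on the original regressors and p lagged residuals,
    # replacing the missing initial lags with zero
    lags = np.column_stack(
        [np.concatenate([np.zeros(i), u[:-i]]) for i in range(1, p + 1)]
    )
    W = np.column_stack([Z, lags])
    v = u - W @ np.linalg.lstsq(W, u, rcond=None)[0]
    # Step 3: R^2 of the auxiliary regression (u has mean zero because Z
    # contains a constant, so the total sum of squares is u'u)
    r2 = 1.0 - (v @ v) / (u @ u)
    return N * r2
```

The returned value would be compared with a $\chi ^{2}$ critical value with $p$ degrees of freedom taken from a table.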
\subsubsection{Box-Pierce and Ljung-Box tests}
A somewhat simpler test is provided by the Box-Pierce $Q$ statistic.\footnote{Box and Pierce (1970).} This statistic is defined as:
\begin{equation*}
Q=N\sum_{i=1}^{p}\hat{\rho}_{i}^{2}
\end{equation*}%
where $\hat{\rho}_{i}$ is the sample correlation between $\hat{u}_{t}$ and $\hat{u}_{t-i}$:
\begin{equation*}
\hat{\rho}_{i}=\frac{\sum_{t=i+1}^{N}\hat{u}_{t}\hat{u}_{t-i}}{\sum_{t=1}^{N}\hat{u}_{t}^{2}}
\end{equation*}
Under the null hypothesis of no autocorrelation in the first $p$ lags, the Box-Pierce $Q$ statistic is distributed as a $\chi ^2$ statistic with $p$ degrees of freedom. If the estimated Box-Pierce statistic exceeds the critical value at the preselected significance level, then it can be concluded that there is significant autocorrelation in the first $p$ lags.
An alternative version of the $Q$ test statistic has been proposed by Ljung and Box (1978). This revised statistic is defined as: \begin{equation*}
Q^{\prime }=N(N+2)\sum_{i=1}^{p}\left( \frac{\hat{\rho}_{i}^{2}}{N-i}\right) \end{equation*}%
This statistic is also asymptotically distributed as a $\chi ^{2}$ distribution with $p$ degrees of freedom. (Note that the two statistics converge as $N$ tends toward infinity.) Most econometricians prefer the Ljung-Box statistic to the original Box-Pierce statistic since the Ljung-Box statistic appears to perform better in small samples. Many econometrics packages provide an option that computes the sample autocorrelations for the estimated residuals. This makes it very easy to compute the value of either the Box-Pierce or the Ljung-Box statistic.
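Both statistics follow directly from the sample autocorrelations of the residuals, as the following Python sketch shows. It assumes NumPy and residuals with mean zero (as is the case when the regression includes a constant); the function name `q_statistics` is hypothetical.

```python
import numpy as np

def q_statistics(resid, p):
    """Return (rho_hat, Q, Q_prime) for the first p lags of the residuals."""
    u = np.asarray(resid, dtype=float)
    N = len(u)
    denom = u @ u
    # Sample autocorrelations rho_hat_i, i = 1..p, as defined above
    rho_hat = np.array([(u[i:] @ u[:-i]) / denom for i in range(1, p + 1)])
    lags = np.arange(1, p + 1)
    box_pierce = N * np.sum(rho_hat ** 2)
    ljung_box = N * (N + 2) * np.sum(rho_hat ** 2 / (N - lags))
    return rho_hat, box_pierce, ljung_box
```

Since $(N+2)/(N-i)$ exceeds one for every lag $i$, the Ljung-Box statistic is never smaller than the Box-Pierce statistic in a given sample.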
\subsubsection{Length of autoregressive process}
One problem associated with the use of the Breusch-Godfrey, Box-Pierce, and Ljung-Box tests is that the lag length for the autoregressive process is not generally known \textit{a priori}. A simple procedure is to estimate several versions of equation \ref{b-g.test.ac} using different values of $p$ and choose the lag length for which adjusted $R^{2}$ is maximized.\footnote{Other methods of determining the appropriate lag length in an autoregressive model are discussed in more detail in Chapter \ref{ARIMA.chap}.}
\subsection{Correction for AR(p) errors}
If an econometrician believes that the error process in a regression model is a $p$th-order autoregressive process, the correction process is quite similar to that described above. While a GLS estimator can be constructed that uses all $N$ observations, this becomes a bit more complex as the length of the autoregressive process increases. Let’s discuss the somewhat simpler Cochrane-Orcutt estimator.
\begin{enumerate}
\item[Step 1:] Estimate the original regression equation by OLS and store the residuals $\hat{u}_t$.
\item[Step 2:] Regress $\hat{u}_t$ on the lagged variables $\hat{u}_{t-1},\hat{u}_{t-2},\ldots ,\hat{u}_{t-p}$ to construct estimates of the parameters in equation \ref{AR(p).ac}. (Do not include a constant term in this regression.)
\item[Step 3:] For observations $p+1$ to $N$, create the quasi-differenced variables:
\begin{equation*}
\tilde{Y}_{t}=Y_{t}-\hat{\rho}_{1}Y_{t-1}-\hat{\rho}_{2}Y_{t-2}-\cdots -\hat{\rho}_{p}Y_{t-p}
\end{equation*}
\begin{equation*}
\tilde{X}_{1t}=X_{1t}-\hat{\rho}_{1}X_{1t-1}-\hat{\rho}_{2}X_{1t-2}-\cdots -\hat{\rho}_{p}X_{1t-p}
\end{equation*}
\begin{equation*}
\tilde{X}_{2t}=X_{2t}-\hat{\rho}_{1}X_{2t-1}-\hat{\rho}_{2}X_{2t-2}-\cdots -\hat{\rho}_{p}X_{2t-p}
\end{equation*}
\begin{equation*}
\vdots
\end{equation*}
\begin{equation*}
\tilde{X}_{kt}=X_{kt}-\hat{\rho}_{1}X_{kt-1}-\hat{\rho}_{2}X_{kt-2}-\cdots -\hat{\rho}_{p}X_{kt-p}
\end{equation*}
\item[Step 4:] Use an OLS estimation procedure to estimate the parameters of the regression:
\begin{equation*}
\tilde{Y}_{t}=\tilde{\beta}_{o}+\beta _{1}\tilde{X}_{1t}+\beta _{2}\tilde{X}_{2t}+\cdots +\beta _{k}\tilde{X}_{kt}+\epsilon _{t}
\end{equation*}
\begin{equation*}
\text{where: }\tilde{\beta}_{o}=\left( 1-\hat{\rho}_{1}-\hat{\rho}_{2}-\cdots -\hat{\rho}_{p}\right) \beta _{o}
\end{equation*}
Under the Cochrane-Orcutt procedure the first $p$ observations must be excluded from the sample used to estimate the parameters of this equation due to the lagged variables used in the variable transformations described above.
\item[Step 5:] This procedure may be iterated (as in the Cochrane-Orcutt estimator for an AR(1) model). In this case, Steps 2-4 are repeated until the parameter estimates converge to the desired level of accuracy.
\end{enumerate}
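One pass through Steps 1 to 4 of this procedure can be sketched in Python as follows. The sketch assumes NumPy; the function name `cochrane_orcutt_arp` is hypothetical, and the iteration described in Step 5 is omitted to keep the example short. Because the column of ones is quasi-differenced together with the other regressors, the first element of the returned coefficient vector estimates $\beta _{o}$ directly rather than $\tilde{\beta}_{o}$.

```python
import numpy as np

def cochrane_orcutt_arp(X, y, p):
    """One pass of the Cochrane-Orcutt procedure for AR(p) errors.

    X : (N, k) regressors WITHOUT a constant column; y : (N,) array.
    Returns (beta, rho) with rho = (rho_1, ..., rho_p).
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    Z = np.column_stack([np.ones(N), X])
    # Step 1: OLS on the original equation, store the residuals
    u = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    # Step 2: regress u_t on u_{t-1}, ..., u_{t-p} without a constant
    U = np.column_stack([u[p - i:N - i] for i in range(1, p + 1)])
    rho = np.linalg.lstsq(U, u[p:], rcond=None)[0]
    # Step 3: quasi-difference observations p+1..N (the first p are lost)
    y_t = y[p:].copy()
    Z_t = Z[p:].copy()
    for i in range(1, p + 1):
        y_t -= rho[i - 1] * y[p - i:N - i]
        Z_t -= rho[i - 1] * Z[p - i:N - i]
    # Step 4: OLS on the quasi-differenced data
    beta = np.linalg.lstsq(Z_t, y_t, rcond=None)[0]
    return beta, rho
```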
One of the main difficulties with the use of an AR($p$) model is that there is no commonly accepted procedure for determining the length of the lag structure.\footnote{%
A commonly used method for determining the lag length of an autoregressive process involves using the autocorrelation and partial autocorrelation functions for the error terms. This method is discussed in section \ref{ident_section_arima} of Chapter \ref{ARIMA.chap}.} Fortunately, however, an AR(1) model seems to describe a very large proportion of residual processes in time-series regression models. In rare cases, an AR(2) model is required to transform the residual series so that the transformed residuals are approximately equal to a white-noise error process.\footnote{It should also be noted that there are a variety of alternative specifications that can be used to model a time-series residual process. One such model is called a moving average process. When the residuals follow a moving average process, the current value of the error term is assumed to be a function of current and past random shocks. For example, a first-order moving average model (MA(1)) may be specified as:
\begin{equation*}
u_{t}=\epsilon _{t}-\theta \epsilon _{t-1} \end{equation*}%
\begin{equation*}
\text{where: }\epsilon _{t}\text{ is a white-noise error process} \end{equation*}%
It is also possible that the error terms may follow an autoregressive moving average (ARMA) process. A model of this sort involves a combination of the autoregressive and moving average models. Chapter \ref{ARIMA.chap} contains a brief discussion of techniques that can be used to determine which of these models are most appropriate.}
\subsection{Example: Investment demand equation}
An example should help to illustrate the process of estimating a regression model in which the error term exhibits a higher-order autoregressive process. Consider the following simple model of an investment demand equation:
\begin{equation}
\text{I}_{t}=\beta _{o}+\beta _{1}\text{GDP}_{t}+\beta _{2}\Delta \text{GDP}_{t}+\beta _{3}\text{int}_{t}+u_{t} \label{inv_demand_ac}
\end{equation}
\begin{equation*}
\begin{array}{cl}
\text{where:} & \text{I}_{t}=\text{ U.S. gross private domestic investment in year }t \\
& \text{GDP}_{t}=\text{ U.S. gross domestic product in year }t \\
& \Delta \text{GDP}_{t}=\text{GDP}_{t}-\text{GDP}_{t-1} \\
& \text{int}_{t}=\text{ real interest rate} \\
& u_{t}=\text{ random error term}
\end{array}%
\end{equation*}
When the parameters of equation \ref{inv_demand_ac} are estimated using annual time series data from 1930 through 2001,\footnote{The data used to estimate this equation are contained in the file \textquotedblleft invest.dat.\textquotedblright\ A description of this file appears in Table \ref{invest.dat} on p. \pageref{invest.dat}.} the following equation results:
\begin{equation}
\text{\^{I}}_{t}=\underset{(-7.781)}{-135.06}+\underset{(31.141)}{0.165}\text{GDP}_{t}+\underset{(2.463)}{0.256}\Delta \text{GDP}_{t}+\underset{(0.950)}{2.293}\text{int}_{t} \label{est_inv_demand_eq_ac}
\end{equation}
\begin{equation*}
R^{2}=0.967
\end{equation*}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)} \end{equation*}%
\begin{equation*}
\text{Durbin-Watson statistic = 0.3124}
\end{equation*}%
Since there are 72 observations and 3 slope parameters, the lower and upper bounds for the Durbin-Watson statistic are approximately 1.528 and 1.704.\footnote{Linear interpolation is used since only values for $N=70$ and $N=75$ are available in the Durbin-Watson table.} The null hypothesis of no first-order autocorrelation can be rejected since the Durbin-Watson statistic falls well within the rejection region of this statistic. This indicates the presence of at least first-order autocorrelation. Let’s determine whether there is any evidence of higher-order autocorrelation using Breusch-Godfrey and Ljung-Box statistics. To construct the Breusch-Godfrey statistic, it is necessary to estimate equations of the form:
\begin{equation}
\hat{u}_{t}=a_{o}+a_{1}\text{GDP}_{t}+a_{2}\Delta \text{GDP}_{t}+a_{3}\text{int}_{t} \label{Breusch_Godfrey_invest}
\end{equation}%
\begin{equation*}
+\sum\limits_{i=1}^{p}\gamma _{i}\hat{u}_{t-i}+\epsilon _{t} \end{equation*}
To construct estimated Breusch-Godfrey statistics, equation \ref{Breusch_Godfrey_invest} was estimated using lag lengths of 1 through 5. The estimated parameters from these 5 equations (along with the $R^{2}$ and adjusted $R^{2}$ values for these equations) appear in Table \ref{Breusch_Godfrey.eq1.ac}. As discussed in section \ref{B_G_test_sec}, the Breusch-Godfrey statistic is given by the value of $NR^{2}$ (the next to last row in Table \ref{Breusch_Godfrey.eq1.ac}). At a 1\% significance level, the relevant critical values of this statistic range from 6.6349 (for $p=1$) to 15.0863 (for $p=5$). Since the estimated Breusch-Godfrey test statistic exceeds these critical values in each case, the null hypothesis of no autocorrelation in the first $p$ lags can be rejected for each of these five values of $p$.
\begin{center}
\begin{table}[tbp] \centering
\begin{tabular}{l|c|c|c|c|c|}
& \multicolumn{5}{c|}{\textbf{Estimated Coefficients}} \\
\multicolumn{1}{|l|}{\textbf{Parameter}} & (\textbf{{\boldmath$p$} = 1)} & (\textbf{{\boldmath$p$} = 2)} & (\textbf{{\boldmath$p$} = 3)} & (\textbf{{\boldmath$p$} = 4)} & (\textbf{{\boldmath$p$} = 5)} \\ \hline\hline
\multicolumn{1}{||c|}{$\widehat{a}_{o}$} & -6.560 & -3.260 & -1.600 & -2.385 & \multicolumn{1}{|c||}{-1.774} \\ \hline
\multicolumn{1}{||c|}{$\widehat{a}_{1}$} & 0.001 & -0.001 & -0.002 & -0.002 & \multicolumn{1}{|c||}{-0.002} \\ \hline
\multicolumn{1}{||c|}{$\widehat{a}_{2}$} & 0.026 & 0.057 & 0.060 & 0.058 & \multicolumn{1}{|c||}{0.057} \\ \hline
\multicolumn{1}{||c|}{$\widehat{a}_{3}$} & 0.460 & 0.809 & 0.868 & 0.892 & \multicolumn{1}{|c||}{0.847} \\ \hline
\multicolumn{1}{||c|}{$\widehat{\gamma }_{1}$} & 0.874 & 1.109 & 1.088 & 1.089 & \multicolumn{1}{|c||}{1.090} \\ \hline
\multicolumn{1}{||c|}{$\widehat{\gamma }_{2}$} & -- & -0.321 & -0.213 & -0.204 & \multicolumn{1}{|c||}{-0.207} \\ \hline
\multicolumn{1}{||c|}{$\widehat{\gamma }_{3}$} & -- & -- & -0.132 & -0.167 & \multicolumn{1}{|c||}{-0.171} \\ \hline
\multicolumn{1}{||c|}{$\widehat{\gamma }_{4}$} & -- & -- & -- & 0.454 & \multicolumn{1}{|c||}{0.068} \\ \hline
\multicolumn{1}{||c|}{$\widehat{\gamma }_{5}$} & -- & -- & -- & -- & \multicolumn{1}{|c||}{-0.030} \\ \hline
\multicolumn{1}{||c|}{$R^{2}$} & 0.6998 & 0.7211 & 0.7246 & 0.7250 & \multicolumn{1}{|c||}{0.7251} \\ \hline
\multicolumn{1}{||c|}{$NR^{2}$} & 50.389 & 51.923 & 52.174 & 52.201 & \multicolumn{1}{|c||}{52.212} \\ \hline
\multicolumn{1}{||c|}{$\overline{R}^{2}$} & 0.6819 & 0.7000 & 0.6992 & 0.6949 & \multicolumn{1}{|c||}{0.6903} \\ \hline\hline
\end{tabular}
\caption{Estimates of the parameters of equation \ref{Breusch_Godfrey_invest}.
\label{Breusch_Godfrey.eq1.ac}}
\end{table}
\end{center}
The estimated autocorrelations of the residuals in equation \ref{inv_demand_ac} are also used to compute Ljung-Box $Q^{\prime }$ statistics (given the small sample size, this is preferable to the use of the Box-Pierce statistic). The estimated autocorrelations and Ljung-Box $Q^{\prime }$ statistics for these residuals appear in Table \ref{invest.err.acf}. Since the Ljung-Box $Q^{\prime }$ statistic follows a $\chi ^{2}$ distribution with $p$ degrees of freedom, the critical values (as in the case of the Breusch-Godfrey test above) all fall between 6.6349 (for $p=1$) and 15.0863 (for $p=5$) at a 1\% significance level.
Since the estimated Ljung-Box $Q^{\prime }$ statistics exceed the corresponding critical values for each value of $p$, the null hypotheses of no autocorrelation in lags 1 through $p$ (for each value of $p$ between 1 and 5) can be rejected by this test.
Thus, both the Breusch-Godfrey and the Ljung-Box tests indicate that some form of an autoregressive process is present. These tests do not, however, indicate the length of this process. A test indicating significant autocorrelation among the first $p$ lags of the error terms merely indicates that one or more of the $u_{t-i}$ (for $i=1,\ldots ,p$) terms has a significant effect on the current value of $u_{t}$. As noted in section \ref{B_G_test_sec}, one strategy for selecting the length of the autoregressive process is to use the lag length for which the adjusted $R^{2}$ from the estimated versions of equation \ref{Breusch_Godfrey_invest} is largest. As indicated in Table \ref{Breusch_Godfrey.eq1.ac}, the adjusted $R^{2}$ is maximized at a lag length of 2.
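The Breusch-Godfrey statistics and the lag selection rule can be checked with a few lines of Python arithmetic using the reported $R^{2}$ values. The sketch assumes $N=72$ and the standard adjusted $R^{2}$ formula, with $3+p$ slope coefficients in each auxiliary regression; the formula and the variable names are assumptions of this illustration rather than details stated in the text. Small differences from the tabulated values reflect rounding of the reported $R^{2}$.

```python
N = 72
# R^2 values reported for the auxiliary regressions with p = 1, ..., 5
r2 = {1: 0.6998, 2: 0.7211, 3: 0.7246, 4: 0.7250, 5: 0.7251}

def adjusted_r2(r2_value, n_obs, n_slopes):
    """Standard adjusted R^2 for a regression with a constant and n_slopes regressors."""
    return 1.0 - (1.0 - r2_value) * (n_obs - 1) / (n_obs - n_slopes - 1)

# Breusch-Godfrey statistic N * R^2 for each lag length
nr2 = {p: N * value for p, value in r2.items()}

# Adjusted R^2: each regression has 3 original regressors plus p lagged residuals
adj = {p: adjusted_r2(value, N, 3 + p) for p, value in r2.items()}

# Lag length with the largest adjusted R^2
best_p = max(adj, key=adj.get)
```

Under these assumptions `best_p` equals 2, which agrees with the lag length selected above.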
\begin{center}
\bigskip
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{||l|c|c||}
\hline\hline
& & \textbf{Ljung-Box} \\
\textbf{Lag (\textit{p})} & $\hat{\rho}_{i}$ & $Q^{\prime }$ \textbf{statistic} \\ \hline\hline
\multicolumn{1}{||c|}{1} & 0.801 & 48.138 \\ \hline
\multicolumn{1}{||c|}{2} & 0.522 & 68.868 \\ \hline
\multicolumn{1}{||c|}{3} & 0.273 & 74.628 \\ \hline
\multicolumn{1}{||c|}{4} & 0.103 & 75.466 \\ \hline
\multicolumn{1}{||c|}{5} & 0.008 & 75.472 \\ \hline\hline
\end{tabular}%
\caption{Autocorrelations for error term in investment demand equation\label{invest.err.acf}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\bigskip
\end{center}
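The entries in Table \ref{invest.err.acf} can be approximately reproduced from the estimated autocorrelations alone. The following Python sketch assumes the usual form of the Ljung-Box statistic, $Q^{\prime }=N(N+2)\sum_{i=1}^{p}\hat{\rho}_{i}^{2}/(N-i)$, and a sample size of $N=72$. The sample size is an assumption made for illustration (it is not restated in this section), and the function name \texttt{ljung\_box} is hypothetical.

```python
# Ljung-Box Q' statistics computed from a list of estimated autocorrelations.
# Assumption: n = 72 is an illustrative sample size, not a value stated in the text.

def ljung_box(rho, n):
    """Return [Q'(1), ..., Q'(p)], where Q'(p) = n(n+2) * sum_{i<=p} rho_i^2 / (n - i)."""
    stats, running = [], 0.0
    for i, r in enumerate(rho, start=1):
        running += r * r / (n - i)
        stats.append(n * (n + 2) * running)
    return stats

# Autocorrelations taken from the table above.
rho_hat = [0.801, 0.522, 0.273, 0.103, 0.008]
# Chi-square critical values at the 1% level for 1..5 degrees of freedom.
crit_1pct = [6.6349, 9.2103, 11.3449, 13.2767, 15.0863]

for p, (q, c) in enumerate(zip(ljung_box(rho_hat, 72), crit_1pct), start=1):
    print(f"p={p}: Q'={q:.3f}  critical value={c}  reject H0: {q > c}")
```

Because the tabulated autocorrelations are rounded to three decimal places, the computed statistics should match the table only approximately.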
If the error term follows a 2nd-order autoregressive process, equation \ref{inv_demand_ac} can be expressed as:
\begin{equation}
\text{I}_{t}=\beta _{o}+\beta _{1}\text{GDP}_{t}+\beta _{2}\Delta \text{GDP}_{t}+\beta _{3}\text{int}_{t}+\rho _{1}u_{t-1}+\rho _{2}u_{t-2}+\epsilon _{t} \label{inv_demand_ac1}
\end{equation}%
The application of a Cochrane-Orcutt estimation procedure to equation \ref{inv_demand_ac1} results in:%
\begin{equation*}
\text{\^{I}}_{t}=\underset{(-3.202)}{-162.009}+\underset{(15.449)}{0.176}\text{GDP}_{t}+\underset{(4.274)}{0.197}\Delta \text{GDP}_{t}-\underset{(-0.433)}{0.973}\text{int}_{t}+\underset{(8.563)}{1.125}u_{t-1}-\underset{(-2.226)}{0.324}u_{t-2}
\end{equation*}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)} \end{equation*}%
\begin{equation*}
\overline{R}^{2}=0.99
\end{equation*}%
Notice that the estimated $t$-ratios for this equation are lower than those that were estimated before a correction was made for the presence of autocorrelation (these estimates appear in equation \ref{est_inv_demand_eq_ac}). When the error terms in a regression follow an autoregressive process, the estimated $t$-ratios are generally inflated when an OLS estimation procedure is used that does not take autocorrelation into account.
\section{Lagged dependent variable as regressor\label{lagged.dep.ac}}
Let’s consider the effect of including a lagged dependent variable as a regressor. Suppose a regression equation is given by:
\begin{equation}
Y_t=\beta _o+\beta _1Y_{t-1}+\beta _2X_t+u_t \label{lag.dep.ac}
\end{equation}
\begin{equation*}
\text{where: }u_t=\rho u_{t-1}+\epsilon _t \end{equation*}
Note that both $Y_{t-1}$ and $u_t$ are determined, in part, by the level of $u_{t-1}$. When $u_{t-1}$ is relatively large, both $Y_{t-1}$ and $u_t$ will tend to be relatively large. Negative values of $u_{t-1}$ will tend to result in relatively small values of $Y_{t-1}$ and $u_t$. In consequence, the variables $Y_{t-1}$ and $u_t$ are correlated. As noted in Chapter \ref{spec.chap}, an OLS estimation procedure results in biased parameter estimates when the error term is correlated with one of the independent variables in a regression equation. Thus, an attempt to estimate the parameters of equation \ref{lag.dep.ac} by OLS will result in biased estimates of the population parameters. Since the Durbin-Watson statistic is derived from the residuals from this regression, the Durbin-Watson statistic is also biased. In particular, the Durbin-Watson statistic is biased towards two. Thus, the Durbin-Watson test will tend to suggest that no autocorrelation is present even when first-order autocorrelation exists. Roughly speaking, the reason for this bias is that the estimated coefficient on $Y_{t-1}$ will partially pick up the effect of the autocorrelation in the residuals (since $Y_{t-1}$ is partly determined by $u_{t-1}$). The sample residuals will therefore tend to appear uncorrelated.
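The correlation between $Y_{t-1}$ and $u_t$ can be made explicit under two simplifying assumptions that are not stated in the text: $\left| \beta _1\right| <1$, and $X_t$ is uncorrelated with the error term at all leads and lags. As a sketch, let $\sigma _u^2=E(u_t^2)$ (notation introduced here for illustration). Repeated substitution in equation \ref{lag.dep.ac} shows that $Y_{t-1}$ contains the term $\sum_{j=0}^{\infty }\beta _1^{j}u_{t-1-j}$, and the AR(1) error process implies $E(u_tu_{t-1-j})=\rho ^{j+1}\sigma _u^2$. Therefore:
\begin{equation*}
Cov(Y_{t-1},u_t)=\sum_{j=0}^{\infty }\beta _1^{j}E(u_{t-1-j}u_t)=\rho \sigma _u^2\sum_{j=0}^{\infty }\left( \beta _1\rho \right) ^{j}=\frac{\rho \sigma _u^2}{1-\beta _1\rho }
\end{equation*}
Under these assumptions, the covariance is zero only when $\rho =0$, so any first-order autocorrelation in the error term makes the lagged dependent variable correlated with the current error.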
\subsection{Detection of autocorrelation}
\subsubsection{Durbin’s $h$ test}
To test for the presence of first-order autocorrelation when a lagged dependent variable is included as a regressor, Durbin (1970) suggested the following procedure:
\begin{enumerate}
\item[Step 1:] Estimate the parameters of equation \ref{lag.dep.ac} using an OLS\ estimation procedure.
\item[Step 2:] Estimate $\rho $ by using the approximation $\hat{\rho}=1-\frac{d}{2}$ (derived from equation \ref{d-w.a.ac}), where $d$ is the Durbin-Watson statistic.
\item[Step 3:] Formulate the statistic:
\begin{equation*}
h=\hat{\rho}\sqrt{\frac N{1-N\hat{\sigma}_{\hat{\beta}_1}^2}} \end{equation*}
or
\begin{equation*}
h=\left( 1-\frac d2\right) \sqrt{\frac N{1-N\hat{\sigma}_{\hat{\beta}_1}^2}} \end{equation*}
\begin{equation*}
\begin{array}{ll}
\text{where: } & \hat{\sigma}_{\hat{\beta}_1}^2\text{ is the squared value of the standard error of }\hat{\beta}_1\text{ in equation \ref{lag.dep.ac}} \\
& N\text{ = sample size}%
\end{array}%
\end{equation*}
Under the null hypothesis of no first-order autocorrelation, Durbin’s $h$ statistic approaches a standard normal distribution as the size of the sample approaches infinity.\footnote{%
While this result holds only in large samples, this test procedure is often used even when the sample size is relatively small.} Thus, the null hypothesis should be rejected if the $h$ statistic exceeds the critical value for a standard normal density function at the predetermined significance level. If a 5\% significance level is selected, the null hypothesis should be rejected if and only if the absolute value of the $h$ statistic exceeds 1.96.
\end{enumerate}
One problem with Durbin’s procedure is that it breaks down when $N\hat{\sigma}_{\hat{\beta}_1}^2>1$ (since the test statistic would involve the square root of a negative number). When this occurs, the Lagrange multiplier test discussed below may be used.
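The three steps above reduce to a short calculation once the Durbin-Watson statistic and the standard error of $\hat{\beta}_1$ are available. The following Python sketch is illustrative only; the function name \texttt{durbin\_h} is hypothetical, and the inputs are the values reported for the consumption function example in this section ($d=1.6307$, $N=219$, and a standard error of 0.01564).

```python
import math

def durbin_h(d, n, se_beta1):
    """Durbin's h statistic from the Durbin-Watson statistic d, the sample size n,
    and the standard error of the coefficient on the lagged dependent variable."""
    rho_hat = 1.0 - d / 2.0              # Step 2: approximation rho = 1 - d/2
    denom = 1.0 - n * se_beta1 ** 2      # must be positive for h to be defined
    if denom <= 0:
        raise ValueError("N * var(beta1) >= 1: Durbin's h is undefined; "
                         "use the Lagrange multiplier test instead")
    return rho_hat * math.sqrt(n / denom)    # Step 3

h = durbin_h(d=1.6307, n=219, se_beta1=0.01564)
print(f"h = {h:.4f}; reject H0 at the 5% level: {abs(h) > 1.96}")
```

The explicit check on the denominator mirrors the breakdown case described above, so the function fails loudly rather than returning a meaningless value.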
\subsubsection{Lagrange multiplier test}
The Lagrange multiplier test introduced in Chapter \ref{spec.chap} may be used to provide a test for the presence of autocorrelation when one or more lagged values of the dependent variable are included as regressors.\footnote{A good discussion of this test appears in Harvey (1991), pp. 276-277.} Let’s consider the simple model discussed above:
\begin{equation}
Y_t=\beta _o+\beta _1Y_{t-1}+\beta _2X_t+u_t \label{lag.dep.aca}
\end{equation}
\begin{equation*}
\text{where: }u_t=\rho u_{t-1}+\epsilon _t \end{equation*}
A Lagrange multiplier test consists of the following procedure: \begin{enumerate}
\item[Step 1:] Estimate the parameters of equation \ref{lag.dep.aca} using an OLS estimation procedure. Save the residuals ($\hat{u}_t$) from this regression.
\item[Step 2:] Enter the estimated residuals from the first-stage regression as a dependent variable in the auxiliary regression equation:
\begin{equation}
\hat{u}_t=\gamma _o+\gamma _1Y_{t-1}+\gamma _2X_t+\gamma _3\hat{u}_{t-1}+\epsilon _t \label{lm.res.ac}
\end{equation}
Estimate the parameters of this equation by OLS.
\item[Step 3:] Formulate the Lagrange multiplier statistic defined as: \begin{equation*}
\text{Lagrange multiplier statistic = }(N-1)R^{2} \end{equation*}%
(This version of the Lagrange multiplier test uses ($N-1$) as the number of observations since one observation is lost due to the use of lagged values in the regression equation.) Under the null hypothesis of no autocorrelation, this Lagrange multiplier statistic follows a $\chi ^{2}$ distribution with one degree of freedom.
\item[Step 4:] Reject the null hypothesis of no autocorrelation if the Lagrange multiplier statistic exceeds the critical value for a $\chi ^2$ with one degree of freedom.\footnote{%
Durbin (1970) has also proposed an alternative test that relies on a $t$-test on the coefficient for the lagged residual term in this equation.}
\end{enumerate}
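The four steps above can be sketched in Python with NumPy. This is a minimal illustration rather than a definitive implementation: the function names and the simulated data (including the parameter values $\beta _o=1$, $\beta _1=0.5$, $\beta _2=1$, and $\rho =0.8$) are assumptions made for the example. The statistic is computed as the number of observations in the auxiliary regression times its $R^{2}$.

```python
import numpy as np

def simulate_ar1_model(n, beta0, beta1, beta2, rho, seed=0):
    """Simulate y_t = beta0 + beta1*y_{t-1} + beta2*x_t + u_t with AR(1) errors."""
    rng = np.random.default_rng(seed)
    x, eps = rng.normal(size=n), rng.normal(size=n)
    u, y = np.zeros(n), np.zeros(n)
    u[0] = eps[0]
    y[0] = beta0 + beta2 * x[0] + u[0]
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
        y[t] = beta0 + beta1 * y[t - 1] + beta2 * x[t] + u[t]
    return y, x

def lm_autocorrelation_test(y, x, p=1):
    """Return (LM statistic, auxiliary R^2) for a test of AR(p) errors."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    # Step 1: OLS of y_t on a constant, y_{t-1}, and x_t; save the residuals.
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    u = y[1:] - X @ beta
    # Step 2: regress the residuals on the original regressors and p lagged residuals.
    m = len(u)
    lagged = [u[p - k:m - k] for k in range(1, p + 1)]
    A = np.column_stack([X[p:]] + lagged)
    dep = u[p:]
    gamma, *_ = np.linalg.lstsq(A, dep, rcond=None)
    e = dep - A @ gamma
    r2 = 1.0 - (e @ e) / np.sum((dep - dep.mean()) ** 2)
    # Step 3: (observations used in the auxiliary regression) * R^2.
    return len(dep) * r2, r2

y, x = simulate_ar1_model(500, 1.0, 0.5, 1.0, 0.8)
lm, r2 = lm_autocorrelation_test(y, x)
# Step 4: compare with the 5% critical value of a chi-square with one degree of freedom.
print(f"LM = {lm:.3f}, R^2 = {r2:.4f}, reject H0: {lm > 3.8415}")
```

Setting \texttt{p} above one adds further lagged residuals to the auxiliary regression, in which case the statistic should be compared with a $\chi ^{2}$ critical value with $p$ degrees of freedom.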
One advantage of this procedure is that it can be easily generalized to models that include several lags of the dependent variable and higher-order autoregressive processes.\footnote{%
If additional lags of the dependent variable are included as regressors, equations \ref{lag.dep.aca} and \ref{lm.res.ac} should be modified to include these variables as additional regressors. To test for higher-order autoregressive processes, simply include additional lags of the estimated residual as independent variables in equation \ref{lm.res.ac}. If a $p$th-order autoregressive process is specified, the Lagrange multiplier statistic, $\left( N-p\right) R^2$, follows a $\chi ^2$ distribution with $p$ degrees of freedom (where $\left( N-p\right) $ is the number of observations used to estimate the auxiliary regression equation).}
\subsection{Correction for autocorrelation}
Suppose that one of the two tests described above results in a finding of significant first-order autocorrelation in a model with a lagged dependent variable as a regressor. In this case, a fairly simple estimator may be used to provide consistent (and asymptotically efficient) parameter estimates.\footnote{%
This estimator was developed by Hatanaka (1974). Good discussions of this procedure appear in Harvey (1991), p. 271, and Greene (2000), pp. 550-2.}
Once again, the basic model is given by:
\begin{equation}
Y_t=\beta _o+\beta _1Y_{t-1}+\beta _2X_t+u_t \label{lag.dep.vars} \end{equation}
\begin{equation*}
u_t=\rho u_{t-1}+\epsilon _t
\end{equation*}
This estimator can be constructed using the following procedure:\footnote{The econometric software package LIMDEP contains an automated version of Hatanaka’s estimator. See Greene (1995), pp. 281-2, for a discussion of the implementation of this procedure.}
\begin{enumerate}
\item[Step 1:] Estimate a first-stage regression of the dependent variable ($Y_t$) on current and lagged values of each exogenous variable. For the regression model specified in equation \ref{lag.dep.vars}, this first-stage regression equation becomes:
\begin{equation}
Y_t=\gamma _o+\gamma _1X_t+\gamma _2X_{t-1}+v_t \label{lag.dep.vars1} \end{equation}
\item[Step 2:] Use equation \ref{lag.dep.vars1} to generate fitted values of the dependent variable, $Y_{t}$:
\begin{equation}
\hat{Y}_{t}=\hat{\gamma}_{o}+\hat{\gamma}_{1}X_{t}+\hat{\gamma}_{2}X_{t-1} \label{lag.dep.vars2}
\end{equation}%
and lag this variable one period to form a fitted value of $\hat{Y}_{t-1}$.
\item[Step 3:] Use an OLS regression procedure to estimate the parameters of the following equation:
\begin{equation}
Y_t=\beta _o+\beta _1\hat{Y}_{t-1}+\beta _2X_t+u_t \label{y.iv.hat} \end{equation}
This equation is a modified form of the original equation in which the lagged dependent variable has been replaced with the fitted value ($\hat{Y}_{t-1}$) from Step 2. Note that this procedure is essentially just an instrumental variables (IV) estimator in which current and lagged values of $X_t$ are used as instruments for $Y_{t-1}$. As noted earlier in Chapter \ref{spec.chap}, the instrumental variables estimator will provide consistent estimates of the parameters of the original equation.
\item[Step 4:] Use the estimated parameters from equation \ref{y.iv.hat} to generate consistent estimates of the residuals: \begin{equation*}
\hat{u}_{t}=Y_{t}-\hat{\beta}_{o}-\hat{\beta}_{1}Y_{t-1}-\hat{\beta}_{2}X_{t} \end{equation*}%
(Note that the actual value of $Y_{t-1}$ is used in this equation, not the fitted value from Step 2.)
\item[Step 5:] Use the residuals from Step 4 to generate the regression equation:
\begin{equation*}
\hat{u}_{t}=\rho ^{\ast }\hat{u}_{t-1}+\epsilon _{t} \end{equation*}%
Estimate the parameters of this equation by OLS to provide an estimated value of $\rho ^{\ast }$. Call this estimated value $\hat{\rho}^{\ast }$.
\item[Step 6:] Use the estimated value $\hat{\rho}^{\ast }$ from Step~5 to quasi-difference the dependent and independent variables in equation~\ref{lag.dep.vars}. Run a regression of the quasi-differenced dependent variable against the quasi-differenced right-hand side variables and $\hat{u}_{t-1}$.
This equation is given by:
\begin{equation*}
\tilde{Y}_{t}=\alpha _{o}+\beta _{1}\tilde{Y}_{t-1}+\beta _{2}\tilde{X}_{t}+\rho ^{\ast \ast }\hat{u}_{t-1}+w_{t}
\end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \tilde{Y}_{t}=Y_{t}-\hat{\rho}^{\ast }Y_{t-1} \\
& \tilde{Y}_{t-1}=Y_{t-1}-\hat{\rho}^{\ast }Y_{t-2} \\
& \tilde{X}_{t}=X_{t}-\hat{\rho}^{\ast }X_{t-1} \\
& w_{t}\text{ = random error term}%
\end{array}%
\end{equation*}%
Define $\hat{\rho}^{\ast \ast }$ as the estimated value of the coefficient $\rho ^{\ast \ast }$ in this equation. The estimated coefficients $\hat{\beta}_{1}$ and $\hat{\beta}_{2}$ serve as consistent estimates of the corresponding population parameters in equation~\ref{lag.dep.vars}. The resulting estimates of the standard errors may be used to test hypotheses involving the intercept or slope parameters.
\item[Step 7:] Hatanaka (1974) has shown that a consistent estimate of $\rho $ is provided by:
\begin{equation*}
\hat{\rho}=\hat{\rho}^{\ast }+\hat{\rho}^{\ast \ast } \end{equation*}
\end{enumerate}
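The seven steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the LIMDEP implementation mentioned in the footnote: the function names, the simulated data, and the parameter values ($\beta _o=1$, $\beta _1=0.5$, $\beta _2=1$, $\rho =0.6$) are all hypothetical, and the exogenous variable is drawn as independent noise for simplicity.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients for y = X b + e (X must already contain a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate_ar1_model(n, beta0, beta1, beta2, rho, seed=0):
    """Simulate y_t = beta0 + beta1*y_{t-1} + beta2*x_t + u_t with AR(1) errors."""
    rng = np.random.default_rng(seed)
    x, eps = rng.normal(size=n), rng.normal(size=n)
    u, y = np.zeros(n), np.zeros(n)
    u[0] = eps[0]
    y[0] = beta0 + beta2 * x[0] + u[0]
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
        y[t] = beta0 + beta1 * y[t - 1] + beta2 * x[t] + u[t]
    return y, x

def hatanaka(y, x):
    """Steps 1-7 for y_t = b0 + b1*y_{t-1} + b2*x_t + u_t with AR(1) errors."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    # Steps 1-2: regress y_t on x_t and x_{t-1}; keep the fitted values.
    Z = np.column_stack([np.ones(n - 1), x[1:], x[:-1]])
    y_fit = Z @ ols(y[1:], Z)                        # fitted y_t for t = 1..n-1
    # Step 3: replace y_{t-1} with its fitted value (t = 2..n-1).
    W = np.column_stack([np.ones(n - 2), y_fit[:-1], x[2:]])
    b = ols(y[2:], W)
    # Step 4: residuals use the ACTUAL lagged dependent variable.
    u = y[1:] - b[0] - b[1] * y[:-1] - b[2] * x[1:]  # u_t for t = 1..n-1
    # Step 5: regress u_t on u_{t-1} (no constant, as in the text).
    rho_star = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    # Step 6: quasi-difference and add the lagged residual as a regressor.
    V = np.column_stack([
        np.ones(n - 2),
        y[1:-1] - rho_star * y[:-2],                 # quasi-differenced y_{t-1}
        x[2:] - rho_star * x[1:-1],                  # quasi-differenced x_t
        u[:-1],                                      # lagged residual u_{t-1}
    ])
    a = ols(y[2:] - rho_star * y[1:-1], V)
    # Step 7: the estimate of rho is rho* plus the coefficient on u_{t-1}.
    return {"beta1": a[1], "beta2": a[2], "rho": rho_star + a[3]}

y, x = simulate_ar1_model(5000, 1.0, 0.5, 1.0, 0.6)
print(hatanaka(y, x))
```

The sketch returns only the slope coefficients and $\hat{\rho}$; standard errors from the Step 6 regression would need to be computed separately.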
\subsection{Example: Consumption function revisited}
Suppose that an econometrician specifies the following form for a consumption function:
\begin{equation}
\text{C}_t=\beta _o+\beta _1\text{C}_{t-1}+\beta _2\text{YD}_t+u_t \label{cons.lag.ac}
\end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{C}_t\text{ = real consumption expenditures in year }t \\
& \text{YD}_t\text{ = real personal disposable income in year }t
\end{array}%
\end{equation*}
This form of the consumption function differs from earlier specifications by the inclusion of lagged consumption expenditures as an additional right-hand-side variable. Under this alternative specification, it is assumed that the current level of consumption expenditures is partially affected by the previous year’s consumption spending. This might, for example, be the result of consumers being affected by habitual patterns of behavior. If this is the case, consumer spending decisions would be affected by the past levels of consumption spending as well as by the level of disposable personal income.
The parameters of equation \ref{cons.lag.ac} were estimated using quarterly data for the period 1947:2 to 2001:4.\footnote{The data used to generate these estimates appears in the file \textquotedblleft cons3.dat.\textquotedblright\ This data is described in Table \ref{cons3.dat} in Appendix \ref{data.appendix}. While data on consumption and personal disposable income is available from the first quarter of 1947, $C_{t-1}$ is only observed beginning in the second quarter of this year.} The estimated equation is:
\begin{equation}
\text{\^{C}}_{t}=\underset{(3.397)}{0.0507}+\underset{(0.01564)}{1.011}\text{C}_{t-1}+\underset{(0.01436)}{0.0019}\text{YD}_{t} \label{C-est-OLS}
\end{equation}%
\begin{equation*}
\text{(standard errors in parentheses)}
\end{equation*}%
The Durbin-Watson statistic for this regression equals 1.6307. Using these estimates, the Durbin $h$ statistic can be formed as: \begin{equation*}
h=\left( 1-\frac{d}{2}\right) \sqrt{\frac{N}{1-N\hat{\sigma}_{\hat{\beta}_{1}}^{2}}}
\end{equation*}%
\begin{equation*}
=\left( 1-\frac{1.6307}{2}\right) \sqrt{\frac{219}{1-219\left( 0.01564\right) ^{2}}}
\end{equation*}%
\begin{equation*}
=2.8088
\end{equation*}%
An examination of Table \ref{n-table} in Appendix \ref{stat.tab.app} indicates that the critical value of a standard normal variate at a 5\% significance level equals 1.96. Since the estimated $h$ statistic exceeds this critical value, Durbin’s $h$ statistic indicates the presence of first-order autocorrelation.
It will be useful to examine how a Lagrange multiplier test may also be used to test for the presence of autocorrelation. To implement this test, the estimated residuals from equation \ref{C-est-OLS} are used to estimate the equation:%
\begin{equation*}
\hat{u}_{t}=-\underset{(-0.177)}{0.600}-\underset{(-0.330)}{0.0051}\text{C}_{t-1}+\underset{(0.333)}{0.0048}\text{YD}_{t}+\underset{(2.662)}{0.182}\widehat{u}_{t-1}
\end{equation*}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)} \end{equation*}%
\begin{equation*}
R^{2}=0.03205
\end{equation*}%
In this case, the Lagrange multiplier statistic equals:
\begin{equation*}
(N-1)R^{2}=218(0.03205)=6.987
\end{equation*}%
At a significance level of 5\%, the critical value of a $\chi ^{2}$ statistic with one degree of freedom is 3.8415. Since the Lagrange multiplier statistic exceeds this critical value, the hypothesis of no autocorrelation may be rejected.
When Hatanaka’s procedure is implemented to estimate the parameters of equation \ref{cons.lag.ac}, the following equation results:\footnote{These estimates were found using the Hatanaka estimation procedure in LIMDEP.}
\begin{equation}
\text{\^{C}}_{t}=\underset{(18.742)}{-35.062}+\underset{(0.214)}{0.576}\text{C}_{t-1}+\underset{(0.197)}{0.397}\text{YD}_{t} \label{C-est-hat}
\end{equation}%
\begin{equation*}
\text{(standard errors in parentheses)}
\end{equation*}%
A comparison of the estimated coefficients and standard errors indicates that the use of Hatanaka’s correction for first-order autocorrelation resulted in a substantial impact on the estimates. As is often the case, the standard errors of the parameter estimates are substantially larger (and the $t$-ratios are smaller) when the correction for autocorrelation is made.
There was also a substantial change in the magnitude of the coefficients on C$_{t-1}$ and YD$_{t}$.\footnote{%
The change in the parameter estimates may also be due to the substantial multicollinearity that exists among the variables and the instruments.}
\section{Autocorrelation in panel data models}
A full discussion of autocorrelation in panel data models is beyond the scope of this text. A simple model, however, may be used to illustrate how the methods discussed earlier in this chapter may be applied in a panel data model. Consider a relatively simple model of an earnings equation for which $T$ years of data are available for each of $N$ individuals.\footnote{A model in which observations are available in each year for a given cross-section of individuals is called a \textquotedblleft balanced panel\textquotedblright\ model (an unbalanced panel model does not have the same number of observations for each individual).} A simple version of this model may be specified as:%
\begin{equation*}
\text{Earnings}_{it}=\beta _{o}+\beta _{1}\text{Years of Ed}_{it}+\beta _{2}\text{Experience}_{it}+u_{it}
\end{equation*}%
for $i=1,\ldots ,N$ and $t=1,\ldots ,T$.%
\begin{equation*}
\begin{array}{ll}
\text{where: } & \text{Earnings}_{it}=\text{ earnings of person }i\text{ in year }t \\
& \text{Years of Ed}_{it}\text{ = years of education for person }i\text{ in year }t \\
& \text{Experience}_{it}\text{ = years of work experience for person }i\text{ in year }t \\
& u_{it}\text{ = error term for person }i\text{ in year }t
\end{array}%
\end{equation*}%
Note that, under this specification, there are a total of $N\times T$ observations.
To test for the presence of first-order autocorrelation in this model, a variation of the Durbin-Watson statistic may be computed as:\footnote{Note that the simpler Durbin-Watson statistic discussed above is a special case of this equation that corresponds to the case in which $N=1$.}
\begin{equation*}
\text{Durbin-Watson statistic = }\frac{\dsum\limits_{i=1}^{N}\dsum\limits_{t=2}^{T}\left( \hat{u}_{it}-\hat{u}_{i,t-1}\right) ^{2}}{\dsum\limits_{i=1}^{N}\dsum\limits_{t=1}^{T}\hat{u}_{it}^{2}}
\end{equation*}%
Many modern econometric software packages can automatically generate estimates of this Durbin-Watson statistic when panel data models are estimated.
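For readers who wish to compute this statistic by hand, the formula above translates directly into a few lines of Python. The sketch below is illustrative only; the function name \texttt{panel\_durbin\_watson} and the small residual arrays are made up for the example. Residuals are supplied as one time-ordered sequence per individual, so that differences are never taken across individuals.

```python
def panel_durbin_watson(residuals):
    """Panel Durbin-Watson statistic.

    residuals: a list with one entry per individual, each entry being that
    individual's residuals ordered by time period.
    """
    # Numerator: squared first differences within each individual (t = 2..T).
    num = sum((u[t] - u[t - 1]) ** 2
              for u in residuals
              for t in range(1, len(u)))
    # Denominator: sum of all squared residuals (t = 1..T).
    den = sum(e * e for u in residuals for e in u)
    return num / den

# Hypothetical residuals for two individuals observed over four periods.
example = [[1.0, -1.0, 1.0, -1.0],
           [2.0, 2.0, 2.0, 2.0]]
print(panel_durbin_watson(example))

# With a single individual, the statistic reduces to the usual Durbin-Watson form.
print(panel_durbin_watson([[1.0, -1.0, 1.0, -1.0]]))
```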
As in the time-series models discussed earlier in this chapter, estimators that have not taken the presence of autocorrelation into account will provide biased and inconsistent estimates of the standard errors of the estimators. Therefore, $t$-ratios generated by such models are not appropriate for hypothesis testing.
A relatively simple correction for autocorrelated errors relies on a variation of the Cochrane-Orcutt method discussed earlier in this chapter.
Several modern econometrics packages provide panel-data estimators that correct for the presence of autocorrelation in the error process. A full discussion of the available estimators is beyond the scope of this text.\footnote{%
Interested readers may find a good description of such estimators in Greene (2000), pp. 581-2, or Baltagi (1995).}
\section{Summary}
In this chapter, the effects of autocorrelation have been examined. In particular, the presence of autocorrelation causes OLS estimators to be inefficient. Furthermore, OLS standard errors are biased when this condition occurs. Several methods of detecting alternative forms of autocorrelation have been examined. Appropriate corrective methods have been presented for each type of autocorrelation.
Before leaving this chapter, you should be sure that you: \begin{itemize}
\item understand the effects of autocorrelation;
\item know how to test for this violation of the classical regression model;
\item understand the merits of alternative corrective procedures; and
\item are able to implement appropriate corrective techniques.
\end{itemize}
\section{Key Concepts}
autocorrelation
1st-order autocorrelation
Durbin-Watson test
Durbin-Watson statistic
quasi-differencing
unit root
random walk
differencing
generalized least-squares (GLS) estimation
Cochrane-Orcutt procedure
Prais-Winsten estimator
Hildreth-Lu procedure
$p$th-order autoregressive process (AR($p$))
Breusch-Godfrey test
Box-Pierce statistic
Ljung-Box statistic
Durbin’s $h$ test
Hatanaka’s estimator
\newpage\
\section{Exercises and problems}
\begin{enumerate}
\item Suppose that an econometrician wishes to estimate an investment demand function using annual time-series data. Explain why the error terms may exhibit first-order autocorrelation.
\item Why is autocorrelation not generally a problem in cross-sectional models?
\item An econometrician estimates the parameters of an earnings equation using longitudinal data. An OLS regression is performed and a Durbin-Watson statistic is estimated that does not take the longitudinal nature of the data into account. This Durbin-Watson statistic suggests that first-order autocorrelation is present. Is this necessarily the result of autocorrelation? Explain.

\item Equation \ref{sse_bias_ar1} indicates that the sum of squared sample residuals will be greater than or equal to the sum of squared population residuals in a bivariate regression model when there is positive first-order autocorrelation and the independent variable rises over time. Under these conditions, what must be true for the sum of squared sample residuals to equal the sum of squared population residuals?
\item Consider the regression model:
\begin{equation*}
Y_{t}=\beta _{o}+\beta _{1}\text{time}_{t}+\beta _{2}X_{t}+u_{t} \end{equation*}%
Suppose that the error term in this equation follows a random walk process defined as:
\begin{equation*}
u_{t}=u_{t-1}+\epsilon _{t}
\end{equation*}%
\begin{equation*}
\text{where: }\epsilon _{t}\text{ is a white noise error process} \end{equation*}
\begin{enumerate}
\item Explain how the parameters of this equation may be estimated by differencing the original model.
\item Is there a constant term in the differenced model? If so, what is the interpretation of this term? If not, explain why not.
\end{enumerate}
\item The data in the file \textquotedblleft invest.dat\textquotedblright\ (described in Table \ref{invest.dat} in Appendix \ref{data.appendix}) contains information that can be used to estimate the parameters of an investment demand function for the U.S. economy of the form: \begin{equation}
\text{Investment}_{t}=\beta _{o}+\beta _{1}\text{interest}_{t}+\beta _{2}\text{GDP}_{t}+u_{t} \label{inv.dem.ac}
\end{equation}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{Investment}_{t}\text{ = real investment spending in year }t \\
& \text{interest}_{t}\text{ = real interest rate in year }t \\
& \text{GDP}_{t}\text{ = real GDP in year }t
\end{array}%
\end{equation*}
\begin{enumerate}
\item Estimate the parameters of this equation using OLS.
\item Use a Durbin-Watson test to investigate the possibility of first-order autocorrelation.
\item Correct for the presence of autocorrelation using a GLS estimation procedure.
\end{enumerate}
\item Durbin and Watson (1951) used annual data for the years 1870 to 1938 to estimate the parameters of the following demand relationship: \begin{equation}
\text{log}(\text{QD}_{t})\text{ = }\beta _{o}+\beta _{1}\text{log}(\text{Income}_{t})+\beta _{2}\text{log}(\text{Price}_{t}) \label{dw.spirits}
\end{equation}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{QD}_{t}\text{ = per capita consumption of spirits in year }t \\
& \text{Income}_{t}\text{ = per capita income in year }t \\
& \text{Price}_{t}\text{ = relative price of spirits in year }t\text{ ( = price of spirits / price index)}%
\end{array}%
\end{equation*}%
Each of these variables is measured as an index in which 1900 is the base year (\textit{i.e., }each of the variables is transformed by dividing each observation by the value of the observation in 1900 and multiplying by 100).
\begin{enumerate}
\item Use the data in the file \textquotedblleft spirits.dat\textquotedblright\ (described in Table \ref{spirits.dat} in Appendix \ref{data.appendix}) to estimate the parameters of equation \ref{dw.spirits}. (Note that the original variables are already expressed in log form so no further log transformation is needed.)
\item Perform a Durbin-Watson test (at a 5\% significance level) to determine the possibility of first-order autocorrelation.
\item If first-order autocorrelation is found, apply an appropriate corrective technique. Use the results from this estimation to compute the estimated income and price elasticities of demand.
\item A single-equation estimation technique is appropriate only if all of the right-hand side variables are either exogenous or predetermined. Is it likely that this condition is satisfied in this case? (Methods of estimating models in which this condition is not satisfied are discussed in Chapter \ref{simul.chap}.)
\end{enumerate}
\item Gujarati (1968) estimates the following relationship: \begin{equation}
\ln (\text{HWI}_t)=\beta _o+\beta _1\ln (\text{UN}_t) \label{lnhwi.ar} \end{equation}
using 24 quarterly observations on the help-wanted index (HWI$_t$) and the unemployment rate (UN$_t$).
\begin{enumerate}
\item Use the data in the file \textquotedblleft hwi.dat\textquotedblright\ (described in Table \ref{hwi.dat} in Appendix \ref{data.appendix}) to estimate the parameters of equation \ref{lnhwi.ar}.
\item Use a Durbin-Watson test to investigate the possibility of first-order autocorrelation. If first-order autocorrelation is found, correct for its presence using an appropriate corrective technique.
\end{enumerate}
\item Brada and Graves (1988) investigate the determinants of Soviet defense expenditures. One of the equations estimated in this study is given by: \begin{equation}
\ln (\text{SDL}_t)=\beta _o+\beta _1\ln (\text{USD}_t)+\beta _2\ln (\text{SGNP}_t)+\beta _3\ln (\text{SP}_t)+u_t \label{soviet.def}
\end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{SDL}_t\text{ = CIA low estimate of Soviet defense expenditures in year }t \\
& \text{USD}_t\text{ = real U.S. defense expenditures in year }t \\
& \text{SGNP}_t\text{ = real Soviet GNP in year }t \\
& \text{SP}_t\text{ = strategic parity measure = (Soviet warheads / U.S. warheads)}%
\end{array}%
\end{equation*}
\begin{enumerate}
\item Can you predict a sign for each slope coefficient in equation \ref{soviet.def}? If so, state and explain your predictions. As part of your answer, explain why each variable might be included in this equation.
\item Use the data in the file \textquotedblleft sdef.dat\textquotedblright\ (described in Table \ref{sdef.dat} in Appendix \ref{data.appendix}) to estimate the parameters of equation \ref{soviet.def}. (Note: you must create the variable SP$_{t}$ from the SW$_{t}$ and USW$_{t}$ variables.) At a 5\% significance level, which parameters are statistically significant?
\item Conduct a Durbin-Watson test to investigate the possibility of first-order autocorrelation. As part of your response, state the rejection region (and uncertainty region) for the Durbin-Watson statistic at a 5\% significance level. What does this test suggest?
\item If first-order autocorrelation is found (or the Durbin-Watson statistic lies in the uncertainty region), use an appropriate correction technique and generate corrected estimates of equation \ref{soviet.def}.
Compare the $t$-values with those appearing in (b). Does the interpretation of the results change?
\end{enumerate}
\item Answer all parts of the previous question using ln(SDH$_t$) instead of ln(SDL$_t$) as the dependent variable (where SDH$_t$ = the CIA high estimate of Soviet defense expenditures).
\item Consider the investment demand curve appearing in equation \ref{inv.dem.ac}.
\begin{enumerate}
\item Use an econometric software package to verify the results of the Breusch-Godfrey tests reported in Table \ref{Breusch_Godfrey.eq1.ac}. (The data may be found in the file \textquotedblleft invest.dat.\textquotedblright )
\item Use the Ljung-Box statistic to test for the presence of an AR(2)\ error process. Do your results match those appearing in the text?
\item Use the Cochrane-Orcutt estimation procedure to estimate the parameters of equation \ref{inv.dem.ac} under the assumption that the errors follow an AR(2) error process. Are your results consistent with those in the text? (The data for this model is in the file \textquotedblleft invest.dat\textquotedblright\ and is described in Table \ref{invest.dat} on p. \pageref{invest.dat}.)
\end{enumerate}
\item An econometrician estimates the parameters of the model: \begin{equation*}
Y_{t}=\beta _{o}+\beta _{1}Y_{t-1}+\beta _{2}X_{t}+\beta _{3}Z_{t}+u_{t} \end{equation*}%
and finds that the Durbin-Watson statistic equals 1.95. Is it safe to assume that no first-order autocorrelation is present? Explain.
\item Benjamin Klein (1974) used annual time-series data for the period 1880 to 1970 to investigate the long-run demand for money. One of the equations estimated in this study is:%
\begin{equation*}
\ln (\text{M2}_{t})=-\underset{(-82.50)}{14.09}+\underset{(52.44)}{1.372}\ln (\text{YP}_{t})-\underset{(-13.28)}{0.285}\text{RS}_{t}-\underset{(-4.06)}{0.058}\text{RL}_{t}+\underset{(12.09)}{0.303}\text{RM}_{t}
\end{equation*}%
\begin{equation*}
(t\text{-statistics in parentheses})
\end{equation*}%
\begin{equation*}
R^{2}=0.990
\end{equation*}%
\begin{equation*}
\text{Durbin-Watson statistic = 0.94}
\end{equation*}%
\begin{equation*}
\begin{array}{cl}
\text{where:} & \text{M2}_{t}=\text{ M2 monetary aggregate in year }t \\
& \text{YP}_{t}=\text{ permanent income in year }t \\
& \text{RS}_{t}=\text{ short-term commercial paper rate in year }t \\
& \text{RL}_{t}=\text{ yield on long-term corporate bonds in year }t \\
& \text{RM}_{t}=\text{ weighted average of the return on currency and the return on deposits in year }t%
\end{array}%
\end{equation*}%
Is there evidence of first-order autocorrelation? If so, how does this affect the interpretation of these results?
\item In a study of cotton production during the years 1830 to 1860, Gavin Wright estimates the following equation:
\begin{equation*}
\text{S}_{t}=1.72+2.10\text{PNO}_{t}+0.017\text{t} \end{equation*}%
\begin{equation*}
(\text{standard errors in parentheses}) \end{equation*}%
\begin{equation*}
R^{2}=0.230
\end{equation*}%
\begin{equation*}
\text{Durbin-Watson statistic = 0.7}
\end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{S}_{t}\text{ = annual land sales in the American South in year }t \\
& \text{PNO}_{t}\text{ = New Orleans price of cotton in year }t \\
& \text{t}=\text{ time}%
\end{array}%
\end{equation*}%
Is there evidence of first-order autocorrelation in this equation? If so, how does this affect the interpretation of these results?
\item Explain why the Prais-Winsten estimator is more efficient than the Cochrane-Orcutt estimator.
\item
\begin{enumerate}
\item Estimate the parameters of equation \ref{cons.lag.ac} using the annual data on $C_{t}$ and $YD_{t}$ contained in the file \textquotedblleft cons.dat\textquotedblright\ (this data is described in Table \ref{cons.dat.app} in Appendix \ref{data.appendix}).
\item Use Durbin’s $h$ test to test for the presence of first-order autocorrelation. Do your results agree with those found when quarterly data was used? If the outcome is different, what might account for the difference?
\item Use a Lagrange multiplier test to investigate the possibility of first-order autocorrelation in this case. Do the results of this test agree with those found using Durbin’s $h$ statistic?
\end{enumerate}
\item In a classic study, Hamilton (1972) examined the effects of cigarette advertising and health warnings on the demand for cigarettes. The Public Health Cigarette Smoking Act of 1970 prohibited TV and radio advertising for cigarettes. A side effect of this Act was the elimination of the equal (and free) broadcast time that was provided for antismoking advertisements under the Federal Communications Commission’s Fairness Doctrine. Hamilton’s analysis provided strong evidence that the effect of the antismoking ads on cigarette demand was larger than the effect of cigarette advertising. This suggested that the ban on advertising could be expected to increase the demand for cigarettes while lowering the costs of cigarette manufacturers. As part of this study, Hamilton used annual data from 1953 to 1970 to estimate the following equation:%
\begin{equation*}
\text{Cigarettes}_{t}=-101.9+\underset{(10.42)}{1.004}\text{Cigarettes}_{t-1}+\underset{(0.47)}{0.736}\text{Advertising}_{t}
\end{equation*}%
\begin{equation*}
-\underset{(2.73)}{255.9}\text{D}_{t}
\end{equation*}%
\begin{equation*}
\begin{array}{cl}
\text{where:} & \text{Cigarettes}_{t}=\text{ \textit{per capita} cigarette consumption in year }t \\
& \text{Advertising}_{t}\text{ = index of \textit{per capita} cigarette advertising expenditures in year }t \\
& \text{D}_{t}\text{ = health scare dummy variable for years 1964-1970}%
\end{array}%
\end{equation*}%
\begin{equation*}
(t\text{-statistics in parentheses})
\end{equation*}%
\begin{equation*}
R^{2}=0.97
\end{equation*}%
\begin{equation*}
\text{Durbin-Watson statistic}=1.91
\end{equation*}
\begin{enumerate}
\item Should the Durbin-Watson statistic for this equation be used to test for the presence of first-order autocorrelation in the error terms for this equation?
\item If so, does this statistic indicate the presence of first-order autocorrelation? If not, what test or tests would be more appropriate?
\end{enumerate}
\item (More difficult question) Use the data in the file \textquotedblleft cons2.dat\textquotedblright\ (a description of the data in this file appears on p.\pageref{cons2.dat.app} in Table \ref{cons2.dat.app}) to verify the results reported in equation \ref{C-est-hat}.
\end{enumerate}

In a higher-order autoregressive process the current error term is assumed to be a function of a larger number of past error terms. ↵
More precisely, negative first-order autocorrelation occurs when outcomes that exceed the conditional expectation of the dependent variable tend to be followed by outcomes that fall below the conditional expectation of the dependent variable. ↵
This result only holds if no lagged values of the dependent variable appear as regressors. If lagged values of the dependent variable are present, the OLS estimators are biased and inconsistent when autocorrelation is present (since a nonzero correlation exists between the lagged dependent variable and the error term in this case). ↵
A formal proof of these propositions requires mathematical tools beyond the scope of this text. The interested reader may find a proof in virtually any advanced text. See, for example, Greene (2000), p. 537. ↵
This test was initially discussed in Durbin and Watson (1950, 1951). ↵
A time series variable is said to be ``differenced'' when it is transformed so that it measures the change in the level of the series. Quasi-differencing is a generalization of this differencing transformation. ↵
See Cochrane and Orcutt (1949) for the original statement of this procedure. ↵
An initial estimate of [latex]\hat{\rho}[/latex] can also be generated from equation 12.3. Since: \begin{equation*} d\approx 2(1-\hat{\rho}) \end{equation*} an approximate value of [latex]\hat{\rho}[/latex] can be computed as: \begin{equation*} \hat{\rho}\approx 1-\frac d2 \end{equation*} ↵
This procedure is discussed in Prais and Winsten (1954). A good treatment (requiring matrix algebra) appears in Greene (2000), pp. 546-9. ↵
For a discussion of these results see Maeshiro (1979), Poirier (1978), or Park and Mitchell (1980). The inclusion of the first observation is particularly important when there is a trend in the series and the length of the time series is relatively short. ↵
This procedure was initially presented in Hildreth and Lu (1960). ↵
This section can be skipped without loss of continuity. ↵
A full discussion of maximum likelihood estimators may be found in more advanced texts. See, for example, Greene (2000), pp. 547-9. ↵
C[latex]_{t}[/latex], YD[latex]_{t}[/latex], and W[latex]_{t}[/latex] are expressed in billions of chained 1996 dollars. Int[latex]_{t}[/latex] is expressed as a percentage. ↵
The data used for this analysis is in the file ``cons2.dat.'' All variables (other than the interest rate) are measured in billions of 1987 dollars. ↵
Since the value of the Durbin-Watson statistic is only presented for [latex]N = 59[/latex] and [latex]N = 55[/latex], linear interpolation is used to compute these values. Some econometricians, however, feel uncomfortable with the use of a linear approximation to a nonlinear relationship and argue that it would be better to use the values from the Durbin-Watson table corresponding to [latex]N = 55[/latex]. When the observed value of [latex]N[/latex] falls between two values in the table, selecting the higher value of [latex]N[/latex] makes it more difficult to reject the possibility of finding no autocorrelation. ↵
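
The notes above on the approximation [latex]\hat{\rho}\approx 1-\frac d2[/latex] and on quasi-differencing can be illustrated with a short numerical sketch. The Python code below is only an illustration under stated assumptions: the function names (`durbin_watson`, `rho_from_dw`, `quasi_difference`) are hypothetical helpers written for this example, the error series is simulated rather than taken from any data file used in this chapter, and the treatment of the first observation (scaling it by [latex]\sqrt{1-\rho ^2}[/latex]) is the standard Prais-Winsten convention rather than something derived in these notes.

```python
import numpy as np


def durbin_watson(resid):
    """Durbin-Watson d: sum of squared first differences of the
    residuals divided by the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def rho_from_dw(d):
    """Approximate AR(1) coefficient implied by d ~ 2(1 - rho)."""
    return 1.0 - d / 2.0


def quasi_difference(x, rho, keep_first=False):
    """Quasi-difference a series: x_t - rho * x_{t-1} for t = 2..N.

    With keep_first=True the first observation is retained, scaled by
    sqrt(1 - rho^2) (assumed Prais-Winsten convention); otherwise it is
    dropped, as in the Cochrane-Orcutt procedure.
    """
    x = np.asarray(x, dtype=float)
    transformed = x[1:] - rho * x[:-1]
    if keep_first:
        first = np.sqrt(1.0 - rho ** 2) * x[0]
        transformed = np.concatenate(([first], transformed))
    return transformed


# Simulate an AR(1) error process u_t = rho * u_{t-1} + eps_t.
# The values of true_rho and n are arbitrary choices for illustration.
rng = np.random.default_rng(0)
true_rho, n = 0.6, 200
eps = rng.normal(size=n)
u = np.empty(n)
u[0] = eps[0]
for t in range(1, n):
    u[t] = true_rho * u[t - 1] + eps[t]

d = durbin_watson(u)
rho_hat = rho_from_dw(d)
u_star = quasi_difference(u, rho_hat, keep_first=True)

print(f"Durbin-Watson d          = {d:.3f}")
print(f"implied rho (1 - d/2)    = {rho_hat:.3f}")
print(f"d after quasi-difference = {durbin_watson(u_star):.3f}")
```

In practice the residuals would come from an OLS regression rather than from a simulated series, and the quasi-differencing would be applied to the dependent variable and to each regressor before re-estimating the model.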

License

Creative Commons Attribution 4.0 International License