\chapter{Systems of Equations\label{simul.chap}}
In previous chapters, econometric models consisted of a single equation in which each of the right-hand side variables is assumed to be exogenous. As discussed in Chapter \ref{intro.chap}, a variable is exogenous if it is determined outside of the model. Many economic models, however, contain more than one endogenous variable. It is very common for feedback relationships to exist among the endogenous variables in a model. Macroeconomic models often consist of numerous equations that explain the relationships among the endogenous and exogenous variables. Even the simplest model of demand and supply requires at least two equations (a demand equation and a supply equation) and contains two endogenous variables (price and quantity). In simultaneous equation models, some of the right-hand side variables in one or more equations are endogenous.
In this chapter, it will be shown that OLS estimation techniques will result in biased and inconsistent estimates when there are one or more endogenous variables on the right-hand side of an equation. The cause of this bias is examined and alternative correction techniques are presented.
\section{Simultaneous-equation models}
In the single equation models that were discussed in previous chapters, it was assumed that there was unidirectional causality from the independent variables appearing on the right-hand side of the equation to the dependent variable appearing on the left-hand side of the equation. The independent variables in the equation were all assumed to be exogenous (determined outside of the model). In a simultaneous-equations model, however, one or more of the right-hand side variables not only affects the level of the dependent variable, but is also affected by the level of the dependent variable.
An examination of any introductory economic textbook suggests that even the simplest economic models consist of at least two endogenous variables. It is very common for there to be bidirectional causality among at least some of these variables. Let’s examine two examples that are discussed in virtually all introductory economics texts.
\subsection{Keynesian consumption function}
Consider the simple Keynesian consumption function:
\begin{equation} \label{cons.func.sc}
C_t=\beta _o+\beta _1YD_t+u_t
\end{equation}
In previous chapters, it was assumed that disposable income ($YD_t$) was exogenous. This assumption, however, is somewhat tenuous. Since a change in consumption spending affects equilibrium GDP, it will also affect disposable personal income. Thus, disposable income not only affects current consumption spending, it is also affected by consumption spending.
To see this relationship, it is helpful to consider a simple version of the Keynesian model:
\begin{equation}
C_{t}=\beta _{o}+\beta _{1}YD_{t}+u_{t} \label{keynes.lc} \end{equation}%
\begin{equation}
Y_{t}=C_{t}+I_{t}+G_{t}+NX_{t} \label{keynes.lca}
\end{equation}%
\begin{equation}
YD_{t}=Y_{t}-Tn_{t} \label{keynes.lcb}
\end{equation}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & Y_{t}\text{ = national income and output in period }t \\ & YD_{t}\text{ = disposable personal income} \\
& C_{t}\text{ = consumption expenditure } \\
& I_{t}\text{ = investment spending (assumed to be exogenous)} \\ & G_{t}\text{ = government spending (assumed to be exogenous)} \\ & NX_{t}\text{ = net exports (assumed to be exogenous)} \\ & Tn_{t}\text{ = net taxes = taxes – transfer payments (assumed to be exogenous) } \\
& u_{t}\text{ = random error term}%
\end{array}%
\end{equation*}%
Equation \ref{keynes.lca} is an equilibrium condition that states that the equilibrium level of national income must equal the planned spending on final goods and services by consumers ($C_{t}$), firms ($I_{t}$), the government ($G_{t}$) and the net spending in the foreign sector ($NX_{t}$).
This equilibrium condition indicates that changes in consumption spending will affect the equilibrium level of national income ($Y_{t}$). An inspection of equation \ref{keynes.lcb} indicates that this change in national income will affect the level of disposable income ($YD_{t}$).
This discussion suggests that it is inappropriate to consider the Keynesian consumption function in isolation. Disposable income is a primary determinant of the level of consumption spending. Consumption spending, however, is also an important determinant of the level of disposable income.
As will be demonstrated below, the use of an OLS estimation procedure is inappropriate when an endogenous variable (such as $YD_t$) appears as a regressor in an equation.
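To make this feedback concrete, the three-equation model above can be solved for disposable income. The following is only a sketch: it treats $I_{t}$, $G_{t}$, $NX_{t}$, and $Tn_{t}$ as exogenous (as assumed above) and additionally assumes that they are uncorrelated with $u_{t}$. The symbol $\sigma _{u}^{2}$ is introduced here to denote the variance of $u_{t}$. Substituting equations \ref{keynes.lc} and \ref{keynes.lca} into equation \ref{keynes.lcb} gives:
\begin{equation*}
YD_{t}=\beta _{o}+\beta _{1}YD_{t}+u_{t}+I_{t}+G_{t}+NX_{t}-Tn_{t}
\end{equation*}
Solving for $YD_{t}$ (assuming $\beta _{1}\neq 1$):
\begin{equation*}
YD_{t}=\frac{\beta _{o}+I_{t}+G_{t}+NX_{t}-Tn_{t}}{1-\beta _{1}}+\frac{u_{t}}{1-\beta _{1}}
\end{equation*}
Under these assumptions:
\begin{equation*}
\text{Cov}(YD_{t},u_{t})=\frac{\sigma _{u}^{2}}{1-\beta _{1}}
\end{equation*}
If the marginal propensity to consume ($\beta _{1}$) lies between zero and one, this covariance is positive, so the regressor in equation \ref{cons.func.sc} is correlated with the error term in the same equation.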
\subsection{Demand and supply\label{ds.sec.sc}}
Suppose that an econometrician wishes to estimate the parameters of the following demand curve:
\begin{equation}
\text{Demand relationship: }Q_{t}=\beta _{o}+\beta _{1}P_{t}+\beta _{2}X_{t}+u_{t} \label{demand.sc}
\end{equation}%
The variable $X_{t}$ is an exogenous variable that affects the position of the demand curve. (In a more complete model, several exogenous variables would be included in this equation.) In most applications, however, the price variable ($P_{t}$) in this model is endogenous (a few possible exceptions are considered below). The complete model also requires a supply equation such as:\footnote{%
In most economic applications, quantity supplied is generally the dependent variable in the supply equation. It is more convenient, in this case, to use the inverse supply function in which price is the dependent variable. This maintains consistency with the general form in which simultaneous models are often stated (as in the discussion that follows).}
\begin{equation}
\text{Supply relationship: }P_{t}=\gamma _{o}+\gamma _{1}Q_{t}+\gamma _{2}Z_{t}+v_{t} \label{supply.sc}
\end{equation}%
The variable $Z_{t}$ is an exogenous variable that affects the position of the supply curve. (A more complete specification would include several other variables including measures of resource costs, the prices of goods related in production, the number of suppliers, and similar variables).
The observed price and quantity combination is the result of the interaction of both demand and supply. When price and quantity data are collected, it is not generally possible to estimate either the demand or supply curve using OLS\ estimation techniques. (The reasons for this are discussed in Section~\ref{se.bias.sc}.)
\subsubsection{Demand and supply models with exogenous prices} There are a few situations, however, in which the price variable in a demand equation can be appropriately treated as being exogenous: \begin{itemize}
\item Firms engaging in market research often vary the price of a particular product across locations and/or across time and collect data on the quantity sold at each of these prices. In these studies, they generally ensure that enough of the good is available at each location to satisfy consumer demand.
Since these firms are not varying the price in response to the level of sales during the experiment, price may be properly viewed as an exogenous variable.
\item Price can generally be treated as exogenous when data are collected on the quantity of a good demanded by individual consumers (or households) who face different prices in different markets. The price facing each consumer is generally unaffected by the quantity purchased by that consumer (unless there are only a relatively small number of buyers). In the case of a perfectly competitive market, an individual consumer’s purchasing decisions have no effect on the price of the commodity.
\item In regulated markets (such as the markets for electricity and natural gas), firms are not able to vary price in response to changes in the quantity demanded. Since regulated utilities must satisfy the quantity demanded by consumers at the regulated price, the observed price-quantity combinations represent points on the demand curve and may be used to estimate the parameters of a demand equation.
\end{itemize}
Using a similar argument, price can be treated as an exogenous determinant of quantity supplied if surveys of sellers are used to elicit information about supply responses to possible price levels. In practice, however, economists tend to place more faith in observed responses to actual, rather than hypothetical, price changes.
In markets in which none of the above conditions apply, however, the price of a good should be treated as an endogenous variable. When the price of a good is an endogenous variable, the demand and supply model is expressed as a simultaneous equations model. Let’s examine the general features of such simultaneous equations models.
\section{Simultaneous equation bias\label{se.bias.sc}}
A simple demand and supply model can be used to illustrate the problem that occurs when an OLS\ estimation technique is applied to a single equation in a simultaneous equations model. Consider the following (simplified) demand and supply equations:
\begin{equation} \label{demand.1.sc}
\text{Demand relationship: }Q_t=\beta _o+\beta _1P_t+u_t
\end{equation}
and
\begin{equation} \label{supply.1.sc}
\text{Supply relationship: }P_t=\gamma _o+\gamma _1Q_t+v_t \end{equation}
Suppose that an econometrician wishes to estimate the parameters of the demand relationship appearing in equation \ref{demand.1.sc}. If an OLS estimation procedure is used with the observed data, is it likely that the estimated relationship will capture this demand curve?
\begin{center}
\FRAME{ftbpFU}{4.7833in}{3.3901in}{0pt}{\Qcb{Demand and supply model}}{\Qlb{dands_g_sc}}{fig15-1.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 4.7833in;height 3.3901in;depth 0pt;original-width 4.7288in;original-height 3.3434in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig15-1.gif';file-properties "XNPEU";}}
\end{center}
Figure~\ref{dands_g_sc} sheds some light on this issue. This diagram contains three hypothetical demand and supply curves that correspond to three different time periods. For each time period, the intersection of the corresponding demand and supply curves provides one time-series observation on price and quantity.\footnote{It is assumed that this market is in equilibrium. In algebraic terms, this implies that $Q^{D}=Q^{S}$ (where $Q^{D}$ and $Q^{S}$ represent quantity demanded and supplied, respectively). Thus, $Q_{t}$ is used to represent the equilibrium quantity in both the supply and demand equations. While it is possible to estimate demand and supply models when markets are in a state of disequilibrium, an analysis of this topic is beyond the scope of this text. The interested reader may find a good, though mathematically more sophisticated, discussion of disequilibrium models in Quandt (1988).} In this diagram, there are three observations on price and quantity. Suppose that the econometrician had 50 points, each corresponding to the equilibrium between demand and supply at a particular point in time. Would a curve fit to these points represent a demand curve? Or a supply curve? As this diagram suggests, it is quite likely that such a curve would represent neither the demand nor the supply equation.
The problem illustrated in Figure~\ref{dands_g_sc} may occur whenever the right-hand side of an estimated equation contains an endogenous variable.
When an OLS estimation procedure is used, the estimated coefficient on the right-hand side endogenous variable may reflect the causality from this variable to the dependent variable in this equation; it may also, however, partly represent the reverse causality (from the dependent variable to the right-hand side endogenous variable). Thus, the measured coefficient on the endogenous variable is subject to a potential bias when an OLS estimation procedure is used. This bias is referred to as \textbf{simultaneous equations bias}.
This bias can also be seen algebraically. It will be helpful to determine the equilibrium price and quantity. To find this equilibrium, it is necessary to find the combination of price and quantity that satisfies equations \ref{demand.1.sc} and \ref{supply.1.sc}. By simple substitution, equation \ref{demand.1.sc} may be restated as:
\begin{equation*}
Q_t=\beta _o+\beta _1\left( \gamma _o+\gamma _1Q_t+v_t\right) +u_t \end{equation*}
\begin{equation*}
Q_t\left( 1-\beta _1\gamma _1\right) =\beta _o+\beta _1\left( \gamma _o+v_t\right) +u_t
\end{equation*}
Thus, the equilibrium quantity will equal:
\begin{equation} \label{Q.eq.sc}
Q_t=\frac{\beta _o+\beta _1\gamma _o}{1-\beta _1\gamma _1}+ \frac{u_t+\beta _1v_t}{1-\beta _1\gamma _1}
\end{equation}
Substituting this result into the supply equation results in: \begin{equation*}
P_t=\gamma _o+\gamma _1\left[ \frac{\beta _o+\beta _1\gamma _o}{1-\beta _1\gamma _1}+\frac{u_t+\beta _1v_t}{1-\beta _1\gamma _1}\right] +v_t \end{equation*}
Using simple algebraic transformations, this becomes:
\begin{equation} \label{P.eq.sc}
P_t=\frac{\gamma _o+\gamma _1\beta _o}{1-\beta _1\gamma _1}+ \frac{\gamma _1u_t+v_t}{1-\beta _1\gamma _1}
\end{equation}
An examination of equations \ref{Q.eq.sc} and \ref{P.eq.sc} indicates that the error terms $u_{t}$ and $v_{t}$ both affect the equilibrium price and quantity. The reason for this should be rather obvious: changes in $u_{t}$ and $v_{t}$ result in shifts in the demand and supply curves, respectively.
A shift in the position of either the demand or the supply curve will alter the equilibrium price and quantity in this model.
Suppose that an econometrician attempted to use an OLS procedure to estimate the parameters of the demand curve given in equation \ref{demand.1.sc}. As equation \ref{P.eq.sc} demonstrates, the price variable appearing on the right-hand side of this equation is partly determined by the error term in the demand equation ($u_{t}$). In Chapter \ref{mult.chap}, it was observed that OLS estimators are BLUE (Best Linear Unbiased Estimators) when all of the assumptions of the classical regression model are satisfied. Assumption~\ref{E(Xu)}, however, states:\bigskip

\textbf{Assumption \ref{E(Xu)}: }The independent variables $X_{1i},X_{2i},\ldots ,X_{ki}$ are nonstochastic.\bigskip

As discussed in Chapter \ref{mult.chap}, this assumption requires that the independent variables are independent of the error term. In the demand model appearing in equation \ref{demand.1.sc}, this condition is clearly violated.
Since price is determined by the interaction of supply and demand, random shocks that shift the demand curve (as captured by the error term $u_{t}$) will affect the magnitude of the price variable appearing on the right-hand side of the demand equation. A positive value of the error term in the demand equation, \textit{ceteris paribus}, results in a higher level of demand and a higher equilibrium price, while a negative value of this error term is associated with a lower price. Since $E(P_{t}u_{t})\neq 0$, OLS estimates will be biased and inconsistent. As shown in the mathematical appendix at the end of this chapter, the expected value of the OLS estimator for $\beta _{1}$ is given by:
\begin{equation*}
E(\hat{\beta}_{1})=\beta _{1}+E\left[ \frac{\sum \left( P_{t}-\overline{P}\right) u_{t}}{\sum \left( P_{t}-\overline{P}\right) ^{2}}\right]
\end{equation*}
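The reason that the bracketed term does not vanish can be sketched directly from equation \ref{P.eq.sc}. Assume, for illustration, that $u_{t}$ and $v_{t}$ have zero means and variances $\sigma _{u}^{2}$ and $\sigma _{v}^{2}$, and that they are uncorrelated with each other. Then:
\begin{equation*}
\text{Cov}(P_{t},u_{t})=E\left[ \left( \frac{\gamma _{1}u_{t}+v_{t}}{1-\beta _{1}\gamma _{1}}\right) u_{t}\right] =\frac{\gamma _{1}\sigma _{u}^{2}}{1-\beta _{1}\gamma _{1}}
\end{equation*}
\begin{equation*}
\text{Var}(P_{t})=\frac{\gamma _{1}^{2}\sigma _{u}^{2}+\sigma _{v}^{2}}{\left( 1-\beta _{1}\gamma _{1}\right) ^{2}}
\end{equation*}
Under these assumptions, the covariance is zero only if $\gamma _{1}=0$ (price does not respond to quantity) or $\sigma _{u}^{2}=0$ (the demand curve never shifts). The ratio of these two moments is the large-sample counterpart of the bracketed term in the expression above.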
A careful reader will note that the problem of simultaneous equations bias is essentially equivalent to the problem of measurement error discussed in Chapter \ref{spec.chap}. In that chapter, it was shown that the presence of measurement error in an independent variable results in a correlation between the observed independent variable and the error term in the equation. This correlation between the observed independent variable and the error term results in biased and inconsistent estimates. This is essentially the same problem that occurs in the case of an endogenous right-hand side variable.
OLS estimators are inconsistent when an endogenous variable appears on the right-hand side of the equation. In this particular case, as the size of the sample increases the estimated slope coefficient converges to:\footnote{To be somewhat more precise, a more appropriate notation is:
\begin{equation*}
\text{plim }\hat{\beta}_{1}=\beta _{1}+\frac{\left( 1-\beta _{1}\gamma _{1}\right) \left( \gamma _{1}\sigma _{u}^{2}\right) }{\gamma _{1}^{2}\sigma _{u}^{2}+\sigma _{v}^{2}}
\end{equation*}
where \textquotedblleft plim\textquotedblright\ is defined in the following manner:
\begin{equation*}
\text{plim}(\hat{\theta}_{N})=\theta \text{ if and only if:}
\end{equation*}
\begin{equation*}
\underset{N\rightarrow \infty }{\lim }\left[ \text{Prob}\left( \left\vert \hat{\theta}_{N}-\theta \right\vert >\epsilon \right) \right] =0\text{ for any }\epsilon >0
\end{equation*}
}
\begin{equation*}
\underset{N\rightarrow \infty }{\lim }\hat{\beta}_{1}=\beta _{1}+\frac{\left( 1-\beta _{1}\gamma _{1}\right) \left( \gamma _{1}\sigma _{u}^{2}\right) }{\gamma _{1}^{2}\sigma _{u}^{2}+\sigma _{v}^{2}}
\end{equation*}
The last term on the right-hand side of this equation will generally be nonzero. Thus, the OLS estimator will remain biased even when the size of the sample approaches infinity.
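As a purely illustrative calculation (the parameter values are hypothetical and are chosen only for convenience), suppose that $\beta _{1}=-1$, $\gamma _{1}=0.5$, and $\sigma _{u}^{2}=\sigma _{v}^{2}=1$. The last term then equals:
\begin{equation*}
\frac{\left( 1-\beta _{1}\gamma _{1}\right) \left( \gamma _{1}\sigma _{u}^{2}\right) }{\gamma _{1}^{2}\sigma _{u}^{2}+\sigma _{v}^{2}}=\frac{\left( 1.5\right) \left( 0.5\right) }{0.25+1}=\frac{0.75}{1.25}=0.6
\end{equation*}
In this hypothetical case, the OLS slope estimate converges to $-1+0.6=-0.4$ rather than to the true value of $-1$, so the estimated demand curve would appear much less responsive to price than the true demand curve, no matter how large the sample becomes.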
When measurement error is present in a right-hand side variable, an instrumental variables (IV) estimator may be used to provide consistent estimates of all model parameters. This instrumental variables estimator may also be used to provide consistent estimates when an endogenous variable is present on the right-hand side of a regression equation. Let’s examine this estimator.
\section{Two-stage least squares (2SLS)}
When an instrumental variables estimation procedure is used to correct for the presence of one or more right-hand side endogenous variables, the estimation procedure is generally referred to as a \textbf{two-stage least squares (2SLS)} estimator. Let’s discuss the construction of such an estimator.
A general form for a simultaneous-equation model is given by: \begin{equation}
Y_{1i}=\alpha _{1o}+\alpha _{11}Y_{2i}+\cdots +\alpha _{1(m-1)}Y_{mi}+\beta _{11}X_{1i}+\cdots +\beta _{1k}X_{ki}+u_{1i} \label{sim.eq.1.sc} \end{equation}%
\begin{equation*}
Y_{2i}=\alpha _{2o}+\alpha _{21}Y_{1i}+\cdots +\alpha _{2(m-1)}Y_{mi}+\beta _{21}X_{1i}+\cdots +\beta _{2k}X_{ki}+u_{2i}
\end{equation*}%
\begin{equation*}
\vdots
\end{equation*}%
\begin{equation*}
Y_{mi}=\alpha _{mo}+\alpha _{m1}Y_{1i}+\alpha _{m2}Y_{2i}+\cdots +\alpha _{m(m-1)}Y_{(m-1)i}+\beta _{m1}X_{1i}+\cdots +\beta _{mk}X_{ki}+u_{mi} \end{equation*}%
In this system of equations, there are $m$ endogenous variables ($Y_{1},Y_{2},\ldots ,Y_{m}$) and $k$ \textbf{predetermined variables} ($X_{1},X_{2},\ldots ,X_{k}$). These predetermined variables may include lagged values of the endogenous and exogenous variables as well as current values of the exogenous variables. Each of the $m$ endogenous variables in equation system \ref{sim.eq.1.sc} is assumed to be affected by some or all of the other endogenous and predetermined variables. Large macroeconomic models used for forecasting purposes contain hundreds of such equations. Since these equations describe the structural relationships embodied in the economic model, they are referred to as \textbf{structural equations}. Examples of such structural equations include demand and supply equations, consumption functions, investment demand equations, cost equations, and other relationships that are examined in introductory economics classes. The parameters $\alpha _{ij}$ and $\beta _{ij}$ are called \textbf{structural parameters}.
Throughout most of this chapter, it is assumed that the error term in each structural equation is independent of the error terms in the other equations. (This assumption will be relaxed in Section \ref{oe.sc}.) The variance of the error term in each equation is assumed to be constant for all observations (\textit{i.e.}, the errors are assumed to be homoskedastic). Thus, it is assumed that:
\begin{equation*}
E(u_{si}u_{ti})=\left\{
\begin{array}{l}
0\text{ for }s\neq t \\
\sigma _{s}^{2}\text{ for }s=t
\end{array}
\right.
\end{equation*}
It is also assumed that the error terms are uncorrelated across observations. Thus:
\begin{equation*}
E(u_{si}u_{sj})=0\text{ for }i\neq j
\end{equation*}%
In the case of time-series models, this assumption requires that the error terms exhibit no autocorrelation.
For a simultaneous equations model to be estimated, certain restrictions must be placed upon the parameters of the equation system \ref{sim.eq.1.sc}.
These restrictions are discussed below in Section \ref{ident.sc}. For now, it is assumed that these conditions are satisfied.
To estimate the parameters of each equation in a simultaneous equations system, the following two-stage least squares estimation procedure may be used (subject to the restrictions discussed below):\footnote{This procedure was developed independently by Theil and Basmann. A more complete discussion may be found in Theil (1953, 1978), Basmann (1957) or any advanced text.}
\begin{enumerate}
\item[Step 1:] Regress each of the right-hand side endogenous variables in the equation on \textbf{all} of the predetermined variables in the system of equations (using an OLS\ estimation procedure).
\item[Step 2:] Replace the endogenous variables appearing on the right-hand side of the equation with the fitted values from the first-stage regression(s). Use an OLS estimation procedure to estimate the parameters of the transformed equation.
\end{enumerate}
Thus, the 2SLS\ procedure is simply an instrumental variables estimator in which all of the predetermined variables in the entire system of equations are used as instruments for the right-hand side endogenous variables. Note that at least one of the predetermined variables must be omitted from the equation being estimated. If all of the predetermined variables appear on the right-hand side of this equation, a perfect multicollinearity problem would occur in the second-stage estimation process (since the generated value of the endogenous variable would be an exact linear combination of the other right-hand side variables). The “identification” restrictions considered below help to determine whether the model parameters are estimable.
In Chapter \ref{spec.chap}, it was noted that instrumental variables must be independent of the error term, but must be correlated with the variable(s) for which they are serving as instruments. In the case of a simultaneous equations model, the predetermined variables, by definition, are all independent of the error terms. Since the equilibrium values of the endogenous variables are determined, in large part, by the level of the predetermined variables serving as instruments, the correlation between the instruments and the right-hand side endogenous variables will be nonzero.
Thus, the set of all predetermined variables in the entire model satisfies the criteria for a desirable set of instruments.
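As an illustration, consider how the two steps might be applied to the demand relationship in equation \ref{demand.sc}, with the supply relationship in equation \ref{supply.sc} completing the model. The predetermined variables in this system are $X_{t}$ and $Z_{t}$. The coefficients $\pi _{0}$, $\pi _{1}$, and $\pi _{2}$ and the error term $u_{t}^{\ast }$ are notation introduced here for this sketch. In the first stage, price is regressed on both predetermined variables, yielding the fitted values:
\begin{equation*}
\hat{P}_{t}=\hat{\pi}_{0}+\hat{\pi}_{1}X_{t}+\hat{\pi}_{2}Z_{t}
\end{equation*}
In the second stage, the fitted value replaces the observed price in the demand equation, which is then estimated by OLS:
\begin{equation*}
Q_{t}=\beta _{o}+\beta _{1}\hat{P}_{t}+\beta _{2}X_{t}+u_{t}^{\ast }
\end{equation*}
Because $Z_{t}$ does not appear in the demand equation, $\hat{P}_{t}$ is not an exact linear function of $X_{t}$ alone (provided that $\hat{\pi}_{2}\neq 0$), and the second-stage regression can be estimated.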
As long as all of the other conditions of the classical regression model are satisfied, the 2SLS\ procedure provides consistent estimates of the intercept and slope coefficients (and the associated standard errors) for the structural equation.\footnote{The presence of either heteroskedasticity or autocorrelation will cause uncorrected 2SLS estimates to be inefficient and will result in incorrect estimates of the standard errors. The 2SLS\ estimated intercept and slope parameters, however, are still consistent when heteroskedasticity is present. As long as there are no lagged dependent variables among the predetermined variables, 2SLS estimators will also be consistent when autocorrelation is present. When a lagged dependent variable is present, however, autocorrelation will result in inconsistent parameter estimates.} The 2SLS estimator, however, is biased in finite samples. Because this estimator is consistent, the magnitude of the bias declines as the size of the sample rises.
\subsection{Caution: 2SLS standard errors}
While it is fairly easy to generate 2SLS\ estimates using the manual procedure described above, it should be noted that the OLS\ standard errors in the second-stage regression equation are incorrect. The basic problem is that the second-stage OLS\ estimation procedure does not take the use of fitted values (in place of the right-hand side endogenous variables) into account.
A simple correction procedure is described in the mathematical appendix at the end of this chapter. Fortunately, however, most modern econometric packages contain estimators for the 2SLS model that automatically provide corrected standard errors.
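The nature of the problem can be sketched for the demand relationship in equation \ref{demand.sc}. The notation below ($\tilde{u}_{t}$, $\hat{u}_{t}$, and $\hat{P}_{t}$ for the first-stage fitted price) is introduced here for illustration, and the correction described in the appendix may be presented differently. The second-stage OLS regression computes its residuals using the fitted price:
\begin{equation*}
\tilde{u}_{t}=Q_{t}-\hat{\beta}_{o}-\hat{\beta}_{1}\hat{P}_{t}-\hat{\beta}_{2}X_{t}
\end{equation*}
The residuals that are appropriate for estimating the variance of the structural error term, however, use the observed price together with the 2SLS coefficient estimates:
\begin{equation*}
\hat{u}_{t}=Q_{t}-\hat{\beta}_{o}-\hat{\beta}_{1}P_{t}-\hat{\beta}_{2}X_{t}
\end{equation*}
Since the second-stage standard errors are based on $\sum \tilde{u}_{t}^{2}$ rather than $\sum \hat{u}_{t}^{2}$, one common correction is to multiply each second-stage standard error by:
\begin{equation*}
\sqrt{\frac{\sum \hat{u}_{t}^{2}}{\sum \tilde{u}_{t}^{2}}}
\end{equation*}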
\section{Identification\label{ident.sc}}
\subsection{Demand and supply model}
In a simultaneous equations model, certain conditions must be satisfied for the parameters of a structural equation to be estimated. To illustrate this problem, let’s reconsider the simple demand and supply model given by: \begin{equation}
\text{Demand relationship: }Q_{t}=\beta _{o}+\beta _{1}P_{t}+u_{t} \label{demand.2.sc}
\end{equation}%
and
\begin{equation}
\text{Supply relationship: }P_{t}=\gamma _{o}+\gamma _{1}Q_{t}+v_{t} \label{supply.2.sc}
\end{equation}%
As noted above, it is not possible to estimate the parameters of either the demand curve or the supply curve in this model since observed prices and quantities are the result of shifts in both demand and supply. This problem was illustrated in Figure~\ref{dands_g_sc} above. The basic problem in this case is that it is not possible to determine whether a change in the observed outcome is the result of a shift in the demand curve (caused by a change in $u_{t}$) or the result of a shift in the supply curve (caused by a change in $v_{t}$). When it is not possible to estimate the parameters of a structural equation for this reason, econometricians say that the equation is not \textbf{identified}. If sufficient information is available to estimate all of the parameters of an equation, then that equation is said to be identified.
In practice, however, demand and supply equations are not generally expressed in the form appearing in equations \ref{demand.2.sc} and \ref{supply.2.sc}. In virtually all applications, there are one or more additional exogenous variables that appear in each of these equations. The demand and supply equations appearing in Section \ref{ds.sec.sc} can be used to illustrate the effect of including exogenous variables in this model.
\begin{equation} \label{demand.a.sc}
\text{Demand relationship: }Q_t=\beta _o+\beta _1P_t+\beta _2X_t+u_t \end{equation}
\begin{equation} \label{supply.a.sc}
\text{Supply relationship: }P_t=\gamma _o+\gamma _1Q_t+\gamma _2Z_t+v_t \end{equation}
In this model, the variable $X_t$ is an exogenous variable that affects demand, but has no effect on supply. The variable $Z_t$ is assumed to affect supply, but not demand. Will the presence of these variables make it possible to “identify” the demand and supply curves?
Consider the effect of a change in the level of the exogenous variable $Z_{t}$. \textit{Ceteris paribus}, changes in this variable will shift the supply curve along the demand curve. As the supply curve shifts, the observed equilibrium combinations of price and quantity are all points on the demand curve for this commodity. Figure~\ref{sshift_g_lim} illustrates this possibility. Since these shifts in the supply curve are the result of changes in the observable exogenous variable, $Z_{t}$, it is possible to use changes in the level of $Z_{t}$ to determine the intercept and slope of the demand curve. This diagram suggests that the presence of the exogenous variable $Z_{t}$ in the supply equation (and its absence in the demand equation) makes it possible to estimate the intercept and slope parameters for the demand curve. In this case, econometricians state that the demand equation is identified.
\begin{center}
\FRAME{ftbpFU}{4.74in}{3.3901in}{0pt}{\Qcb{Shifts in supply}}{\Qlb{sshift_g_lim}}{fig15-2.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 4.74in;height 3.3901in;depth 0pt;original-width 4.6873in;original-height 3.3434in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig15-2.gif';file-properties "XNPEU";}}
\end{center}
The variable $X_{t}$ in this model appears only in the demand equation.
Thus, a \textit{ceteris paribus }change in the level of $X_{t}$ will cause the demand curve to shift along the supply curve. These shifts in the demand curve make it possible to estimate the intercept and slope of the supply equation. This relationship is depicted in Figure~\ref{dshifts_g_sc}.
\begin{center}
\FRAME{ftbpFU}{4.8352in}{3.4212in}{0pt}{\Qcb{Shifts in demand curve}}{\Qlb{dshifts_g_sc}}{fig15-3.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 4.8352in;height 3.4212in;depth 0pt;original-width 4.7816in;original-height 3.3754in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig15-3.gif';file-properties "XNPEU";}}
\end{center}
Thus, in a two-equation model, the identification of each equation requires that at least one exogenous variable is excluded from the equation. In the example above, the demand curve is identified because the exogenous variable $Z_t$ appears only in the supply equation. The supply curve is identified because the variable $X_t$ appears only in the demand equation.
One way of viewing this is that the 2SLS estimator requires the presence of at least one additional instrumental variable for each right-hand side endogenous variable.
This instrumental variable must not appear as an independent variable in the equation in which it is serving as an instrument. If, for example, the variable $Z_t$ appeared in both the demand and supply equations, it would be impossible to estimate the demand equation. (Since the predicted price variable created in the first stage of this process is a linear function of $X_t$ and $Z_t$, it would be perfectly collinear with these two variables in the second-stage regression.)
This problem can also be seen intuitively. If $Z_{t}$ appeared in both the demand and supply equations, then both the demand and supply curves would shift in response to a change in this variable. In this case, the problem depicted in Figure~\ref{dands_g_sc} would again occur. The parameters of the demand curve can only be estimated if there is some variable that shifts supply that does not affect the demand curve. A similar argument suggests that if $X_{t}$ appeared in both the demand and supply equations, it would not be possible to estimate the parameters of the supply equation.
Of course, in a more complete specification, both the demand and supply equations would contain several exogenous variables. Some of these variables (such as the prices of other goods) may appear in both the demand and supply equations. As long as at least one of these exogenous variables is excluded from each equation, however, the parameters of the demand and supply equations can be estimated.
\subsection{General conditions for identification\label{just.id.sc}} Let’s consider the more general case in which there are $m$ equations, $m$ endogenous variables, and $k$ predetermined variables. For an equation to be identified, the number of right-hand side endogenous variables must be less than or equal to the number of predetermined variables excluded from the equation.\footnote{%
Technically, this condition (known as the \textquotedblleft order condition\textquotedblright\ for identification) is a necessary, but not sufficient, condition for identification. Another condition, known as the \textquotedblleft rank condition\textquotedblright , must also be satisfied to guarantee that the equation is identified. In most practical applications in which the order condition is satisfied, however, the rank condition will also be satisfied. A full discussion of the rank condition requires mathematical tools that are beyond the scope of the current text. Readers familiar with matrix algebra may find a good treatment in Theil (1971), pp. 443-450 or Greene (2000), pp. 663-76.} This condition is necessary to ensure that there are enough predetermined variables to serve as instruments for the right-hand side endogenous variables.
An equation in a simultaneous equations system is said to be:\footnote{%
An alternative, and equivalent, method of determining whether an equation is identified is to compare the total number of predetermined variables in the entire system of equations with the number of slope coefficients to be estimated in the equation. For simplicity, let’s define $s_{j}$ as the number of unknown slope coefficients in equation $j$. Equation $j$ is:
\par
\begin{itemize}
\item \textbf{underidentified} if the total number of predetermined variables in the system of equations ($k$) is less than the number of slope parameters to be estimated ($s_{j}$);
\par
\item \textbf{just identified} if $k=s_{j}$; or
\par
\item \textbf{overidentified} if $k>s_{j}$.
\end{itemize}
}
\begin{itemize}
\item \textbf{underidentified} if the number of predetermined variables excluded from the equation is less than the number of right-hand side endogenous variables appearing in the equation;
\item \textbf{just identified} if the number of predetermined variables excluded from the equation equals the number of right-hand side endogenous variables appearing in the equation; or
\item \textbf{overidentified} if the number of predetermined variables excluded from the equation is greater than the number of right-hand side endogenous variables appearing in the equation.
\end{itemize}
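The same classification can be written compactly. As a sketch, let $g_{j}$ denote the number of right-hand side endogenous variables in equation $j$ and let $k_{j}^{\ast }$ denote the number of predetermined variables excluded from equation $j$ (both symbols are introduced here only for illustration and are not used elsewhere in the text):
\begin{equation*}
\text{equation }j\text{ is }\left\{
\begin{array}{ll}
\text{underidentified} & \text{if }k_{j}^{\ast }<g_{j} \\
\text{just identified} & \text{if }k_{j}^{\ast }=g_{j} \\
\text{overidentified} & \text{if }k_{j}^{\ast }>g_{j}%
\end{array}%
\right.
\end{equation*}%
For example, an equation with one right-hand side endogenous variable ($g_{j}=1$) that omits two of the system's predetermined variables ($k_{j}^{\ast }=2$) is overidentified.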
When an equation is underidentified, a 2SLS\ estimation procedure cannot be used to estimate the equation’s parameters. In an underidentified equation, a perfect multicollinearity relationship would exist among the generated and predetermined regressors in the second-stage regression equation.\footnote{%
A proof of this proposition requires matrix algebra. Readers who are comfortable with matrix algebra may find a proof in Theil (1971), pp. 443-450.} Students who are learning econometrics will sometimes attempt to estimate underidentified equations by 2SLS and receive warning messages that state that a multicollinearity problem has occurred. (Some econometrics packages containing 2SLS estimators will provide an error message that states that the equation is not identified.)
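To see where this collinearity comes from, consider a hypothetical variant of the demand and supply model in which $Z_{t}$ does not appear in the supply equation, so that $X_{t}$ is the only predetermined variable in the system and the demand equation $Q_{t}=\beta _{o}+\beta _{1}P_{t}+\beta _{2}X_{t}+u_{t}$ is underidentified. (The first-stage coefficients $\hat{a}_{o}$ and $\hat{a}_{1}$ below are notation introduced only for this sketch.) The first-stage fitted value of price is then an exact linear function of $X_{t}$:
\begin{equation*}
\hat{P}_{t}=\hat{a}_{o}+\hat{a}_{1}X_{t}
\end{equation*}%
Replacing $P_{t}$ with $\hat{P}_{t}$ in the demand equation gives a second-stage regression of the form:
\begin{equation*}
Q_{t}=\left( \beta _{o}+\beta _{1}\hat{a}_{o}\right) +\left( \beta _{1}\hat{a}_{1}+\beta _{2}\right) X_{t}+\text{error}_{t}
\end{equation*}%
Only the two bracketed combinations can be estimated. Because $\hat{P}_{t}$ and $X_{t}$ are perfectly collinear, $\beta _{1}$ and $\beta _{2}$ cannot be recovered separately.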
Just identified and overidentified equations, however, can be estimated using a 2SLS estimation procedure. In a just-identified equation, the model specification provides the minimum number of instrumental variables needed to generate first-stage estimates of the right-hand side endogenous variables. In an overidentified equation, there are more instrumental variables than are necessary to estimate the parameters of the structural equation. Because a 2SLS estimator can be used in either case, economists often simply state that the equation is \textquotedblleft identified\textquotedblright\ whenever it is either just identified or overidentified. (Under the indirect least squares estimator discussed below, however, the distinction between just identified and overidentified equations is somewhat more important.)%
%BeginExpansion
\exbox{The draft lottery and instrumental variables}{
Angrist (1990) investigated the relationship between lifetime earnings and military service. While time spent in the military is expected to affect lifetime earnings, it is also quite likely that the decision to enter the military will be affected by alternative earnings prospects. Thus, an OLS estimate of an earnings equation in which military service is an independent variable is subject to a simultaneous equations bias.
To correct for this problem, Angrist used an instrumental variables estimator to correct for the endogeneity of military service. As noted in the text, this requires that one or more instruments be found that are correlated with the military service variable but are uncorrelated with the error term in the earnings equation. Since Angrist was analyzing veterans from the Viet Nam era, the draft lottery in use during this period provides a convenient source for an instrument. Under this draft system, a lottery was conducted each year that resulted in a random ordering of birth dates.
Individuals with low lottery numbers (birth dates that were drawn early in this ordering) were the first to be drafted. (Many of these individuals chose to volunteer to improve the terms of the contract.) Those with high lottery numbers were less likely to be drafted.
Thus, under the draft system, draft eligibility status provides a convenient instrumental variable. In this situation, an individual’s date of birth affects the decision to join the military, but is unlikely to be correlated with the error term in an earnings equation.
It is generally more difficult, however, to find variables that are likely to be truly exogenous.
}%
%EndExpansion
\section{Example:\ Demand and supply}
Consider the demand and supply equations discussed above:%
\begin{equation}
\text{Demand relationship: }Q_{t}=\beta _{o}+\beta _{1}P_{t}+\beta _{2}X_{t}+u_{t} \label{demand2.zz}
\end{equation}%
\begin{equation}
\text{Supply relationship: }P_{t}=\gamma _{o}+\gamma _{1}Q_{t}+\gamma _{2}Z_{t}+v_{t} \label{supply2.zz}
\end{equation}%
In this model there are two exogenous variables ($X_{t}$ and $Z_{t}$). Using the criteria above, we can see that both the demand and supply equations are just identified. The demand equation is just identified because it contains one right-hand side endogenous variable ($P_{t}$) and excludes exactly one of the two predetermined variables in the system ($Z_{t}$); only $X_{t}$ appears on its right-hand side. By the same reasoning, the supply equation contains one right-hand side endogenous variable ($Q_{t}$) and excludes $X_{t}$. Note that the number of slope parameters to be estimated in each equation is just equal to the total number of exogenous variables in the system of equations.
How does one determine which variables belong in each equation? When possible, econometricians attempt to rely on economic theory to guide such decisions. In the case of demand, some variables that may affect demand without affecting supply include:
\begin{itemize}
\item prices of substitutes and complements,
\item consumer income,
\item advertising spending, and
\item demographic factors (such as the age structure of the population).
\end{itemize}
Similarly, there are some variables that can reasonably be expected to affect supply without affecting demand:
\begin{itemize}
\item resource prices,
\item measures of resource productivity, and
\item changes in government regulation that affect production costs.
\end{itemize}
\subsection{Demand and supply of loans\label{D_S_loans}}
Let’s examine a simple demand and supply model of commercial loans issued by banks. The model is given by:%
\begin{equation}
\text{Demand: Loans}_{t}=\beta _{o}+\beta _{1}\text{Prime}_{t}+\beta _{2}\text{AAA}_{t}+\beta _{3}\text{Indprod}_{t}+u_{t} \label{demand3.zz}
\end{equation}%
\begin{equation}
\text{Supply: Prime}_{t}=\gamma _{o}+\gamma _{1}\text{Loans}_{t}+\gamma _{2}\text{Tbill3}_{t}+\gamma _{3}\text{Deposits}_{t}+v_{t} \label{supply3.zz}
\end{equation}%
\begin{equation*}
\begin{array}{llll}
\text{where:} & \text{Loans}_{t} & = & \text{commercial and industrial loans (billions of dollars, } \\
& & & \text{not seasonally adjusted)} \\
& \text{Prime}_{t}\text{ } & = & \text{average prime rate} \\ & \text{AAA}_{t}\text{ } & = & \text{average yield to maturity on long-term bonds rated Aaa } \\
& & & \text{by Moody’s Investors Service} \\
& \text{Indprod}_{t}\text{ \ } & = & \text{index of industrial production\ } \\
& \text{Tbill3}_{t} & = & \text{yield on 3-month Treasury bills sold in secondary market} \\
& \text{Deposits}_{t}\text{ } & = & \text{total deposits in the banking system (billions)}%
\end{array}%
\end{equation*}%
In this model, it is assumed that the quantity of commercial loans demanded is affected by:
\begin{itemize}
\item the interest rate as measured by the prime rate (most commercial loans are either set at the prime rate or at a rate that varies from the prime rate by a fixed amount),
\item the cost of alternative financing (represented by AAA$_{t}$), and
\item the level of demand for output (as measured by the index of industrial production).
\end{itemize}
The inverse supply relationship above suggests that the price of loans (the prime rate) is a function of:
\begin{itemize}
\item the quantity of loans supplied by banks,
\item the return on alternative bank assets such as Treasury bills, and
\item the volume of total deposits in the banking system.
\end{itemize}
In each equation there is one endogenous variable on the right-hand side.
Each equation also excludes two of the four exogenous variables in the system: the demand equation excludes Tbill3$_{t}$ and Deposits$_{t}$, while the supply equation excludes AAA$_{t}$ and Indprod$_{t}$. Since there are more excluded exogenous variables than there are right-hand side endogenous variables, each equation is overidentified. Therefore, a 2SLS estimation procedure may be applied. In the first stage of this process, each of the endogenous variables (Loans$_{t}$ and Prime$_{t}$) is regressed against all four of the exogenous variables (AAA$_{t}$, Indprod$_{t}$, Tbill3$_{t}$, and Deposits$_{t}$). The estimated equations are given by:\footnote{%
The \textquotedblleft loans.dat\textquotedblright\ file contains the data used to estimate this model. A description of this data set appears in Table~\ref{loans.dat} on p.~\pageref{loans.dat}.}%
\begin{equation}
\widehat{\text{Loans}_{t}}=\underset{(-11.59)}{-455.72}+\underset{(5.98)}{18.18}\text{AAA}_{t}+\underset{(11.58)}{7.59}\text{Indprod}_{t}-\underset{(-1.54)}{3.69}\text{Tbill3}_{t} \label{loan_iv_est}
\end{equation}%
\begin{equation*}
+\underset{(11.28)}{0.13}\text{Deposits}_{t}
\end{equation*}%
\begin{equation}
\widehat{\text{Prime}_{t}}=\underset{(-2.55)}{-1.45}+\underset{(4.03)}{0.177}\text{AAA}_{t}+\underset{(2.27)}{0.22}\text{Indprod}_{t}+\underset{(31.21)}{1.08}\text{Tbill3}_{t} \label{prime_iv_est}
\end{equation}%
\begin{equation*}
+\underset{(1.52)}{0.0026}\text{Deposits}_{t}
\end{equation*}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)}
\end{equation*}%
In both cases, the first-stage regressions have a strong fit, with $\overline{\text{R}}^{2}$ values of 0.96 and 0.94, respectively, for equations \ref{loan_iv_est} and \ref{prime_iv_est}. In the second stage of the 2SLS process, the fitted values from equations \ref{loan_iv_est} and \ref{prime_iv_est} are used as regressors in place of the right-hand side endogenous variables. The resulting 2SLS estimates are:
\begin{equation}
\text{Demand: Loans}_{t}=\underset{(-27.17)}{-837.12}+\underset{(-9.42)}{-19.55}\widehat{\text{Prime}_{t}}+\underset{(12.31)}{41.06}\text{AAA}_{t} \label{demand_2sls_est}
\end{equation}%
\begin{equation*}
+\underset{(72.37)}{15.42}\text{Indprod}_{t}
\end{equation*}%
\begin{equation}
\text{Supply: Prime}_{t}=\underset{(2.12)}{0.46}+\underset{(1.82)}{0.0023}\widehat{\text{Loans}_{t}}+\underset{(51.34)}{1.17}\text{Tbill3}_{t} \label{supply_2sls_est}
\end{equation}%
\begin{equation*}
+\underset{(0.00)}{0.0000017}\text{Deposits}_{t}
\end{equation*}%
Note that these estimates indicate a downward-sloping demand curve and an upward-sloping supply curve.
\section{Example: The consumption function}
Let’s examine another application of the 2SLS estimation procedure. Consider the simple Keynesian model discussed above. This model consists of the equations:
\begin{equation}
C_{t}=\beta _{o}+\beta _{1}YD_{t}+u_{t} \label{keynes.0.lc} \end{equation}%
\begin{equation}
Y_{t}=C_{t}+I_{t}+G_{t}+NX_{t} \label{keynes.0.lca}
\end{equation}%
\begin{equation}
YD_{t}=Y_{t}-Tn_{t} \label{keynes.0.lcb}
\end{equation}%
In this model there are three endogenous variables ($C_{t},Y_{t}$, and $YD_{t}$) and four exogenous variables ($I_{t},G_{t},Tn_{t},$ and $NX_{t}$).\footnote{%
Note that taxes and net exports are exogenous only if they do not vary with the level of $Y_{t}$. Since imports, taxes, and transfer payments all vary with income, this assumption is not appropriate in most practical cases. Lump-sum taxes are exogenous, but these are rarely observed.} As noted above, OLS estimation of the consumption function appearing in equation \ref{keynes.0.lc} will result in biased and inconsistent estimates. A 2SLS estimator, however, will provide consistent estimates of these parameters.
(Note that the consumption function appearing in equation \ref{keynes.0.lc} is overidentified since there is only one right-hand side endogenous variable ($YD_{t}$) and there are four excluded predetermined variables ($I_{t},$ $G_{t},$ $NX_{t}$, and $Tn_{t}$).)
For comparison purposes, the parameters of equation \ref{keynes.0.lc} were first estimated using OLS:\footnote{%
The data used for this estimation is contained in the file \textquotedblleft gdp.dat.\textquotedblright\ This data is described in Table \ref{gdp.dat} in Appendix \ref{data.appendix}.}
\begin{equation}
\text{OLS estimates: }\hat{C}_{t}=\underset{(-2.78)}{-49.75}+\underset{% (169.52)}{0.9174}YD_{t} \label{c.ols.sc}
\end{equation}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)}
\end{equation*}%
Due to the endogeneity of the disposable income variable, however, an OLS estimation procedure results in biased and inconsistent parameter estimates.
To correct for this, a 2SLS estimation procedure may be applied. In the first stage of this process, the endogenous variable $YD_{t}$ is regressed against all of the exogenous variables ($I_{t},$ $G_{t},$ $NX_{t}$, and $Tn_{t}$) in the equation system above. The first-stage regression is given by:%
\begin{equation*}
\text{First-stage regression: }\widehat{YD}_{t}=\underset{(2.16)}{150.915}+\underset{(7.64)}{3.64}I_{t}+\underset{(4.53)}{1.064}G_{t}-\underset{(-0.13)}{0.074}Tn_{t}+\underset{(4.10)}{2.417}NX_{t}
\end{equation*}%
In the second stage of the 2SLS process, the original equation is estimated after replacing the endogenous variable(s) on the right-hand side of the equation with the predicted values from the first-stage regression. In this relatively simple case, $C_{t}$ is regressed against a constant term and the fitted value $\widehat{YD}_{t}$ from the first-stage regression. The estimated equation is given by:
\begin{equation}
\text{Second-stage regression: }\hat{C}_{t}=\underset{(-2.98)}{-53.62}+\underset{(168.22)}{0.9188}\widehat{YD}_{t} \label{c.2sls.sc}
\end{equation}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)}
\end{equation*}%
In this case, the OLS and 2SLS estimates do not differ substantially.
\section{Reduced-form estimation}
If an economist is only concerned with the effect of changes in the exogenous variables on the level of the endogenous variables, a \textbf{reduced-form} version of the model may be estimated instead of the structural version. The reduced-form version of the simultaneous equation system is formed by transforming the equation system so that each endogenous variable is expressed solely as a function of the exogenous variables in the model. The right-hand side of each of these transformed equations does not contain any endogenous variables.
The simple Keynesian model discussed above can be used to illustrate the concept of a reduced-form version of a simultaneous equations model. The structural model in this case is given by equations \ref{keynes.0.lc}, \ref{keynes.0.lca}, and \ref{keynes.0.lcb}.
Using a bit of algebraic manipulation, the equilibrium level of national income can be easily determined:
\begin{equation*}
Y_{t}=\left( \beta _{o}+\beta _{1}\left( Y_{t}-Tn_{t}\right) +u_{t}\right) +I_{t}+G_{t}+NX_{t}
\end{equation*}%
\begin{equation}
Y_{t}=\frac{\beta _{o}-\beta _{1}Tn_{t}+I_{t}+G_{t}+NX_{t}+u_{t}}{1-\beta _{1}} \label{int.step.sc}
\end{equation}%
\begin{equation}
Y_{t}=\frac{\beta _{o}}{1-\beta _{1}}+\frac{-\beta _{1}}{1-\beta _{1}}Tn_{t}+\frac{1}{1-\beta _{1}}I_{t}+\frac{1}{1-\beta _{1}}G_{t} \label{rf.keynes.lc}
\end{equation}%
\begin{equation*}
+\frac{1}{1-\beta _{1}}NX_{t}+\frac{u_{t}}{1-\beta _{1}}
\end{equation*}%
Using the following definitions:
\begin{equation*}
\pi _{o}=\frac{\beta _{o}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{1}=\frac{-\beta _{1}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{2}=\frac{1}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{3}=\frac{1}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{4}=\frac{1}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
v_{t}=\frac{u_{t}}{1-\beta _{1}}
\end{equation*}%
equation \ref{rf.keynes.lc} can be restated as:
\begin{equation}
Y_{t}=\pi _{o}+\pi _{1}Tn_{t}+\pi _{2}I_{t}+\pi _{3}G_{t}+\pi _{4}NX_{t}+v_{t} \label{rf.keynes.lca}
\end{equation}%
In equation \ref{rf.keynes.lca}, the level of national income (one of the three endogenous variables in this model) has been written as a linear function of the exogenous variables in the model. This is the reduced-form equation for the level of national income. The intercept and slope coefficients in this equation ($\pi _{o},\pi _{1},\ldots ,\pi _{4}$) are referred to as \textbf{reduced-form parameters}.
The reduced-form version of the consumption function can be derived in a similar manner. Substituting equations \ref{keynes.0.lcb} and \ref{int.step.sc} into the consumption function (equation \ref{keynes.0.lc}) results in:
\begin{equation*}
C_{t}=\beta _{o}+\beta _{1}\left( \frac{\beta _{o}-\beta _{1}Tn_{t}+I_{t}+G_{t}+NX_{t}+u_{t}}{1-\beta _{1}}-Tn_{t}\right) +u_{t} \end{equation*}%
\begin{equation*}
=\frac{\beta _{o}-\beta _{o}\beta _{1}+\beta _{1}\beta _{o}-\beta _{1}^{2}Tn_{t}+\beta _{1}I_{t}+\beta _{1}G_{t}+\beta _{1}NX_{t}+\beta _{1}u_{t}-\beta _{1}Tn_{t}+\beta _{1}^{2}Tn_{t}+u_{t}-\beta _{1}u_{t}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
=\frac{\beta _{o}+\beta _{1}I_{t}+\beta _{1}G_{t}+\beta _{1}NX_{t}-\beta _{1}Tn_{t}+u_{t}}{1-\beta _{1}}
\end{equation*}%
Thus,
\begin{equation}
C_{t}=\frac{\beta _{o}}{1-\beta _{1}}+\frac{\beta _{1}}{1-\beta _{1}}I_{t}+\frac{\beta _{1}}{1-\beta _{1}}G_{t}+\frac{-\beta _{1}}{1-\beta _{1}}Tn_{t} \label{rf.c.sc}
\end{equation}%
\begin{equation*}
+\frac{\beta _{1}}{1-\beta _{1}}NX_{t}+\frac{u_{t}}{1-\beta _{1}} \end{equation*}%
Defining:
\begin{equation*}
\pi _{5}=\frac{\beta _{o}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{6}=\frac{\beta _{1}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{7}=\frac{\beta _{1}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{8}=\frac{-\beta _{1}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\pi _{9}=\frac{\beta _{1}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
\epsilon _{t}=\frac{u_{t}}{1-\beta _{1}}
\end{equation*}%
the reduced-form version of the consumption function may be expressed as: \begin{equation}
C_{t}=\pi _{5}+\pi _{6}I_{t}+\pi _{7}G_{t}+\pi _{8}Tn_{t}+\pi _{9}NX_{t}+\epsilon _{t} \label{rf.cons.a.sc}
\end{equation}%
A similar derivation can be used to derive the reduced-form version of the disposable income ($YD_{t}$) equation. (This derivation is left to the reader as an exercise.)
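For readers who wish to check their work on this exercise, one way to carry out the derivation is sketched here. Substituting equation \ref{int.step.sc} into equation \ref{keynes.0.lcb} and placing both terms over the common denominator $1-\beta _{1}$ gives:
\begin{equation*}
YD_{t}=\frac{\beta _{o}-\beta _{1}Tn_{t}+I_{t}+G_{t}+NX_{t}+u_{t}}{1-\beta _{1}}-Tn_{t}
\end{equation*}%
\begin{equation*}
=\frac{\beta _{o}-\beta _{1}Tn_{t}+I_{t}+G_{t}+NX_{t}+u_{t}-Tn_{t}+\beta _{1}Tn_{t}}{1-\beta _{1}}
\end{equation*}%
\begin{equation*}
=\frac{\beta _{o}}{1-\beta _{1}}+\frac{1}{1-\beta _{1}}I_{t}+\frac{1}{1-\beta _{1}}G_{t}+\frac{-1}{1-\beta _{1}}Tn_{t}+\frac{1}{1-\beta _{1}}NX_{t}+\frac{u_{t}}{1-\beta _{1}}
\end{equation*}%
As a check, substituting this expression for $YD_{t}$ into the consumption function reproduces the numerator $\beta _{o}+\beta _{1}I_{t}+\beta _{1}G_{t}+\beta _{1}NX_{t}-\beta _{1}Tn_{t}+u_{t}$ obtained in the reduced-form consumption derivation.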
An examination of equations \ref{rf.keynes.lca} and \ref{rf.cons.a.sc} suggests that the right-hand side variables in a reduced-form equation include all of the exogenous variables in the equation system. This result holds in the general case in which there are $m$ endogenous variables. (A proof of this proposition requires mathematical tools beyond the scope of this text.) A careful reader will note that each of the first-stage regressions in a 2SLS\ procedure involves the estimation of a reduced-form equation.
In addition to their role in 2SLS\ estimation procedures, however, reduced-form equations also play an important role in \textbf{comparative static analysis}. Comparative static analysis is used when economists wish to determine how a change in an exogenous variable affects the equilibrium value of one or more endogenous variables in an economic model.\footnote{%
The slope coefficients actually capture the effect of a one-unit change in any \emph{predetermined} variable on the current equilibrium value of the dependent variable. As noted above, these predetermined variables may consist of either exogenous variables or lagged dependent variables. The presence of lagged dependent variables, however, makes the discussion somewhat more complex.
\par
When lagged dependent variables appear as predetermined variables in a model, a change in any predetermined variable affects not only current, but also future levels of the dependent variable. The analysis of models that include lagged dependent variables is referred to as \textquotedblleft dynamic analysis.\textquotedblright\ Dynamic analysis describes how the dependent variable(s) in a model respond over time to some form of shock. A full discussion of this topic requires mathematical tools that are beyond the scope of this text. A discussion of single-equation models in which the current level of the dependent variable is affected by past values of the dependent variable appears in Chapter \ref{ARIMA.chap}.
\par
Interested readers may find a more complete, but more advanced, discussion of dynamic models in Baumol (1970), Chow (1975), or Sargent (1987).} In equations \ref{rf.keynes.lca} and \ref{rf.cons.a.sc}, the reduced-form slope parameters ($\pi _{i}$) measure the \textit{ceteris paribus} effect of a one-unit change in each of the exogenous variables ($I_{t},G_{t},Tn_{t},$ and $NX_{t}$). Students who have studied introductory macroeconomics should note that the slope coefficients in equation \ref{rf.keynes.lca} are the simple lump-sum tax, investment, government spending, and net-export multipliers. In mathematical terms:
\begin{equation}
\pi _{1}=\frac{\Delta Y}{\Delta Tn}=\frac{-\beta _{1}}{1-\beta _{1}} \label{pi1.sc}
\end{equation}%
\begin{equation}
\pi _{2}=\frac{\Delta Y}{\Delta I}=\frac{1}{1-\beta _{1}} \label{pi2.sc} \end{equation}%
\begin{equation}
\pi _{3}=\frac{\Delta Y}{\Delta G}=\frac{1}{1-\beta _{1}} \label{pi3.sc} \end{equation}%
\begin{equation}
\pi _{4}=\frac{\Delta Y}{\Delta NX}=\frac{1}{1-\beta _{1}} \label{pi4.sc}
\end{equation}
Each of these multipliers provides a measure of the change in the equilibrium level of national income that results from a one-unit change in the corresponding exogenous variable. The slope coefficients in equation \ref{rf.cons.a.sc} have a similar interpretation. Each of the slope coefficients in this equation provides a measure of the change in the equilibrium level of consumption spending resulting from a one-unit change in the corresponding exogenous variable.
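As a purely illustrative calculation, suppose that the marginal propensity to consume were $\beta _{1}=0.8$ (an assumed value chosen for convenience, not an estimate from the data used in this chapter). The national income multipliers would then be:
\begin{equation*}
\pi _{1}=\frac{-0.8}{1-0.8}=-4\qquad \text{and}\qquad \pi _{2}=\pi _{3}=\pi _{4}=\frac{1}{1-0.8}=5
\end{equation*}%
Under this assumption, a one-unit increase in government spending financed by a one-unit increase in lump-sum taxes would change equilibrium national income by $\pi _{3}+\pi _{1}=5-4=1$. More generally, $\pi _{3}+\pi _{1}=(1-\beta _{1})/(1-\beta _{1})=1$ for any value of $\beta _{1}$ other than one in this simple model.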
The estimated versions of the reduced-form equations for this model may be used to forecast the effect of changes in government policy ($G$ and $Tn$) on the equilibrium values of the endogenous variables ($Y,$ $C$ and $YD$).
\section{Indirect least squares}
The 2SLS\ estimation procedure discussed above is, by far, the most widely used method for estimating the parameters of simultaneous equation systems.
It is also possible, however, to use the estimates from the reduced-form equations to generate estimates of the structural parameters when an equation is either just identified or overidentified. This method is referred to as an \textbf{indirect least squares} estimation technique.
Suppose, for example, that the parameters of the reduced-form national income model appearing in equation \ref{rf.keynes.lca} are estimated. Using equation \ref{pi2.sc}, an estimated value of the MPC (= $\beta _{1}$) can be constructed from the estimated reduced-form parameter $\hat{\pi}_{2}$.
Since,
\begin{equation*}
\pi _{2}=\frac{1}{1-\beta _{1}}
\end{equation*}%
the value of $\beta _{1}$ must equal:
\begin{equation*}
\beta _{1}=1-\frac{1}{\pi _{2}}
\end{equation*}%
Thus, an indirect least squares estimate of $\beta _{1}$ can be computed as:%
\footnote{%
Indirect least squares estimators are biased but consistent. The bias exists because of the nonlinear nature of the equations relating the reduced-form and structural parameter estimates. As noted in Chapter \ref{stat.chap}, $E(g(X))$ is not generally equal to $g(E(X))$.}
\begin{equation*}
\hat{\beta}_{1}=1-\frac{1}{\hat{\pi}_{2}}
\end{equation*}%
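For instance, if the estimated investment multiplier happened to be $\hat{\pi}_{2}=4$ (a hypothetical value used only to illustrate the arithmetic), the indirect least squares estimate of the MPC would be:
\begin{equation*}
\hat{\beta}_{1}=1-\frac{1}{4}=0.75
\end{equation*}%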
If an equation is just identified, a unique solution for each structural parameter may be derived from the reduced-form estimates. When an equation is overidentified, however, multiple solutions for the structural parameters are possible. An examination of equations \ref{pi2.sc} to \ref{pi4.sc} suggests how this could occur. Each of the reduced-form parameter estimates $\hat{\pi}_{2}$, $\hat{\pi}_{3}$, and $\hat{\pi}_{4}$ could be used to generate an alternative estimate of the parameter $\beta _{1}$, and in a finite sample these estimates will generally differ.
While overidentified models provide multiple estimates of a structural parameter, there is only one correct value for this parameter in the population. One advantage of the 2SLS\ estimation procedure over the indirect least squares procedure is that a 2SLS estimator will generate only one estimated value for each parameter in either a just identified or overidentified equation.
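The source of the conflict can be sketched by solving each of equations \ref{pi1.sc} to \ref{pi4.sc} for $\beta _{1}$:
\begin{equation*}
\beta _{1}=\frac{\pi _{1}}{\pi _{1}-1},\qquad \beta _{1}=1-\frac{1}{\pi _{2}},\qquad \beta _{1}=1-\frac{1}{\pi _{3}},\qquad \beta _{1}=1-\frac{1}{\pi _{4}}
\end{equation*}%
In the population all four expressions yield the same value. With unrestricted reduced-form estimates, however, they need not agree. Suppose, hypothetically, that $\hat{\pi}_{1}=-3$, $\hat{\pi}_{2}=4$, $\hat{\pi}_{3}=5$, and $\hat{\pi}_{4}=2.5$ (values invented only for illustration). The implied indirect least squares estimates of $\beta _{1}$ would then be $0.75$, $0.75$, $0.80$, and $0.60$, respectively, and nothing in the indirect least squares procedure itself indicates which of these to prefer.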
Neither a 2SLS nor an indirect least squares regression procedure, however, will allow an econometrician to estimate the structural parameters in an underidentified equation. A demonstration of this result (and a more extensive discussion of the indirect least squares estimation procedure) appears in the mathematical appendix at the end of this chapter.
\section{Model specification}
In many economic applications, it is difficult to determine whether a given variable is exogenous or endogenous. This problem often arises in macroeconomic models. Consider the simple macroeconomic model discussed above. In this model, it was assumed that investment spending, government spending, net taxes, and net exports were exogenous. Are these assumptions reasonable? Is it not likely that investment spending will increase when GDP rises? Will the levels of taxes and transfer payments be unaffected by the level of national income? Is it not likely that the level of imports will rise when income increases? Thus, it seems that at least three of the four exogenous variables used in this model should instead be considered endogenous. In fact, it might also be possible that the government will alter spending decisions in response to changes in the level of national income. For an economic model to be identified, a sufficient number of exogenous variables must appear in the equation system. As this example suggests, it is often quite difficult to find variables that are truly exogenous.
As an additional example, consider the role that the money supply plays in a more complete model of national income determination. Most economists would agree that the money supply is an important factor that affects the equilibrium level of nominal GDP. But is it appropriate to consider the money supply to be exogenous? One of the most important determinants of the size of the money supply is the volume of loans made by the banking system.
Since banks are willing to make more loans when national income is rising, an increase in national income will result in an increase in the size of the money supply. Furthermore, it is quite likely that the Fed will attempt to adjust the size of the money supply in response to economic conditions.
Thus, economic theory suggests that the money supply is also an endogenous variable in a more complete macroeconomic model.
Since the identification of simultaneous equations relies, in part, on the availability of exogenous variables, it would be helpful to have a formal test for exogeneity. A test for this purpose has been proposed by Sims (1972). Since this exogeneity test relies on the Granger-Sims causality test, it will be helpful to discuss the causality test first.
\subsection{Granger-Sims causality test}
Granger (1969) argues that the concept of causality can be defined in a statistical manner. In particular, Granger states that $X_{t}$ can be called a cause of $Y_{t}$ if information concerning the level of $X_{t}$ improves the ability to forecast $Y_{t}$ after taking into account all other relevant information. Note that this definition of causality is a purely statistical definition based upon the correlation between the two time series after controlling for the effect of other variables. As you might expect, defining causation in terms of observed correlations is somewhat controversial.%
\footnote{%
See, for example, the discussion in Jacobs, Leamer, and Ward (1979), or Feige and Pearce (1979).} Some of the problems associated with this test are discussed below.
Suppose that an econometrician wishes to test whether $X_{t}$ is a cause of $Y_{t}$. If it is, past values of $X_{t}$ will have an effect on the current value of $Y_{t}$, after controlling for the effect of all other variables that affect $Y_{t}$. A simple \textbf{causality test}, proposed by Granger, involves the estimation of the following equation:%
\footnote{%
An alternative version of this test was proposed by Sims (1972).}
%BeginExpansion
\exbox{Chickens or eggs?}{
For generations, the question of “Which came first, the chicken or the egg?” has perplexed philosophers, biologists, and agricultural economists. Thurman and Fisher (1988) used a Granger causality test in an attempt to provide an empirical resolution to this issue. Annual time series data on egg production and (non-broiler) chicken population for the years 1930--1983 were used to conduct their analysis.

To determine whether chickens cause eggs, current egg production was regressed on lagged egg production and lagged chicken population (4 lags were used for each variable). If past values of chicken population are jointly significant in explaining the level of egg production (controlling for the effect of past egg production), then chickens can be said to cause eggs. A similar regression of current chicken population on lagged chicken population and lagged egg production is used to test for causality from eggs to chickens. If past egg production levels are significant in explaining the level of chicken population, then eggs cause chickens.

This study concludes that eggs cause chickens, but chickens do not cause eggs. Thus, it appears that the egg must come first.

On a more serious note, this analysis was designed, in large part, to cast some doubt on the use of causality tests in establishing the presence or absence of causal relationships.
}
\begin{equation}
Y_{t}=\alpha _{o}+\alpha _{1}Y_{t-1}+\alpha _{2}Y_{t-2}+\cdots +\alpha _{m}Y_{t-m} \label{caus.tst.sc}
\end{equation}%
\begin{equation*}
+\beta _{1}X_{t-1}+\beta _{2}X_{t-2}+\cdots +\beta _{m}X_{t-m}+u_{t} \end{equation*}%
In equation \ref{caus.tst.sc}, it is assumed that the current value of $Y_{t}$ is solely a function of past values of $Y_{t}$ and $X_{t}$. To test whether $X_{t}$ is a cause of $Y_{t}$, the following procedure is used:
\begin{enumerate}
\item[Step 1:] Estimate the parameters of equation \ref{caus.tst.sc} using OLS.
\item[Step 2:] A test of causality is performed by testing the hypothesis: \begin{equation*}
\text{H}_{o}\text{: }\beta _{1}=\beta _{2}=\cdots =\beta _{m}=0 \end{equation*}%
using a Wald test. This requires the estimation of a restricted version of this model:
\begin{equation*}
Y_{t}=\alpha _{o}+\alpha _{1}Y_{t-1}+\alpha _{2}Y_{t-2}+\cdots +\alpha _{m}Y_{t-m}+v_{t}
\end{equation*}%
If the null hypothesis can be rejected at the preselected significance level, then it is claimed that $X_{t}$ is a cause of $Y_{t}$.
\end{enumerate}
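The Wald test in Step 2 is usually computed as an $F$-test that compares the fit of the unrestricted and restricted regressions. As a sketch, let $SSR_{U}$ and $SSR_{R}$ denote the sums of squared residuals from equation \ref{caus.tst.sc} and from the restricted model, and let $N$ denote the number of observations used in estimation (these symbols are introduced here for illustration). The test statistic is then:
\begin{equation*}
F=\frac{(SSR_{R}-SSR_{U})/m}{SSR_{U}/(N-2m-1)}
\end{equation*}
Under the null hypothesis, this statistic follows (approximately) an $F$ distribution with $m$ and $N-2m-1$ degrees of freedom, since the unrestricted model contains $2m+1$ parameters and the null hypothesis imposes $m$ restrictions. The null hypothesis is rejected when $F$ exceeds the critical value at the chosen significance level.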
There are several problems with this test. Among these are: \begin{itemize}
\item The results are often very sensitive to the length of the lag ($m$) used in the original equation.\footnote{%
To further complicate this issue, different lag lengths may be used for the $X_{t}$ and $Y_{t}$ variables. The most common practice, however, is to use the same lag length for each variable.}
\item A positive finding of causality may occur if both $X_{t}$ and $Y_{t}$ are caused by one or more other variables. Suppose, for example, that changes in $Z_{t}$ affect $X_{t}$ with a one-period lag, and affect $Y_{t}$ with a two-period lag. In this case, it would appear that $X_{t}$ is a cause of $Y_{t}$ (with a one-period lag).
\item Expectations of future values of $Y_t$ may have an effect on current $X_t$. In this case, it would appear that $X_t$ is a cause of $Y_t$ when in fact (the expected value of) future $Y_t$ is a cause of current $X_t$.
\end{itemize}
To deal with the first problem, it is probably desirable to estimate several versions of this model with alternative lag lengths. If the results are robust over these alternative specifications, then somewhat more faith may be placed in the test. Quite often, however, econometricians have found that the results of causality tests of this sort are very sensitive to the length of the lag structure.
The second of these problems can be dealt with by expanding the model to include lagged values of other variables in both the unrestricted and restricted forms of the regression model. The limited number of observations available in most economic time-series models, however, often places limits on the number of additional variables that can be included in the model.
The third problem is somewhat more difficult. Two forms of expectations considered by economists are \textbf{adaptive} and \textbf{rational expectations}. Adaptive expectations are based upon the past values of the variable itself. Rational expectations are unbiased forecasts based upon all available information. If individuals possess adaptive expectations, the causality test described above would be adequate (subject to the above problems). In this case, the lagged values of $Y_t$ would capture the effect of the expectations generation process. The presence of rational expectations, however, raises some doubt about the interpretation of causality tests. Under rational expectations, expectations of the future value of a variable are formed using all available information (including information on variables other than $X_t$ and $Y_t$). Thus, it is possible that an observed correlation between lagged $X$ and current $Y$ may actually reflect a causal relationship between current $X$ and the expected future value of $Y$.
\exbox{Rational expectations: the Lucas critique}{
Robert E. Lucas was one of the primary developers and proponents of the “rational expectations hypothesis.” This hypothesis states that individuals form unbiased expectations on the basis of all relevant information. Earlier econometric models of expectations formation were generally based on the somewhat simpler model of adaptive expectations. In a classic study, Lucas and Rapping (1969) used a rational expectations hypothesis to explain the apparent instability of the Phillips
curve relationship. They argued that the apparent instability of the Phillips curve was actually a rational economic response to changing expectations. Under this argument, government
policy designed to affect the mix of inflation and unemployment will alter the process by which expectations are formed and will change the position of the observed Phillips curve relationship.

In 1995, Lucas won the Nobel Prize in economics for his pathbreaking work on the role of expectations. Ironically, however, one-half of the monetary value of the award was received by his ex-wife, as a result of a divorce settlement that contained a clause providing her with one-half of any Nobel Prize award received within 10 years of the divorce. The existence of such a stipulation in a divorce settlement provides further evidence supporting the rational expectations hypothesis.
}%
\subsection{Sims’ exogeneity test}
Sims suggested a simple \textbf{exogeneity test} that relies upon the causality test described above.\footnote{%
Sims proposed a slightly different causality test in which $Y_{t}$ is regressed on current, past, and future values of $X_{t}$. Sims suggested that $Y_{t}$ is a cause of $X_{t}$ if future $X_{t}$ are jointly significant in explaining current $Y_{t}$. The test is reversed to see if $X_{t}$ causes $Y_{t}$. Sims showed that his causality test is equivalent to the test suggested by Granger.} A variable is said to be exogenous if it is a cause of one or more of the endogenous variables in the model, but is not caused by any of these endogenous variables. Suppose, for example, that an economist wishes to determine whether $X_{t}$ is an exogenous variable in a model containing a single endogenous variable $Y_{t}$. To test for exogeneity, two causality tests are performed. The first test is used to determine whether $X_{t}$ is a cause of $Y_{t}$. The second test is used to determine whether $Y_{t}$ is a cause of $X_{t}$. If these tests indicate that $X_{t}$ is a cause of $Y_{t}$, but $Y_{t}$ is not a cause of $X_{t}$, then $X_{t}$ is said to be exogenous.
The following procedure can be used to construct this exogeneity test: \begin{enumerate}
\item[Step 1:] Estimate the parameters of the model:
\begin{equation}
Y_t=\alpha _o+\alpha _1Y_{t-1}+\alpha _2Y_{t-2}+\cdots +\alpha _mY_{t-m} \label{ncaus.tst.sc}
\end{equation}
\begin{equation*}
+\beta _1X_{t-1}+\beta _2X_{t-2}+\cdots +\beta _mX_{t-m}+u_t \end{equation*}
using an OLS estimation procedure.
\item[Step 2:] Conduct a Wald test of the hypothesis:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{1}=\beta _{2}=\cdots =\beta _{m}=0 \end{equation*}%
at an appropriate significance level. If the null hypothesis can be rejected, then $X$ is said to be a cause of $Y$. This suggests that $X$ belongs as an independent variable in an equation explaining the level of $Y$.
\item[Step 3:] Estimate the parameters of the model:
\begin{equation}
X_t=\gamma _o+\gamma _1X_{t-1}+\gamma _2X_{t-2}+\cdots +\gamma _mX_{t-m} \label{ncaus1.tst.sc}
\end{equation}
\begin{equation*}
+\eta _1Y_{t-1}+\eta _2Y_{t-2}+\cdots +\eta _mY_{t-m}+v_t \end{equation*}
using an OLS estimation procedure.
\item[Step 4:] Conduct a Wald test of the hypothesis:
\begin{equation*}
\text{H}_o\text{: }\eta _1=\eta _2=\cdots =\eta _m=0
\end{equation*}
at the significance level selected in Step 2. If the null hypothesis cannot be rejected, then $Y$ is not a cause of $X$ (under the Granger-Sims definition of causality).
\item[Step 5:] Under Sims’ definition of exogeneity, $X$ is said to be exogenous if $X$ is a cause of $Y$, but $Y$ is not a cause of $X$. This condition is satisfied if the $\beta _{j}$’s are jointly significant in equation \ref{ncaus.tst.sc} and the $\eta _{j}$’s are jointly insignificant in equation \ref{ncaus1.tst.sc}. If these conditions are jointly satisfied, then it can be claimed that $X$ is an exogenous independent variable when included in a regression equation in which $Y$ is the dependent variable.
\end{enumerate}
Of course, since this exogeneity test is based upon the causality test described above, it is subject to the same shortcomings.
\subsection{Example:\ Consumer debt and recessions\label{causal_debt_sec}}
In the 1990s, economic analysts frequently expressed concern about the growth of consumer debt as a share of disposable personal income. In particular, it was feared that the rising debt burden would lead to lower consumption spending and an end to the economic expansion occurring during this period.
Schmitt (2000) used several Granger causality tests to investigate the possible causal relationships that might exist between consumer debt and several indicators of macroeconomic activity. Two of the equations estimated were:%
\begin{equation}
\text{DSR}_{t}=a_{o}+\dsum\limits_{j=1}^{6}a_{j}\text{DSR}_{t-j}+\dsum\limits_{j=1}^{6}b_{j}\text{GDP}_{t-j} \label{DSR_granger}
\end{equation}%
\begin{equation}
\text{GDP}_{t}=\alpha _{o}+\dsum\limits_{j=1}^{6}\beta _{j}\text{GDP}_{t-j}+\dsum\limits_{j=1}^{6}\gamma _{j}\text{DSR}_{t-j} \label{DSR_GDP}
\end{equation}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{DSR}_{t}\text{ = ratio of consumer debt to disposable personal income} \\
& \text{GDP}_{t}\text{ = real GDP (measured in billions of chained 1997 dollars)}%
\end{array}%
\end{equation*}%
An $F$-test of the joint significance of the parameters $b_{j}$ $(j=1,\ldots ,6)$ may be used to test for causality from GDP to consumer debt. Causality from real GDP to consumer debt is indicated if these parameters are jointly significant. An $F$-test of the joint significance of the parameters $\gamma _{j}$ ($j=1,\ldots ,6$) is similarly used to test for causality from consumer debt to real GDP.
When the parameters of equations \ref{DSR_granger} and \ref{DSR_GDP} are estimated using an OLS estimation procedure, a\ Wald test of the joint significance of the $b_{j}$ coefficients has a p-value of 0.0006. The p-value is 0.90 for the Wald test of the joint significance of the $\gamma _{j}$ coefficients.\footnote{%
The $F$-statistic for the Wald test involving the $b_{j}$ coefficients is 4.51 while the $F$-statistic is 0.36 for the Wald test of the joint significance of the $\gamma _{j}$ coefficients. (At a 1\% significance level, the critical value for an $F$(6,75) is 3.06.)} These results suggest that, under the Granger-Sims definition of causality, GDP is a cause of consumer debt while consumer debt is not a cause of GDP. Using Sims’
definition of exogeneity, this suggests that GDP may be treated as an exogenous determinant of consumer debt in a model explaining consumer debt.
Similar results were found by Schmitt.
\subsection{Hausman test}
While the exogeneity test described above may be used to test for exogeneity in time-series models, it cannot be applied to cross-sectional data. An alternative test, developed by Hausman (1976), may be used for this purpose. To illustrate the use of this test, consider the following model:%
\begin{equation}
Y_{1t}=\beta _{o}+\beta _{1}Y_{2t}+\beta _{2}X_{1t}+u_{t} \label{y1_Haus} \end{equation}%
\begin{equation}
Y_{2t}=\gamma _{o}+\gamma _{1}Y_{1t}+\gamma _{2}X_{2t}+v_{t} \label{y2_Haus} \end{equation}%
Suppose that an econometrician wishes to estimate the parameters of equation \ref{y1_Haus}, but suspects that $Y_{2t}$ is endogenous (through the relationship described in equation \ref{y2_Haus}). As noted above, OLS estimators of equation \ref{y1_Haus} will be biased and inconsistent if $Y_{2t}$ is an endogenous variable. The source of the bias is the correlation between $Y_{2t}$ and $u_{t}$. If $Y_{2t}$ is exogenous, $Y_{2t}$ will be uncorrelated with $u_{t}$ and an OLS estimation technique is appropriate; a nonzero correlation between $Y_{2t}$ and $u_{t}$, on the other hand, indicates that $Y_{2t}$ is endogenous. The \textbf{Hausman test}\footnote{%
The Hausman test is sometimes referred to as the Hausman-Wu test, or the Durbin-Hausman-Wu test, since this test is an extension of tests developed by Durbin (1954) and Wu (1973).} provides a test of this relationship.
A Hausman test of exogeneity proceeds in the following manner: \begin{enumerate}
\item[Step 1: ] Regress the suspected endogenous variable (or variables, if there is more than one suspected endogenous variable) on all of the exogenous variables in the equation system. Save the fitted endogenous variable(s). In this example, $Y_{2t}$ is regressed on $X_{1t}$ and $X_{2t}$ and $\hat{Y}_{2t}$ is saved.
\item[Step 2: ] Add the fitted endogenous variable as an additional regressor in the equation of interest and estimate the modified equation by OLS. If there is more than one suspected endogenous variable, add the fitted values of each of the suspected endogenous variables. In this case, the equation to be estimated is:%
\begin{equation*}
Y_{1t}=\beta _{o}+\beta _{1}Y_{2t}+\beta _{2}X_{1t}+\beta _{3}\hat{Y}_{2t}+u_{t}
\end{equation*}
\item[Step 3: ] Perform a $t$-test on the estimated coefficient on the fitted variable. Under the Hausman test, the variable $Y_{2t}$ is found to be endogenous if this hypothesis test indicates that $\beta _{3}$ is significantly different from zero; if $\beta _{3}$ is not found to be significantly different from zero, the Hausman test indicates that $Y_{2t}$ may be treated as being exogenous. (In the case of more than one potential endogenous variable, an $F$-test is used to test the joint significance of the set of fitted variables. One or more of the variables is found to be endogenous if the $F$-statistic exceeds the appropriate critical value.)
\end{enumerate}
\subsection{Example: Demand and Supply of Loans\label{haus_test_sec}}
The model of the demand and supply of commercial loans discussed in section \ref{D_S_loans} may be used to illustrate the application of the Hausman test. Suppose that a researcher wishes to estimate the demand equation given by:%
\begin{equation*}
\text{Loans}_{t}=\beta _{o}+\beta _{1}\text{Prime}_{t}+\beta _{2}\text{AAA}_{t}+\beta _{3}\text{Indprod}_{t}+u_{t}
\end{equation*}%
A Hausman test may be used to examine whether it is appropriate to treat Prime$_{t}$ as an exogenous variable in this equation. To perform this test:
\begin{enumerate}
\item[Step 1: ] Prime$_{t}$ is regressed against the exogenous variables in the system of equations (\ref{demand3.zz} and \ref{supply3.zz}). Equation~\ref{prime_iv_est} contains the result of this estimation.
\item[Step 2: ] The following equation is estimated by OLS:%
\begin{equation*}
\widehat{\text{Loans}}_{t}=\underset{(-27.59)}{-837.12}-\underset{(-1.05)}{4.48}\text{Prime}_{t}+\underset{(12.50)}{41.06}\text{AAA}_{t}+\underset{(73.51)}{15.42}\text{Indprod}_{t}-\underset{(-3.19)}{15.08}\widehat{\text{Prime}}_{t}
\end{equation*}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)}
\end{equation*}
\item[Step 3: ] Since the $t$-statistic of $-3.19$ on $\widehat{\text{Prime}}_{t}$ is significantly different from zero at the .01 level, the Hausman test indicates that Prime$_{t}$ should be treated as an endogenous variable in the loan demand equation.
\end{enumerate}
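A useful check on Step 2 follows from the way the fitted values are constructed. As a sketch in the notation of the general procedure (equations \ref{y1_Haus} and \ref{y2_Haus}), write the first-stage decomposition as $Y_{2t}=\hat{Y}_{2t}+\hat{v}_{t}$, where $\hat{v}_{t}$ denotes the first-stage residual (a symbol introduced here for illustration). The two terms involving the suspected endogenous variable can then be rewritten as:
\begin{equation*}
\beta _{1}Y_{2t}+\beta _{3}\hat{Y}_{2t}=(\beta _{1}+\beta _{3})\hat{Y}_{2t}+\beta _{1}\hat{v}_{t}
\end{equation*}
Because $\hat{v}_{t}$ is orthogonal to every first-stage regressor (and therefore to $\hat{Y}_{2t}$ and to the exogenous variables in the equation of interest), the OLS estimate of $\beta _{1}+\beta _{3}$ equals the 2SLS estimate of the coefficient on the suspected endogenous variable. In the loan demand example, the two estimated coefficients sum to:
\begin{equation*}
-4.48+(-15.08)=-19.56
\end{equation*}
which matches, up to rounding, the 2SLS estimate of $-19.55$ for Prime$_{t}$ reported in Table \ref{dands_loans_table}. The remaining coefficients ($-837.12$, $41.06$, and $15.42$) coincide with the 2SLS estimates in that table for the same reason.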
\section{Autocorrelation in simultaneous equation models} Since many simultaneous equation models involve time-series applications, it is quite possible that autocorrelation may be present in one or more of the equations in the model. A test for the existence of autocorrelation may be performed using a variation of the Lagrange multiplier test discussed earlier.\footnote{%
See Wooldridge (1991) for a more detailed discussion of this procedure.} Suppose the relevant structural equation is:
\begin{equation}
Y_{1t}=\beta _o+\beta _1Y_{2t}+\beta _2X_t+u_t \label{stru.1.eq} \end{equation}
To perform a test for $p$-th order autocorrelation:\footnote{%
Under a $p$-th order autoregressive process, the error term, $u_t$, can be expressed as:
\begin{equation*}
u_t=\rho _1u_{t-1}+\rho _2u_{t-2}+\cdots +\rho _pu_{t-p}+\epsilon _t \end{equation*}
where $\epsilon _t$ is an error process possessing a zero mean and constant variance and is uncorrelated across observations.}
\begin{enumerate}
\item[Step 1:] Estimate the parameters of equation \ref{stru.1.eq} by 2SLS and save the estimated residuals, $\hat{u}_t$.
\item[Step 2:] Estimate the parameters of the following regression equation by OLS:
\begin{equation*}
\hat{u}_t=\gamma _o+\gamma _1\hat{Y}_{2t}+\gamma _2X_t+\gamma _3\hat{u}_{t-1}+\gamma _4\hat{u}_{t-2}+\cdots +\gamma _{p+2}\hat{u}_{t-p}+\epsilon _t
\end{equation*}
where $\hat{Y}_{2t}$ is the predicted value of $Y_{2t}$ from the first-stage regression on the set of all predetermined variables in the model (\textit{i.e.,} this is the predicted value from the estimated reduced-form equation for $Y_{2t}$).
\item[Step 3:] Formulate the Lagrange multiplier statistic given by: \begin{equation*}
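\text{Lagrange multiplier statistic}=(N-p)R^{2}
\end{equation*}
where $R^{2}$ is the coefficient of determination from the regression in Step 2 and $N$ is the number of observations. Under the null hypothesis of no autocorrelation, this statistic is asymptotically distributed as a $\chi ^2$ statistic with $p$ degrees of freedom.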
\end{enumerate}
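The test in Step 3 can be summarized as a test of the hypothesis that all of the autoregressive parameters are zero. As a sketch, let $\chi _{p,\alpha }^{2}$ denote the upper-$\alpha $ critical value of a $\chi ^{2}$ distribution with $p$ degrees of freedom (notation introduced here for illustration):
\begin{equation*}
\text{H}_{o}\text{: }\rho _{1}=\rho _{2}=\cdots =\rho _{p}=0\qquad \text{reject H}_{o}\text{ if }(N-p)R^{2}>\chi _{p,\alpha }^{2}
\end{equation*}
For a test of first-order autocorrelation ($p=1$) at a 5\% significance level, for example, the critical value is 3.84, so the null hypothesis of no autocorrelation is rejected whenever $(N-1)R^{2}$ exceeds 3.84.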
If autocorrelation is found, the Prais-Winsten or Hildreth-Lu estimation procedures discussed in Chapter \ref{auto.chap} may be used to resolve the problem. These procedures, however, must be modified somewhat in a simultaneous equations framework. To see why this modification is needed, note that the quasi-differenced equations will contain lagged values of one or more right-hand side endogenous variables. The reduced-form equation for each of these lagged endogenous variables is a function of the \emph{lagged} values of all of the predetermined variables in the model.
Thus, to correct for $p$-th order autocorrelation, the usual Prais-Winsten or Hildreth-Lu procedure is used, but the right-hand side endogenous variables are replaced with fitted values from a first-stage regression of each of these endogenous variables on the current and first $p$ lags of each of the predetermined variables in the model.\footnote{%
A more detailed discussion appears in Fair (1970).}
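To see where the lagged variables come from, consider the first-order case ($p=1$) as an illustration, in which $u_t=\rho _1u_{t-1}+\epsilon _t$. Lagging equation \ref{stru.1.eq} by one period, multiplying by $\rho _1$, and subtracting the result from equation \ref{stru.1.eq} gives the quasi-differenced equation:
\begin{equation*}
Y_{1t}-\rho _1Y_{1,t-1}=\beta _o(1-\rho _1)+\beta _1(Y_{2t}-\rho _1Y_{2,t-1})+\beta _2(X_t-\rho _1X_{t-1})+\epsilon _t
\end{equation*}
The transformed equation contains $Y_{1,t-1}$, $Y_{2,t-1}$, and $X_{t-1}$ in addition to the current values of the variables, which is why the first-stage regression must include lagged as well as current values of the predetermined variables.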
\section{Other estimators\label{oe.sc}}
In many econometric applications, an economist may wish to estimate the parameters of a set of equations in which the error terms are correlated across equations. Suppose, for example, an economist wishes to use time-series data to estimate the parameters of the demand and supply curves for several different (and unrelated) goods. It is quite likely that there may be substantial correlation (either positive or negative) between the error terms in two or more of these equations at the same time periods. For example, random shocks such as changes in consumer expectations or foreign income may affect the error terms in several (or all) of the equations under discussion. Error terms in different equations at different time periods, however, are still expected to be uncorrelated. This type of \textbf{contemporaneous correlation} is a common feature of many time-series models.
(Of course, a similar outcome can occur in cross-sectional or panel-data models as well.)
There are two major econometric models that are used when a system of equations exhibits contemporaneous correlation:
\begin{itemize}
\item the seemingly unrelated regressions (SUR) model;\footnote{%
In some texts, this is abbreviated as SURE (Seemingly Unrelated Regression Equations).} and
\item the three-stage least squares (3SLS) model.
\end{itemize}
Let’s examine these models.
\subsection{Seemingly Unrelated Regressions (SUR) model} The following two equations may be used to illustrate the existence of contemporaneous correlation:
\begin{equation}
Y_{1t}=\beta _o+\beta _1X_t+u_t \label{SURE.sc}
\end{equation}
\begin{equation}
Y_{2t}=\alpha _o+\alpha _1Z_t+v_t \label{SURE.2.sc}
\end{equation}
where:
\begin{equation}
E(u_t)=0 \label{c1.sc}
\end{equation}
\begin{equation}
E(v_t)=0 \label{c2.sc}
\end{equation}
\begin{equation}
E(u_t^2)=\sigma _u^2 \label{c3.sc}
\end{equation}
\begin{equation}
E(u_tu_s)=0\text{ for }t\neq s \label{c4.sc}
\end{equation}
\begin{equation}
E(v_t^2)=\sigma _v^2 \label{c5.sc}
\end{equation}
\begin{equation}
E(v_tv_s)=0\text{ for }t\neq s \label{c6.sc}
\end{equation}
\begin{equation}
E(u_tv_t)=\sigma _{uv} \label{c7.sc}
\end{equation}
\begin{equation}
E(u_tv_s)=0\text{ for }t\neq s \label{c8.sc}
\end{equation}
An inspection of equations \ref{SURE.sc} and \ref{SURE.2.sc} suggests that there is no obvious connection between the two equations. Neither equation contains any right-hand side endogenous variables. The only relationship between these equations shows up in equation \ref{c7.sc}: the error terms between these two equations are correlated at each point in time.
An equation system of this sort is called a \textbf{seemingly unrelated regressions (SUR)} model. An estimation procedure for seemingly unrelated regressions models was devised by Zellner (1962). In general, the SUR estimator provides an efficiency gain over OLS estimation of the individual equations separately. There is no efficiency gain, however, if:
\begin{itemize}
\item the same right-hand side variables appear in each and every equation in the model; or
\item the contemporaneous correlation between the error terms is zero.
\end{itemize}
Most modern econometric software packages contain estimators for SUR models.
A SUR estimation procedure is therefore relatively efficient when an econometrician is attempting to estimate the parameters of a system of equations in which:\footnote{%
An OLS estimation procedure, however, will still result in estimates that are unbiased and consistent.}
\begin{itemize}
\item each equation contains only predetermined variables in the right-hand side of the equation,
\item the error terms are expected to exhibit nonzero contemporaneous correlation, and
\item the independent variables are not identical in all equations.
\end{itemize}
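The degree of contemporaneous correlation can be gauged from the OLS residuals before a SUR estimator is applied. As a sketch, let $\hat{u}_t$ and $\hat{v}_t$ denote the OLS residuals from equations \ref{SURE.sc} and \ref{SURE.2.sc}, and let $N$ denote the number of observations (notation introduced here for illustration). Sample counterparts of the moments in equations \ref{c3.sc}, \ref{c5.sc}, and \ref{c7.sc} are:
\begin{equation*}
\hat{\sigma}_u^2=\frac 1N\sum_{t=1}^N\hat{u}_t^2\qquad \hat{\sigma}_v^2=\frac 1N\sum_{t=1}^N\hat{v}_t^2\qquad \hat{\sigma}_{uv}=\frac 1N\sum_{t=1}^N\hat{u}_t\hat{v}_t
\end{equation*}
and the implied contemporaneous correlation is:
\begin{equation*}
\hat{\rho}_{uv}=\frac{\hat{\sigma}_{uv}}{\hat{\sigma}_u\hat{\sigma}_v}
\end{equation*}
A value of $\hat{\rho}_{uv}$ close to zero suggests that little is to be gained from SUR estimation. (Some software packages apply a degrees-of-freedom correction in place of $N$.)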
\subsection{Example: Grunfeld investment equations}
In his dissertation, Grunfeld (1958) examined the determinants of investment spending at 10 large corporations over a 20-year period. For each company, he estimated a relationship of the form:
\begin{equation}
I_{t}\text{ = }\beta _{o}+\beta _{1}F_{t}+\beta _{2}C_{t}+u_{t} \label{grun.sc}
\end{equation}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & I_{t}\text{ = real investment in year }t \\
& F_{t}\text{ = real market value of the firm at the start of year }t \\
& C_{t}\text{ = real stock of plant and equipment at the start of year }t\text{ }%
\end{array}%
\end{equation*}%
In this equation, $F_{t}$ equals the real market value of the firm’s stock plus outstanding debt issued by the firm. This market value provides a measure of the expected profitability of the firm (since higher expected profits result in a higher market value for the firm’s stock). The variable $C_{t}$ measures the size of the firm’s existing capital stock. It is anticipated that $\beta _{1}$ will be positive since higher expected profits should result in an increase in investment spending. Since replacement investment varies directly with the capital stock of the firm, it is also expected that $\beta _{2}$ will be positive.
Zellner (1962), in his classic paper on SUR models, noted that it is likely that the error terms in each equation will exhibit a positive contemporaneous covariance. Unobserved factors (captured by the error terms in these equations) that result in unusually high levels of investment for one firm will also be expected to result in unusually high levels of investment in other firms during the same time period. Thus, positive values for the error terms in one firm’s investment equation will tend to be associated with positive values for the error terms in other firms’ investment equations. Using the same argument, negative error terms in one firm’s investment equation will often be expected to coincide with negative error terms in other firms’ investment equations.
Consider, for example, the following investment equations for General Electric and Westinghouse:
\begin{equation}
I_t^{GE}\text{ = }\beta _o+\beta _1F_t^{GE}+\beta _2C_t^{GE}+u_t \label{grun.ge}
\end{equation}
\begin{equation}
I_t^W\text{ = }\gamma _o+\gamma _1F_t^W+\gamma _2C_t^W+v_t \label{grun.w} \end{equation}
If the error terms $u_t$ and $v_t$ have a nonzero contemporaneous correlation ($E(u_tv_t)\neq 0$), then a SUR estimation procedure will provide an efficiency gain over an OLS estimation procedure in estimating the parameters of equations \ref{grun.ge} and \ref{grun.w}.
Estimating the parameters of equations \ref{grun.ge} and \ref{grun.w} using an OLS estimation procedure (applied to each equation separately) resulted in:
\begin{equation}
\widehat{I_t^{GE}}\text{ = }-\underset{(31.374)}{9.956}+\underset{(0.01557)}{0.0266}F_t^{GE}+\underset{(0.0257)}{0.152}C_t^{GE} \label{grun.ge.ols}
\end{equation}
\begin{equation}
\widehat{I_t^W}\text{ = }-\underset{(8.015)}{0.509}+\underset{(0.01571)}{0.0529}F_t^W+\underset{(0.0561)}{0.0924}C_t^W \label{grun.w.ols}
\end{equation}
\begin{equation*}
\text{(standard errors in parentheses)}
\end{equation*}
When a SUR\ estimation procedure is used to estimate the parameters of both equations jointly, the following equations result:
\begin{equation}
\widehat{I_t^{GE}}\text{ = }-\underset{(27.313)}{30.611}+\underset{(0.01339)}{0.0404}F_t^{GE}+\underset{(0.0235)}{0.136}C_t^{GE} \label{grun.ge.sure}
\end{equation}
\begin{equation}
\widehat{I_t^W}\text{ = }-\underset{(6.926)}{1.681}+\underset{(0.01329)}{0.0593}F_t^W+\underset{(0.04874)}{0.0561}C_t^W \label{grun.w.sure}
\end{equation}
\begin{equation*}
\text{(standard errors in parentheses)}
\end{equation*}
The efficiency gain resulting from the application of the SUR estimation procedure can be seen by comparing the standard errors for the intercept and slope parameters for each equation under the two estimation procedures. The SUR estimation procedure results in lower estimated standard errors for each intercept and slope parameter.
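The size of the efficiency gain can be made concrete by dividing each SUR standard error by the corresponding OLS standard error reported above (a simple illustrative calculation, not part of the original analyses):
\begin{equation*}
\begin{array}{lccc}
& \text{intercept} & F_t & C_t \\
\text{General Electric} & 27.313/31.374\approx 0.87 & 0.01339/0.01557\approx 0.86 & 0.0235/0.0257\approx 0.91 \\
\text{Westinghouse} & 6.926/8.015\approx 0.86 & 0.01329/0.01571\approx 0.85 & 0.04874/0.0561\approx 0.87
\end{array}
\end{equation*}
In this example, the estimated standard errors fall by roughly 9 to 15 percent when the two equations are estimated jointly.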
\subsection{Three-stage least squares (3SLS)\label{3sls.sec.sc}}
Contemporaneous correlation may also occur in simultaneous equations systems. A technique known as \textbf{three-stage least squares (3SLS)} exists for such models.\footnote{%
This technique was developed by Zellner and Theil (1962). A full discussion of this technique requires mathematical tools that are beyond the scope of this text. Readers comfortable with the use of matrix algebra may wish to examine the discussion provided by Zellner and Theil.} This three-stage estimator consists of the following steps:
\begin{enumerate}
\item[Step 1:] Estimate the reduced-form equations for all of the right-hand side endogenous variables in each equation.
\item[Step 2:] Estimate the parameters of the structural model for each equation after replacing each right-hand side endogenous variable with its fitted values from Step~1. (Note that Steps 1 and 2 simply involve estimating each equation by 2SLS.)
\item[Step 3:] Apply a seemingly unrelated regressions estimator to the equations estimated in Step~2.
\end{enumerate}
The 3SLS estimator will, in general, provide an efficiency gain over 2SLS. This efficiency gain, however, will only occur if there is nonzero contemporaneous correlation between the error terms in two or more of the equations in the model.\footnote{%
The 2SLS estimators, however, are still consistent in this case.}
\subsection{Example:\ demand and supply of loans}
The estimated correlation between the error terms in the demand and supply equations of loans discussed in section \ref{D_S_loans} is 0.04 when this model is estimated using a 2SLS estimation procedure. While this correlation is small, a nonzero correlation suggests that random shocks that affect the demand for loans also affect the supply of loans. In this case, an efficiency gain may occur when the demand and supply equations are estimated jointly using a 3SLS estimation procedure. Table \ref{dands_loans_table} compares the estimates of the demand and supply equations under OLS, 2SLS, and 3SLS estimation procedures.%
\begin{table}[tbp] \centering%
\begin{tabular}{llll}
\hline
\textbf{Variable} & $\underset{\text{(}t\text{-ratio)}}{\text{\textbf{OLS estimate}}}$ & $\underset{\text{(}t\text{-ratio)}}{\text{\textbf{2SLS estimate}}}$ & $\underset{\text{(}t\text{-ratio)}}{\text{\textbf{3SLS estimate}}}$ \\ \hline
\multicolumn{4}{l}{\textbf{Demand equation (dependent variable = Loans}$_{t}$\textbf{)}} \\
constant & \multicolumn{1}{r}{$\underset{(-27.07)}{-824.63}$} & \multicolumn{1}{r}{$\underset{(-27.17)}{-837.12}$} & \multicolumn{1}{r}{$\underset{(-27.23)}{-834.31}$} \\
Prime$_{t}$ & \multicolumn{1}{r}{$\underset{(-8.97)}{-16.73}$} & \multicolumn{1}{r}{$\underset{(-9.42)}{-19.55}$} & \multicolumn{1}{r}{$\underset{(-9.36)}{-19.31}$} \\
AAA$_{t}$ & \multicolumn{1}{r}{$\underset{(12.02)}{37.28}$} & \multicolumn{1}{r}{$\underset{(12.31)}{41.06}$} & \multicolumn{1}{r}{$\underset{(12.24)}{40.59}$} \\
Indprod$_{t}$ & \multicolumn{1}{r}{$\underset{(72.61)}{15.36}$} & \multicolumn{1}{r}{$\underset{(72.37)}{15.42}$} & \multicolumn{1}{r}{$\underset{(72.72)}{15.41}$} \\
\multicolumn{4}{l}{\textbf{Supply equation (dependent variable = Prime}$_{t}$\textbf{)}} \\
constant & \multicolumn{1}{r}{$\underset{(1.48)}{0.30}$} & \multicolumn{1}{r}{$\underset{(2.12)}{0.46}$} & \multicolumn{1}{r}{$\underset{(2.37)}{0.51}$} \\
Loans$_{t}$ & \multicolumn{1}{r}{$\underset{(-0.06)}{-0.000409}$} & \multicolumn{1}{r}{$\underset{(1.82)}{0.0023}$} & \multicolumn{1}{r}{$\underset{(2.29)}{0.0028}$} \\
TBills3$_{t}$ & \multicolumn{1}{r}{$\underset{(62.80)}{1.20}$} & \multicolumn{1}{r}{$\underset{(51.34)}{1.17}$} & \multicolumn{1}{r}{$\underset{(51.35)}{1.16}$} \\
Deposits$_{t}$ & \multicolumn{1}{r}{$\underset{(3.51)}{0.00063}$} & \multicolumn{1}{r}{$\underset{(0.00)}{0.0000017}$} & \multicolumn{1}{r}{$\underset{(-0.40)}{-0.00014}$} \\ \hline
\end{tabular}%
\caption{OLS, 2SLS, and 3SLS estimates of the demand and supply of loans equations\label{dands_loans_table}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
In this case, the OLS, 2SLS, and 3SLS estimators generate remarkably similar estimates of the demand for loans equation. In the supply equation, however, the Loans$_{t}$ variable is not statistically significant under the OLS estimator at any conventional significance level, but is significantly greater than zero at a 10\% significance level in the 2SLS equation and at a 5\% significance level in the 3SLS equation.
\section{Summary}
In many econometric models, endogenous variables appear on the right-hand side of one or more equations. OLS\ estimation techniques will generally result in biased and inconsistent estimates in these cases. While an OLS estimation procedure may be used to estimate the parameters of a reduced-form version of the model, it is often preferable to estimate the original structural equations of the model since these structural equations often embody hypotheses concerning the behavior of economic agents. The two-stage least squares (2SLS) estimation procedure described in this chapter makes it possible to obtain consistent estimates of all model parameters. This 2SLS estimator, however, may only be used to estimate equations that are identified.
The structural equations of an econometric model can be transformed into a reduced-form model in which each endogenous variable is expressed solely in terms of predetermined variables. The estimated parameters of these reduced-form equations may be used for comparative statics purposes. If the system of equations is identified, it is possible to generate estimates of the structural parameters from the reduced-form parameter estimates. This estimation method is referred to as indirect least squares.
Quite often economists wish to estimate the parameters of a system of equations in which the error terms are contemporaneously correlated. If the right-hand side variables in each of these equations are all predetermined then a SUR (Seemingly Unrelated Regressions) estimation procedure may result in estimators that are more efficient than OLS estimators. The 3SLS (three-stage least squares) estimator provides a similar efficiency gain when the errors in a simultaneous equations model are contemporaneously correlated.
\section{Key Concepts}
endogenous variable
exogenous variable
structural equations
structural parameters
simultaneous equations bias
instrumental variable estimator
two-stage least squares (2SLS)
predetermined variable
identification
underidentified equation
just identified equation
overidentified equation
reduced-form equations
reduced-form parameters
comparative static analysis
indirect least squares
causality test
exogeneity test
contemporaneous correlation
Hausman test
seemingly unrelated regressions (SUR)
three-stage least squares (3SLS)
\newpage\
\section{Exercises and problems}
\begin{enumerate}
\item Consider the following simultaneous equations model: \begin{equation*}
Y_{1i}=\beta _o+\beta _1Y_{2i}+\beta _2X_i+u_i \end{equation*}
\begin{equation*}
Y_{2i}=\gamma _o+\gamma _1Y_{1i}+\gamma _2X_i+v_i \end{equation*}
\begin{enumerate}
\item Which of these equations is identified?
\item Can the parameters of each of these structural equations be estimated using 2SLS? Explain.
\end{enumerate}
\item Explain why the 2SLS estimation procedure requires that at least one predetermined variable must be excluded for each right-hand side endogenous variable that appears in a regression equation.
\item
\begin{enumerate}
\item Determine the reduced-form equation for the consumption function appearing in the model described by equations \ref{keynes.0.lc}, \ref{keynes.0.lca}, and \ref{keynes.0.lcb}.
\item Provide an economic interpretation of each of the reduced-form slope parameters.
\item Estimate the parameters of this reduced-form equation using the data in Table \ref{gdp.dat} in Appendix \ref{data.appendix} (or in the file \textquotedblleft gdp.dat\textquotedblright ).
\end{enumerate}
\item
\begin{enumerate}
\item Use the data in Table \ref{gdp.dat} in Appendix \ref{data.appendix} to verify the OLS estimates appearing in equation \ref{c.ols.sc}.
\item Use a 2SLS estimation procedure to verify the results reported in equation \ref{c.2sls.sc}. If your econometrics software package generates 2SLS estimates without reporting the results of the first-stage regression, estimate the first-stage regression of $YD_{t}$ on a constant term, $I_{t}$, $G_{t}$, $T_{t}$ and $NX_{t}$.
\item What is the $R^2$ from the first-stage regression in part (b)? Can this result account for the remarkable similarity between the estimates in equations \ref{c.ols.sc} and \ref{c.2sls.sc}?
\item Are the values of $I_t$, $G_t$, $T_t$ and $NX_t$ really exogenous? In a more complete macroeconomic model, which of these variables is likely to be endogenous and which are likely to be exogenous? Explain.
\item Reestimate the consumption function using an appropriate 2SLS estimation procedure using the set of exogenous instruments from part (d).
Is your estimated consumption function substantially different from the 2SLS model appearing in the text?
\end{enumerate}
\item If a simultaneous equation problem occurs when an endogenous variable is used as a regressor, why can lagged values of the endogenous variable be used as regressors without this problem?
\item Suppose an econometrician wishes to estimate an equation explaining the wages of individuals using cross-sectional data. One factor that affects the wage is the number of hours a worker works each week.
\begin{enumerate}
\item If the wage rate is regressed on average weekly hours (and other appropriate variables), why might a simultaneity bias problem occur? Explain.
\item Discuss how this issue could be addressed.
\end{enumerate}
\item Use the data in Table \ref{gdp.dat} in Appendix \ref{data.appendix} to estimate the equation:%
\begin{equation*}
C_{t}=\beta _{o}+\beta _{1}YD_{t}+\beta _{2}\text{WWII}_{t}+u_{t} \end{equation*}%
using both OLS\ and 2SLS. (Use the same instruments that were used to estimate equation \ref{c.2sls.sc}.) Interpret these results.
\item
\begin{enumerate}
\item Construct a demand and supply model that explains the equilibrium price and quantity of coffee over time.
\item What are the endogenous variables in your model? What are the exogenous variables?
\item Are your demand and supply equations identified?
\end{enumerate}
\item
\begin{enumerate}
\item Construct a demand and supply model that explains the equilibrium price of physicists.
\item What are the endogenous variables in your model? What are the exogenous variables?
\item Is your model identified?
\end{enumerate}
\item In a study of energy demand in Greece, Donatos and Mergos (1988) estimate energy demand and supply functions of the form:
\begin{equation}
\text{Demand relationship: ln(}Q_{t})=\beta _{o}+\beta _{1}\ln (P_{t})+\beta _{2}\ln (Y_{t})+\beta _{3}\ln (Q_{t-1})+u_{t} \label{demand4.zz} \end{equation}%
\begin{equation}
\text{Supply relationship: ln(}P_{t})=\gamma _{o}+\gamma _{1}\ln (Q_{t})+\gamma _{2}\ln (PA_{t})+v_{t} \label{supply4.zz} \end{equation}%
\begin{equation*}
\begin{array}{ll}
\text{where: } & Q_{t}\text{ = quantity of energy produced in period }t \\ & P_{t}=\text{ price of energy} \\
& Y_{t}=\text{ Gross domestic product} \\ & PA_{t}=\text{ wholesale price index}
\end{array}%
\end{equation*}
\begin{enumerate}
\item Which of the variables in this model are predetermined?
\item Is the demand equation underidentified, just identified, or overidentified? Explain.
\item Is the supply equation underidentified, just identified, or overidentified? Explain.
\end{enumerate}
\item Consider the reduced-form version of the national income equation (appearing in equation \ref{rf.keynes.lca}).
\begin{enumerate}
\item Use the data in Table \ref{gdp.dat} in Appendix \ref{data.appendix} (or in the file \textquotedblleft gdp.dat\textquotedblright ) to estimate the parameters of this equation.
\item Compare the estimates of the investment and government spending multipliers in this model. Does this model predict that these will be the same?
\item Use a Wald test to determine whether there is a statistically significant difference between the government spending and investment multipliers (at a 5\% significance level).
\end{enumerate}
\item Early studies of the economic effects of the corporate income tax conducted by Musgrave and Krzyzaniak (1963) and Gordon (1967) attempted to examine the effect of changes in the corporate income tax rate on the level of corporate profits using a reduced-form corporate profits equation. Sebold (1979) suggested that a major shortcoming of this approach is that it \textquotedblleft yields no information with regard to the tax-effects on the decision variables of the firms in question.\textquotedblright\ (Sebold, 1979, p. 401)
\begin{enumerate}
\item Comment on this argument. When might the estimates of structural parameters be preferred to the estimation of one or more reduced-form equations?
\item What questions might be answered by the estimation of a reduced-form profit equation? What questions can be answered by the estimation of structural equations?
\end{enumerate}
\item Use the data in the file \textquotedblleft loans.dat\textquotedblright\ to verify the OLS, 2SLS, and 3SLS estimates appearing in Table~\ref{dands_loans_table} on p.~\pageref{dands_loans_table}. (This data is described in Table~\ref{loans.dat} on p.~\pageref{loans.dat}.)
\item Use the data in the file \textquotedblleft loans.dat\textquotedblright\ to replicate the Hausman test performed in section \ref{haus_test_sec}. (This data is described in Table~\ref{loans.dat} on p.~\pageref{loans.dat}.)
\item Use a Hausman test to determine whether the Loans$_{t}$ variable is endogenous in equation \ref{supply3.zz}. The data for this estimation may be found in the file \textquotedblleft loans.dat.\textquotedblright\ (This data is described in Table~\ref{loans.dat} on p.~\pageref{loans.dat}.)
\item The loan supply equation (equation \ref{supply3.zz}) may also be specified in a more traditional form as: \begin{equation*}
\text{Loans}_{t}=\gamma _{o}+\gamma _{1}\text{Prime}_{t}+\gamma _{2}\text{TBills3}_{t}+\gamma _{3}\text{Deposits}_{t}+v_{t} \end{equation*}
\begin{enumerate}
\item Estimate this loan supply equation by OLS.
\item Use a 2SLS estimation procedure to estimate this supply equation (Assume that the demand equation is given by equation \ref{demand3.zz}).
Compare and contrast the OLS and 2SLS estimates.
\end{enumerate}
\item Dee (1998) examines the effect of competition from private schools on public school quality. He indicates that many prior studies were flawed because the \textquotedblleft demand for private schooling is not an independent determinant of the quality of local public schools.\textquotedblright\ (Dee (1998), p. 419). He suggests that parameter estimates are likely to be biased when a measure of the quality of local public schools is regressed against a measure of the quality of local private schools using an OLS procedure. Explain why there might be a simultaneous equation bias in this case. (Hint:\ If there are high-quality private schools in an area, how might this affect the demand for quality at the local public school?)
\item One of the first applications of causality tests involved testing for causality between changes in the money supply and the level of nominal GNP.\label{mon.inc.sc}
\begin{enumerate}
\item Perform a causality test to examine whether the nominal money supply ($M2$) is a cause of nominal GDP using the data in the file \textquotedblleft money2.dat\textquotedblright\ (this data is described in Table \ref{money2.dat} in Appendix \ref{data.appendix}). Use four lags of each variable in the unrestricted model.
\item Perform a test to determine whether nominal GDP is a cause of the nominal money supply. Again, use four lags of each variable in the unrestricted model.
\item Do these results suggest that the money supply is exogenous? Explain.
\item Repeat question \ref{mon.inc.sc} using an 8-lag model. Do any of the results change substantially?
\end{enumerate}
\item
\begin{enumerate}
\item Replicate the Granger-Sims causality test (described in section\ \ref{causal_debt_sec}) involving consumer debt and real GDP. Use the data in the file \textquotedblleft debt.dat.\textquotedblright\ (This data is described in Table~\ref{debt.dat} on p.~\pageref{debt.dat}.)
\item Change the lag length to 4 lags. Do the results of the causality test change?
\end{enumerate}
\item Use the \textquotedblleft grunfeld.dat\textquotedblright\ data file (described in Table \ref{grunfeld.dat} on p.~\pageref{grunfeld.dat}) to verify the estimates appearing in equations \ref{grun.ge.ols} and \ref{grun.w.ols}. Save the estimated residuals from each equation and determine the sample correlation between these two residuals. What does this correlation tell you about the relationship existing between these error terms?
\begin{enumerate}
\item Use a SUR estimation procedure to verify the results appearing in equations \ref{grun.ge.sure} and \ref{grun.w.sure}.
\end{enumerate}
\item
\begin{enumerate}
\item Use an OLS estimation procedure to estimate the parameters of the equations:
\begin{equation*}
I_{t}^{GM}=\beta _{o}+\beta _{1}F_{t}^{GM}+\beta _{2}C_{t}^{GM}+u_{t} \end{equation*}%
and
\begin{equation*}
I_{t}^{C}=\gamma _{o}+\gamma _{1}F_{t}^{C}+\gamma _{2}C_{t}^{C}+v_{t} \end{equation*}%
(Use the data in the file \textquotedblleft grunfeld.dat\textquotedblright , described in Table \ref{grunfeld.dat} in Appendix \ref{data.appendix}.)
Save the estimated residuals from each equation and determine the sample correlation between these two residuals. What does this correlation tell you about the relationship existing between these error terms?
\item Estimate the parameters of these equations using a SUR estimation procedure.
\item Estimate the parameters of a set of investment equations for all four firms included in this data set using a SUR estimation procedure. Compare the standard errors from this model with those appearing in your estimates in (b) and those reported in equations \ref{grun.ge.sure} and \ref{grun.w.sure}. Does the addition of two equations improve the \textquotedblleft fit\textquotedblright\ of the model?
\end{enumerate}
\item Consider the two-equation SUR model given by: \begin{equation*}
Y_{1t}=\beta _{o}+\beta _{1}X_{t}+u_{t} \end{equation*}%
and
\begin{equation*}
Y_{2t}=\gamma _{o}+\gamma _{1}Z_{t}+v_{t} \end{equation*}%
In a two-equation model, one method of constructing a SUR estimator is to:\footnote{%
See Conniffe (1982) for a discussion of the properties of this version of the SUR estimator.}
\begin{itemize}
\item estimate each equation by OLS and store the residuals from each equation, then
\item reestimate each equation by OLS, including the fitted residual from each equation as a regressor in the other equation. In mathematical terms, the 2nd stage equations are:
\begin{equation*}
Y_{1t}=\beta _{o}+\beta _{1}X_{t}+\beta _{2}\hat{v}_{t}+u_{t} \end{equation*}%
and
\begin{equation*}
Y_{2t}=\gamma _{o}+\gamma _{1}Z_{t}+\gamma _{2}\hat{u}_{t}+v_{t} \end{equation*}%
where $\hat{u}_{t}$ and $\hat{v}_{t}$ are the fitted error terms from the first-stage estimates.
\end{itemize}
\begin{enumerate}
\item What is being captured by the addition of the fitted error terms $\hat{u}_{t}$ and $\hat{v}_{t}$ in the 2nd stage estimates?
\item If contemporaneous correlation exists, why would the second-stage estimates provide an efficiency gain over the first-stage OLS estimates?
\end{enumerate}
\end{enumerate}
\newpage\
\section{Mathematical appendix}
\subsection{Simultaneous equations bias} In the demand and supply model discussed in the text, the demand and supply equations are given by:
\begin{equation} \label{demand.1a.sc} \text{Demand relationship: }Q_t=\beta _o+\beta _1P_t+u_t \end{equation}
and
\begin{equation} \label{supply.1a.sc} \text{Supply relationship: }P_t=\gamma _o+\gamma _1Q_t+v_t \end{equation}
As noted in the main body of this chapter, the equilibrium price and quantity are:
\begin{equation}
P_t=\frac{\gamma _o+\gamma _1\beta _o}{1-\beta _1\gamma _1}+\frac{\gamma _1u_t+v_t}{1-\beta _1\gamma _1} \label{P.eq1.sc} \end{equation}
\begin{equation}
Q_t=\frac{\beta _o+\beta _1\gamma _o}{1-\beta _1\gamma _1}+\frac{u_t+\beta _1v_t}{1-\beta _1\gamma _1} \label{Q.eq1.sc} \end{equation}
The OLS\ estimate of $\beta _1$ is given by: \begin{equation*}
\hat{\beta}_1=\frac{\sum \left( P_t-\overline{P}\right) \left( Q_t-\overline{Q}\right) }{\sum \left( P_t-\overline{P}\right) ^2} \end{equation*}
Using a property of summations (Property 8 as derived in the mathematical appendix at the end of Chapter \ref{stat.chap}), this can be restated as: \begin{equation}
\hat{\beta}_1=\frac{\sum \left( P_t-\overline{P}\right) Q_t}{\sum \left( P_t-\overline{P}\right) ^2} \label{slope.bias.sc} \end{equation}
Using the demand relationship, this formula can be restated as: \begin{equation*}
\hat{\beta}_1=\frac{\sum \left( P_t-\overline{P}\right) \left( \beta _o+\beta _1P_t+u_t\right) }{\sum \left( P_t-\overline{P}\right) ^2} \end{equation*}
\begin{equation*}
=\frac{\beta _o\sum \left( P_t-\overline{P}\right) }{\sum \left( P_t-\overline{P}\right) ^2}+\beta _1\frac{\sum P_t\left( P_t-\overline{P}\right) }{\sum \left( P_t-\overline{P}\right) ^2}+\frac{\sum \left( P_t-\overline{P}\right) u_t}{\sum \left( P_t-\overline{P}\right) ^2} \end{equation*}
Since $\sum \left( P_t-\overline{P}\right) =0$ and $\sum P_t\left( P_t-\overline{P}\right) =\sum \left( P_t-\overline{P}\right) ^2$, this reduces to:
%TCIMACRO{%
%\TeXButton{footnote}{\footnote{These results are derived in the mathematical appendix appearing at the %end of Chapter 2 (Properties 6 and 8 of summations).}} }% %BeginExpansion
\footnote{These results are derived in the mathematical appendix appearing at the end of Chapter 2 (Properties 6 and 8 of summations).} %EndExpansion
\begin{equation*}
\hat{\beta}_1=\beta _1+\frac{\sum \left( P_t-\overline{P}\right) u_t}{\sum \left( P_t-\overline{P}\right) ^2}
\end{equation*}
Thus, the expected value of the slope estimator is: \begin{equation*}
E(\hat{\beta}_1)=\beta _1+E\left[ \frac{\sum \left( P_t-\overline{P}\right) u_t}{\sum \left( P_t-\overline{P}\right) ^2}\right] \end{equation*}
Since $P_t$ is partly determined by the level of $u_t$ (see equation \ref{P.eq1.sc}), $P_t$ and $u_t$ are correlated and this last term will not, in general, equal zero. Thus, the OLS estimator is biased.
\subsection{Inconsistency of OLS estimates in simultaneous equations models}
In the previous section, it was shown that the OLS\ estimator is biased when there is an endogenous variable on the right-hand side of the equation. The OLS estimator is also inconsistent. This can be shown using the example above. As noted above, when an OLS estimation procedure is used to estimate the parameters of the demand relationship appearing in equation \ref{demand.1a.sc}, the estimated slope coefficient is: \begin{equation}
\hat{\beta}_1=\beta _1+\frac{\sum \left( P_t-\overline{P}\right) u_t}{\sum \left( P_t-\overline{P}\right) ^2} \label{plk.sc} \end{equation}
Using the equilibrium price relationship appearing in equation \ref{P.eq1.sc}, we have:
\begin{equation*}
\overline{P}=\frac{\gamma _o+\gamma _1\beta _o}{1-\beta _1\gamma _1}+\frac{\gamma _1\overline{u}+\overline{v}}{1-\beta _1\gamma _1} \end{equation*}
Thus,
\begin{equation}
P_t-\overline{P}=\frac{\gamma _1\left( u_t-\overline{u}\right) +\left( v_t-\overline{v}\right) }{1-\beta _1\gamma _1} \label{plk1.sc} \end{equation}
Substituting equation \ref{plk1.sc} into equation \ref{plk.sc} gives:
\begin{equation*}
\hat{\beta}_1=\beta _1+\frac{\left( 1-\beta _1\gamma _1\right) \sum \left[ \gamma _1\left( u_t-\overline{u}\right) +\left( v_t-\overline{v}\right) \right] u_t}{\sum \left[ \gamma _1\left( u_t-\overline{u}\right) +\left( v_t-\overline{v}\right) \right] ^2}
\end{equation*}
Defining $N$ as the sample size, this can be restated as: \begin{equation*}
\hat{\beta}_1=\beta _1+\frac{\left( 1-\beta _1\gamma _1\right) \left( 1/N\right) \sum \left[ \gamma _1\left( u_t-\overline{u}\right) +\left( v_t-\overline{v}\right) \right] u_t}{\left( 1/N\right) \sum \left[ \gamma _1\left( u_t-\overline{u}\right) +\left( v_t-\overline{v}\right) \right] ^2} \end{equation*}
Under the conditions assumed above, the following limits exist: \begin{equation*}
\underset{N\rightarrow \infty }{\lim }\left( 1/N\right) \sum \left( u_t-\overline{u}\right) ^2=\underset{N\rightarrow \infty }{\lim }\left( 1/N\right) \sum \left( u_t-\overline{u}\right) u_t=\sigma _u^2 \end{equation*}
\begin{equation*}
\underset{N\rightarrow \infty }{\lim }\left( 1/N\right) \sum \left( v_t-\overline{v}\right) ^2=\underset{N\rightarrow \infty }{\lim }\left( 1/N\right) \sum \left( v_t-\overline{v}\right) v_t=\sigma _v^2 \end{equation*}
and
\begin{equation*}
\underset{N\rightarrow \infty }{\lim }\left( 1/N\right) \sum \left( v_t-\overline{v}\right) u_t=0
\end{equation*}
Since the limit of a ratio equals the ratio of the limits, as the sample size approaches infinity, the slope estimator will converge to: \begin{equation*}
\underset{N\rightarrow \infty }{\lim }\hat{\beta}_1=\beta _1+\frac{\left( 1-\beta _1\gamma _1\right) \left( \gamma _1\sigma _u^2\right) }{\gamma _1^2\sigma _u^2+\sigma _v^2}
\end{equation*}
Note that the second term in this sum will not generally equal zero (it vanishes only if $\gamma _1=0$, in which case price does not depend on quantity).
Thus, the OLS estimator is inconsistent.
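As a numerical illustration, suppose (purely hypothetically, with values chosen for arithmetic convenience rather than taken from any data) that $\beta _1=-1$, $\gamma _1=0.5$, and $\sigma _u^2=\sigma _v^2=1$. The limit derived above is then:
\begin{equation*}
\underset{N\rightarrow \infty }{\lim }\hat{\beta}_1=-1+\frac{\left( 1-(-1)(0.5)\right) (0.5)(1)}{(0.5)^2(1)+1}=-1+\frac{0.75}{1.25}=-0.4
\end{equation*}
Under these assumed values, the OLS slope estimate would converge to $-0.4$ rather than to the true value of $-1$, no matter how large the sample becomes.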
\subsection{Correction of standard errors in 2SLS model} Suppose that the original structural equation is given by: \begin{equation*}
Y_{1i}=\beta _o+\beta _1Y_{2i}+\beta _2X_i+u_i \end{equation*}
In this model, it is assumed that the variables $Y_1$ and $Y_2$ are endogenous variables while $X$ is a predetermined variable. Assuming that this equation is identified, the second-stage of the 2SLS procedure involves the OLS estimation of the parameters of the equation: \begin{equation*}
Y_{1i}=\beta _o+\beta _1\hat Y_{2i}+\beta _2X_i+v_i \end{equation*}
The estimated variance of the residual in this second-stage equation is computed as:
\begin{equation} \label{corr.sc}
\hat \sigma _v^2=\frac{\sum \hat v_i^2}{N-3}=\frac{\sum \left( Y_{1i}-\hat \beta _o-\hat \beta _1\hat Y_{2i}-\hat \beta _2X_i\right) ^2}{N-3} \end{equation}
The appropriate estimate of the error variance, however, uses the actual values of $Y_{2i}$ rather than the fitted values: \begin{equation} \label{corr.1.sc}
\hat \sigma _u^2=\frac{\sum \hat u_i^2}{N-3}=\frac{\sum \left( Y_{1i}-\hat \beta _o-\hat \beta _1Y_{2i}-\hat \beta _2X_i\right) ^2}{N-3} \end{equation}
where $\hat \beta _o$, $\hat \beta _1$, and $\hat \beta _2$ are the second-stage (2SLS) coefficient estimates. Each of the OLS\ standard errors in the second-stage regression is computed using the inappropriate estimate of the variance appearing in equation \ref{corr.sc}. To correct these estimates, compute: \begin{equation*}
\hat{\sigma}_u=\sqrt{\hat{\sigma}_u^2} \end{equation*}
and
\begin{equation*}
\hat{\sigma}_v=\sqrt{\hat{\sigma}_v^2} \end{equation*}
(Most regression packages report $\hat{\sigma}_v$, so this does not generally have to be computed by the user.) Form the ratio: \begin{equation*}
\frac{\hat{\sigma}_u}{\hat{\sigma}_v} \end{equation*}
To provide consistent estimates of the standard errors, multiply each of the OLS standard errors by this adjustment term.
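For example, suppose (using hypothetical values that do not come from any data set) that the second-stage regression reports $\hat{\sigma}_v=2.0$, while the residuals computed with the actual values of $Y_{2i}$ yield $\hat{\sigma}_u=2.5$. The adjustment term is then:
\begin{equation*}
\frac{\hat{\sigma}_u}{\hat{\sigma}_v}=\frac{2.5}{2.0}=1.25
\end{equation*}
A second-stage OLS standard error of $0.40$ would be corrected to $0.40\times 1.25=0.50$. If the associated coefficient estimate were $0.90$, its $t$-ratio would fall from $0.90/0.40=2.25$ to $0.90/0.50=1.80$.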
\subsection{Indirect least squares}
An indirect least squares estimation procedure involves estimating the reduced-form version of a simultaneous equations model and using the resultant parameter estimates to generate estimates of the structural parameters. The remainder of this appendix examines the potential for applying this technique under conditions of exactly identified, overidentified and underidentified equations.
\subsubsection{Case I: Just identified equations}
Let’s reconsider the demand and supply model discussed in Section \ref{ds.sec.sc}. The demand and supply equations in this model are given by: \begin{equation} \label{demand.x.sc} \text{Demand relationship: }Q_t=\beta _o+\beta _1P_t+\beta _2X_t+u_t \end{equation}
\begin{equation} \label{supply.x.sc} \text{Supply relationship: }P_t=\gamma _o+\gamma _1Q_t+\gamma _2Z_t+v_t \end{equation}
As noted above, each of these equations contains one right-hand side endogenous variable and excludes one exogenous variable. Thus, using the identification condition discussed in section \ref{just.id.sc}, each equation is just identified. The reduced-form equations for this model can be expressed as: \begin{equation} \label{j.id.rf.sc} Q_t=\pi _o+\pi _1Z_t+\pi _2X_t+\epsilon _{1t} \end{equation}
and
\begin{equation} \label{j.id.rf1.sc} P_t=\pi _3+\pi _4X_t+\pi _5Z_t+\epsilon _{2t} \end{equation}
where the reduced-form parameters are defined as: \begin{equation} \label{j.id.1.sc}
\pi _o=\frac{\beta _o+\beta _1\gamma _o}{1-\gamma _1\beta _1} \end{equation}
\begin{equation} \label{j.id.2.sc}
\pi _1=\frac{\beta _1\gamma _2}{1-\gamma _1\beta _1} \end{equation}
\begin{equation} \label{j.id.3.sc}
\pi _2=\frac{\beta _2}{1-\gamma _1\beta _1} \end{equation}
\begin{equation} \label{j.id.4.sc}
\pi _3=\frac{\gamma _o+\gamma _1\beta _o}{1-\gamma _1\beta _1} \end{equation}
\begin{equation} \label{j.id.5.sc}
\pi _4=\frac{\gamma _1\beta _2}{1-\gamma _1\beta _1} \end{equation}
\begin{equation} \label{j.id.6.sc}
\pi _5=\frac{\gamma _2}{1-\gamma _1\beta _1} \end{equation}
Equations \ref{j.id.1.sc} through \ref{j.id.6.sc} provide six equations that may be solved for the six structural parameters $\beta _{o},\beta _{1},\beta _{2},\gamma _{o},\gamma _{1}$, and $\gamma _{2}$. After the application of simple (although somewhat tedious) algebraic transformations, these solutions are: \begin{equation}
\beta _{o}=\frac{\pi _{o}\pi _{2}\pi _{5}^{2}-\pi _{o}\pi _{1}\pi _{4}\pi _{5}-\pi _{1}\pi _{2}\pi _{3}\pi _{5}+\pi _{4}\pi _{1}^{2}\pi _{3}}{\pi _{2}\pi _{5}^{2}-\pi _{1}\pi _{4}\pi _{5}} \label{jsol.1.sc} \end{equation}%
\begin{equation}
\beta _{1}=\frac{\pi _{1}}{\pi _{5}} \label{jsol.2.sc} \end{equation}%
\begin{equation}
\beta _{2}=\pi _{2}-\frac{\pi _{4}\pi _{1}}{\pi _{5}} \label{jsol.3.sc} \end{equation}%
\begin{equation}
\gamma _{o}=\frac{\pi _{2}^{2}\pi _{3}\pi _{5}-\pi _{1}\pi _{2}\pi _{3}\pi _{4}-\pi _{o}\pi _{2}\pi _{4}\pi _{5}+\pi _{o}\pi _{1}\pi _{4}^{2}}{\pi _{2}^{2}\pi _{5}-\pi _{1}\pi _{2}\pi _{4}} \label{jsol.4.sc} \end{equation}%
\begin{equation}
\gamma _{1}=\frac{\pi _{4}}{\pi _{2}} \label{jsol.5.sc} \end{equation}%
\begin{equation}
\gamma _{2}=\pi _{5}-\frac{\pi _{4}\pi _{1}}{\pi _{2}} \label{jsol.6.sc} \end{equation}%
Equations \ref{jsol.1.sc} through \ref{jsol.6.sc} make it possible to derive the structural parameters from the reduced-form parameters. Under the indirect least squares estimation procedure, these equations are used to generate estimates of structural equations from OLS estimates of the associated reduced form equations. In this particular application, an econometrician would first estimate the parameters of the reduced-form equations \ref{j.id.rf.sc} and \ref{j.id.rf1.sc} by OLS. The parameters of the structural equations can then be estimated using the relationships: \begin{equation}
\hat{\beta}_{o}=\frac{\hat{\pi}_{o}\hat{\pi}_{2}\hat{\pi}_{5}^{2}-\hat{\pi}_{o}\hat{\pi}_{1}\hat{\pi}_{4}\hat{\pi}_{5}-\hat{\pi}_{1}\hat{\pi}_{2}\hat{\pi}_{3}\hat{\pi}_{5}+\hat{\pi}_{4}\hat{\pi}_{1}^{2}\hat{\pi}_{3}}{\hat{\pi}_{2}\hat{\pi}_{5}^{2}-\hat{\pi}_{1}\hat{\pi}_{4}\hat{\pi}_{5}} \label{jsol.1.hat.sc}
\end{equation}%
\begin{equation}
\hat{\beta}_{1}=\frac{\hat{\pi}_{1}}{\hat{\pi}_{5}} \label{jsol.2.hat.sc} \end{equation}%
\begin{equation}
\hat{\beta}_{2}=\hat{\pi}_{2}-\frac{\hat{\pi}_{4}\hat{\pi}_{1}}{\hat{\pi}_{5}} \label{jsol.3.hat.sc}
\end{equation}%
\begin{equation}
\hat{\gamma}_{o}=\frac{\hat{\pi}_{2}^{2}\hat{\pi}_{3}\hat{\pi}_{5}-\hat{\pi}_{1}\hat{\pi}_{2}\hat{\pi}_{3}\hat{\pi}_{4}-\hat{\pi}_{o}\hat{\pi}_{2}\hat{\pi}_{4}\hat{\pi}_{5}+\hat{\pi}_{o}\hat{\pi}_{1}\hat{\pi}_{4}^{2}}{\hat{\pi}_{2}^{2}\hat{\pi}_{5}-\hat{\pi}_{1}\hat{\pi}_{2}\hat{\pi}_{4}} \label{jsol.4.hat.sc}
\end{equation}%
\begin{equation}
\hat{\gamma}_{1}=\frac{\hat{\pi}_{4}}{\hat{\pi}_{2}} \label{jsol.5.hat.sc} \end{equation}%
\begin{equation}
\hat{\gamma}_{2}=\hat{\pi}_{5}-\frac{\hat{\pi}_{4}\hat{\pi}_{1}}{\hat{\pi}_{2}} \label{jsol.6.hat.sc}
\end{equation}%
The resultant estimates provide consistent estimates of all population parameters.\footnote{%
A demonstration of the consistency of indirect least-squares estimators requires mathematical tools beyond the scope of this text. A formal proof may be found in any advanced text. See, for example, the discussion in Greene (2000), pp. 682-4.} While OLS estimates of the reduced-form parameters are unbiased, the indirect least squares estimates of the structural parameters are biased. The basic cause of the bias is that the structural parameters are all nonlinear transformations of the unbiased reduced-form estimates. As noted in Chapter \ref{stat.chap}, $E\left[ g(X)\right] $ is not, in general, equal to $g[E(X)]$ when $g$ is nonlinear. More generally, $E\left[ g(X,Y)\right] $ is not, in general, equal to $g\left( E(X),E(Y)\right) $. For example, in general: \begin{equation*}
E\left( \frac{X}{Y}\right) \neq \frac{E(X)}{E(Y)} \end{equation*}%
and
\begin{equation*}
E(XY)\neq E(X)\cdot E(Y)
\end{equation*}
In principle, this indirect least squares procedure may be used to estimate the parameters of any set of just identified equations.\footnote{%
This assumes that the \textquotedblleft rank condition\textquotedblright\ of identification is also satisfied.} In complex systems of equations, the task of computing the transformations between the reduced-form and structural parameters can be very time consuming and tedious. The 2SLS estimation procedure, however, can be accomplished by most econometrics software packages in a matter of seconds. For this reason (as well as others discussed below), practicing econometricians will virtually always use the 2SLS procedure instead of the indirect least squares procedure.
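As a numerical check on the just identified case, consider a purely hypothetical set of structural parameters (chosen for arithmetic convenience, not estimated from any data): $\beta _o=10$, $\beta _1=-1$, $\beta _2=2$, $\gamma _o=1$, $\gamma _1=0.5$, and $\gamma _2=3$, so that $1-\gamma _1\beta _1=1.5$. Equations \ref{j.id.1.sc} through \ref{j.id.6.sc} then imply:
\begin{equation*}
\pi _o=6,\quad \pi _1=-2,\quad \pi _2=\tfrac{4}{3},\quad \pi _3=4,\quad \pi _4=\tfrac{2}{3},\quad \pi _5=2
\end{equation*}
Applying equations \ref{jsol.2.sc}, \ref{jsol.3.sc}, \ref{jsol.5.sc}, and \ref{jsol.6.sc} recovers the slope parameters:
\begin{equation*}
\beta _1=\frac{-2}{2}=-1,\quad \beta _2=\frac{4}{3}-\frac{(2/3)(-2)}{2}=2,\quad \gamma _1=\frac{2/3}{4/3}=0.5,\quad \gamma _2=2-\frac{(2/3)(-2)}{4/3}=3
\end{equation*}
The numerators and denominators of equations \ref{jsol.1.sc} and \ref{jsol.4.sc} share the common factor $\pi _2\pi _5-\pi _1\pi _4$. Provided this factor is nonzero, the intercept formulas reduce to the simpler forms $\beta _o=\pi _o-\beta _1\pi _3$ and $\gamma _o=\pi _3-\gamma _1\pi _o$, which here give:
\begin{equation*}
\beta _o=6-(-1)(4)=10,\quad \gamma _o=4-(0.5)(6)=1
\end{equation*}
All six structural parameters are therefore recovered exactly from the six reduced-form parameters in this illustration.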
\subsubsection{Case II: Underidentified equations} Consider the following two-equation structural model: \begin{equation}
Y_{1i}=\beta _o+\beta _1Y_{2i}+\beta _2X_i+u_i \label{under.sc} \end{equation}
\begin{equation}
Y_{2i}=\alpha _o+\alpha _1Y_{1i}+\alpha _2X_i+v_i \label{under.a.sc} \end{equation}
In this simultaneous equation system, neither of these equations is identified. Solving for the equilibrium levels of $Y_{1i}$ and $Y_{2i}$ in terms of the exogenous variable $X_i$ results in: \begin{equation}
Y_{1i}=\frac{\beta _o+\beta _1\alpha _o}{1-\beta _1\alpha _1}+\frac{\beta _1\alpha _2+\beta _2}{1-\beta _1\alpha _1}X_i+\frac{\beta _1v_i+u_i}{1-\beta _1\alpha _1} \label{under.1.sc}
\end{equation}
\begin{equation}
Y_{2i}=\frac{\alpha _o+\alpha _1\beta _o}{1-\alpha _1\beta _1}+\frac{\alpha _1\beta _2+\alpha _2}{1-\alpha _1\beta _1}X_i+\frac{\alpha _1u_i+v_i}{1-\alpha _1\beta _1} \label{under.2.sc} \end{equation}
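The intermediate step is a direct substitution. As a sketch of the first case, substituting equation \ref{under.a.sc} into equation \ref{under.sc} gives:
\begin{equation*}
Y_{1i}=\beta _o+\beta _1\left( \alpha _o+\alpha _1Y_{1i}+\alpha _2X_i+v_i\right) +\beta _2X_i+u_i
\end{equation*}
Collecting the terms involving $Y_{1i}$ on the left-hand side yields:
\begin{equation*}
\left( 1-\beta _1\alpha _1\right) Y_{1i}=\left( \beta _o+\beta _1\alpha _o\right) +\left( \beta _1\alpha _2+\beta _2\right) X_i+\left( \beta _1v_i+u_i\right)
\end{equation*}
Dividing both sides by $1-\beta _1\alpha _1$ (assumed to be nonzero) results in equation \ref{under.1.sc}. Equation \ref{under.2.sc} follows in the same way by substituting equation \ref{under.sc} into equation \ref{under.a.sc}.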
Defining:
\begin{equation}
\pi _o=\frac{\beta _o+\beta _1\alpha _o}{1-\beta _1\alpha _1} \label{under.eq.sc}
\end{equation}
\begin{equation}
\pi _1=\frac{\beta _1\alpha _2+\beta _2}{1-\beta _1\alpha _1} \label{under.eq.a.sc}
\end{equation}
\begin{equation}
\pi _2=\frac{\alpha _o+\alpha _1\beta _o}{1-\alpha _1\beta _1} \label{under.eq.b.sc}
\end{equation}
\begin{equation}
\pi _3=\frac{\alpha _1\beta _2+\alpha _2}{1-\alpha _1\beta _1} \label{under.eq.c.sc}
\end{equation}
\begin{equation*}
\epsilon _{1i}=\frac{\beta _1v_i+u_i}{1-\beta _1\alpha _1} \end{equation*}
\begin{equation*}
\epsilon _{2i}=\frac{\alpha _1u_i+v_i}{1-\alpha _1\beta _1} \end{equation*}
the reduced form version of equations \ref{under.sc} and \ref{under.a.sc} can be stated as:
\begin{equation}
Y_{1i}=\pi _o+\pi _1X_i+\epsilon _{1i} \label{under.est.sc} \end{equation}
\begin{equation}
Y_{2i}=\pi _2+\pi _3X_i+\epsilon _{2i} \label{under.est.a.sc} \end{equation}
The parameters of equations \ref{under.est.sc} and \ref{under.est.a.sc} may be estimated using an OLS estimation procedure. This procedure provides estimates of the parameters $\pi _o$, $\pi _1$, $\pi _2$, and $\pi _3$.
Suppose that these estimated coefficients are used in place of the actual values in equations \ref{under.eq.sc} through \ref{under.eq.c.sc}. The result is a system of 4 equations in the 6 unknown structural parameters $\beta _o$, $\beta _1$, $\beta _2$, $\alpha _o$, $\alpha _1$, and $\alpha _2$. Because there are more unknowns than equations, the system does not have a unique solution, and it is not possible to solve for the original structural parameters of either equation. Thus, it is not possible to generate estimates of the parameters of an underidentified equation.
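To see why, consider a numerical illustration with hypothetical parameter values (chosen only for convenience). Suppose that the true structural parameters are $\beta _o=1$, $\beta _1=0.5$, $\beta _2=0.5$, $\alpha _o=2$, $\alpha _1=1$, and $\alpha _2=1$, so that $1-\beta _1\alpha _1=0.5$. Equations \ref{under.eq.sc} through \ref{under.eq.c.sc} then give:
\begin{equation*}
\pi _o=\frac{1+(0.5)(2)}{0.5}=4,\quad \pi _1=\frac{(0.5)(1)+0.5}{0.5}=2,\quad \pi _2=\frac{2+(1)(1)}{0.5}=6,\quad \pi _3=\frac{(1)(0.5)+1}{0.5}=3
\end{equation*}
A very different set of structural parameters, $\beta _o=4$, $\beta _1=0$, $\beta _2=2$, $\alpha _o=6$, $\alpha _1=0$, and $\alpha _2=3$, generates exactly the same reduced-form parameters:
\begin{equation*}
\pi _o=\frac{4+(0)(6)}{1}=4,\quad \pi _1=\frac{(0)(3)+2}{1}=2,\quad \pi _2=\frac{6+(0)(4)}{1}=6,\quad \pi _3=\frac{(0)(2)+3}{1}=3
\end{equation*}
Since both sets of structural parameters imply the same reduced form, even exact knowledge of $\pi _o$ through $\pi _3$ cannot distinguish between them.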
\subsubsection{Case III: Overidentified equations} As noted above, an overidentified equation contains fewer right-hand side endogenous variables than the number of predetermined variables excluded from the equation. To see the effect of overidentification, let’s reconsider the simple macroeconomic model presented above:
\begin{equation}
C_{t}=\beta _{o}+\beta _{1}YD_{t}+u_{t} \label{keynes.2.lc} \end{equation}%
\begin{equation}
Y_{t}=C_{t}+I_{t}+G_{t}+NX_{t} \label{keynes.2.lca} \end{equation}%
\begin{equation}
YD_{t}=Y_{t}-Tn_{t} \label{keynes.2.lcb} \end{equation}
As demonstrated above, the reduced-form version of the consumption function is given by:
\begin{equation}
C_{t}=\pi _{5}+\pi _{6}I_{t}+\pi _{7}G_{t}+\pi _{8}Tn_{t}+\pi _{9}NX_{t}+\epsilon _{t} \label{rf.c.2.sc} \end{equation}%
where:
\begin{equation}
\pi _{5}=\frac{\beta _{o}}{1-\beta _{1}} \label{c.rf.1.sc} \end{equation}%
\begin{equation}
\pi _{6}=\frac{\beta _{1}}{1-\beta _{1}} \label{c.rf.2.sc} \end{equation}%
\begin{equation}
\pi _{7}=\frac{\beta _{1}}{1-\beta _{1}} \label{c.rf.3.sc} \end{equation}%
\begin{equation}
\pi _{8}=\frac{-\beta _{1}}{1-\beta _{1}} \label{c.rf.4.sc} \end{equation}%
\begin{equation}
\pi _{9}=\frac{\beta _{1}}{1-\beta _{1}} \label{c.rf.5.sc} \end{equation}%
\begin{equation}
\epsilon _{t}=\frac{u_{t}}{1-\beta _{1}} \label{c.rf.6.sc} \end{equation}%
(While it is possible to estimate the parameters of the national income and disposable income reduced-form equations as well, these equations are not necessary to illustrate the problem associated with overidentification.) In this model, there are only two structural parameters, $\beta _o$ and $\beta _1$. Equations \ref{c.rf.1.sc} through \ref{c.rf.5.sc} provide five equations that may be used to determine these parameters from the reduced-form estimates (after replacing the actual values with their estimated values). Because there are more equations than unknown parameters, multiple estimates of each parameter are available.
For example, solving equations \ref{c.rf.2.sc} through \ref{c.rf.5.sc} for $\beta _1$, it is possible to estimate $\beta _1$ as: \begin{equation*}
\hat \beta _1=\frac{\hat \pi _6}{1+\hat \pi _6} \end{equation*}
or:
\begin{equation*}
\hat \beta _1=\frac{\hat \pi _7}{1+\hat \pi _7} \end{equation*}
or:
\begin{equation*}
\hat \beta _1=\frac{-\hat \pi _8}{1-\hat \pi _8} \end{equation*}
or:
\begin{equation*}
\hat \beta _1=\frac{\hat \pi _9}{1+\hat \pi _9} \end{equation*}
A similar problem occurs with the estimation of $\beta _o$.
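As a numerical illustration, suppose (hypothetically; these values are invented for the example and are not estimates from any data set) that OLS estimation of equation \ref{rf.c.2.sc} yields $\hat \pi _5=100$, $\hat \pi _6=4.0$, and $\hat \pi _7=3.0$. Solving equation \ref{c.rf.2.sc} for $\beta _1$ gives $\beta _1=\pi _6/(1+\pi _6)$, and equation \ref{c.rf.3.sc} may be inverted in the same way, so that:
\begin{equation*}
\hat \beta _1=\frac{4.0}{1+4.0}=0.80\quad \mbox{or}\quad \hat \beta _1=\frac{3.0}{1+3.0}=0.75
\end{equation*}
Since equation \ref{c.rf.1.sc} implies $\beta _o=\pi _5(1-\beta _1)$, each of these values produces a different estimate of the intercept:
\begin{equation*}
\hat \beta _o=100(1-0.80)=20\quad \mbox{or}\quad \hat \beta _o=100(1-0.75)=25
\end{equation*}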
In general, when an equation is overidentified, there will always be more than one possible way to obtain estimates of the model parameters. While each of the estimates will be consistent, they will, in general, differ. In this case, which estimate should be selected? The presence of multiple estimates of parameters in overidentified systems provides another strong reason for the adoption of a 2SLS estimator (since the 2SLS\ estimator provides a unique estimate of each model parameter).