Chapter 4: Bivariate Regression Model
\section{Introduction}
In this chapter, you will examine the bivariate regression model. In this model, it is assumed that there is a linear relationship between a dependent variable and a single independent variable. By restricting our analysis to the case of a single independent variable we will be able to derive a number of properties using relatively simple algebraic techniques. Beginning in Chapter \ref{mult.chap}, we will extend this discussion to allow for the presence of several independent variables.
In this chapter (and the following three chapters), we will consider the case of the classical regression model, an ideal case that requires a number of relatively strong assumptions. In subsequent chapters, we will relax each of these assumptions.
\section{Population Regression Function}
Initially, we will consider a case in which there is a linear relationship between a dependent variable ($Y$) and a single independent variable ($X$).
An exact linear relationship between $Y_{i}$ and $X_{i}$ could be expressed as:
\begin{equation}
Y_{i}=\beta _{o}+\beta _{1}X_{i} \label{simple.form.bc}
\end{equation}%
In this equation, the intercept term, $\beta _{o}$, is the value of $Y_{i}$ that will occur when $X_{i}$ equals zero. The slope parameter, $\beta _{1}$, is a measure of the change in the dependent variable that occurs when there is a one-unit change in the level of the independent variable ($X_{i}$). In mathematical terms:
\begin{equation*}
\beta _{1}=\frac{\Delta Y_{i}}{\Delta X_{i}}
\end{equation*}%
In practice, however, equation \ref{simple.form.bc} will not hold exactly for each observed combination of $X_{i}$ and $Y_{i}$. Instead, we assume that a linear relationship describes the general pattern of the relationship between $Y_{i}$ and $X_{i}$. A useful way of dealing with this issue is to assume that the average value of $Y_{i}$ (in the population) is a linear function of $X_{i}$. More precisely, we will assume that the conditional expectation of $Y$ given $X$ is a linear function of $X$. Using mathematical notation, this relationship can be stated as:
\begin{equation}
E(Y|X_{i})=\beta _{o}+\beta _{1}X_{i} \label{pop.reg.func.bc} \end{equation}%
Figure~\ref{prf_graph} contains a graph illustrating this relationship.
\begin{center}
\FRAME{ftbpFU}{4.8248in}{2.898in}{0pt}{\Qcb{Population regression function}}{\Qlb{prf_graph}}{fig4-1.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.8248in;height 2.898in;depth 0pt;original-width 4.7712in;original-height 2.8539in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-1.gif’;file-properties “XNPEU”;}}
\end{center}
The relationship described by equation \ref{pop.reg.func.bc} (and depicted in Figure~\ref{prf_graph}) is called a \textbf{population regression function}. This equation describes the relationship that exists between an independent variable ($X$) and the average value of $Y$ (at each level of $X$) in the entire population. The intercept term, $\beta _{o}$, is the intercept of this function on the vertical axis. This intercept term provides a measure of the expected value of the dependent variable when $X_{i}$ equals zero. The slope term, $\beta _{1}$, provides a measure of the change in the expected value of $Y_{i}$ that occurs when $X_{i}$ changes by one unit.
As noted above, however, economic relationships are characterized by some degree of randomness. Because of this randomness, the observed value of $Y_{i}$ will generally differ from the value predicted by the population regression function. In particular, the observed $Y_{i}$ can be expressed as:
\begin{equation}
Y_{i}=E(Y|X_{i})+u_{i} \label{condexp.u.bc}
\end{equation}
where $u_{i}$ is a random error term. Substituting the relationship in equation \ref{pop.reg.func.bc} into equation \ref{condexp.u.bc} results in: \begin{equation}
Y_{i}=\underset{\text{component}}{\underset{\text{deterministic}}{\underbrace{\beta _{o}+\beta _{1}X_{i}}}}+\underset{\text{component}}{\underset{\text{random}}{\underbrace{u_{i}}}} \label{condexp.u1.bc}
\end{equation}
Equation \ref{condexp.u1.bc} indicates that the observed value of $Y_{i}$ can be expressed as the sum of two components:
\begin{enumerate}
\item a deterministic component ($E(Y|X_i)=\beta _o+\beta _1X_i$), and
\item a random component ($u_i$).
\end{enumerate}
The deterministic component of equation \ref{condexp.u.bc} is that portion of $Y_i$ that can be explained by the population regression function.
Roughly speaking, this deterministic component represents that part of $Y_i$ that can be explained by knowing the level of $X_i$. The random error term, $u_i$, represents that portion of $Y_i$ that cannot be explained by the population regression function.
An inspection of equation \ref{condexp.u.bc} indicates that the random error term represents the difference between the observed value of the dependent variable ($Y_{i}$) and the conditional expectation of $Y$ given $X_{i}$. In mathematical terms:
\begin{equation*}
u_{i}=Y_{i}-E(Y|X_{i})
\end{equation*}%
This relationship is illustrated in Figure~\ref{prf_sample_graph}. As this diagram illustrates, the value of the random error term, $u_{i}$, is equal to the vertical distance between the observed value of $Y$ and the population regression line at a particular value of $X$.
\begin{center}
\FRAME{ftbpFU}{5.4535in}{2.9706in}{0pt}{\Qcb{Population regression function and sample observations}}{\Qlb{prf_sample_graph}}{fig4-2.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.4535in;height 2.9706in;depth 0pt;original-width 5.3956in;original-height 2.9274in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-2.gif’;file-properties “XNPEU”;}}
\end{center}
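As a purely hypothetical numerical illustration (the parameter values and the observation below are assumed for this example and are not estimates from any data set), suppose that $\beta _{o}=2$ and $\beta _{1}=0.5$. For an observation with $X_{i}=10$ and an observed value of $Y_{i}=8.2$, the deterministic and random components would be:
\begin{equation*}
\begin{array}{rcl}
E(Y|X_{i}) & = & 2+0.5(10)=7 \\
u_{i} & = & Y_{i}-E(Y|X_{i})=8.2-7=1.2
\end{array}
\end{equation*}
In this sketch, the observation lies 1.2 units above the population regression line.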
\section{Sample Regression Function\label{SRF}}
If the population regression function were known, it could provide a useful tool for predicting the value of $Y$ that will occur for any value of $X_{i}$. In practice, however, the parameters $\beta _{o}$ and $\beta _{1}$ are unknown and must be estimated from sample data. This estimated relationship may be expressed as:\footnote{%
As noted in Chapter 1, a “\symbol{94}” is used to indicate an estimated value for a variable or parameter. For example, $\hat{\beta}_{1}$ denotes an estimated value of the slope parameter $\beta _{1}$. In a similar manner, $\hat{Y}_{i}$ and $\hat{u}_{i}$ denote the fitted values of the dependent variable and error term, respectively.}
\begin{equation*}
\hat{Y}_{i}=\hat{\beta}_{o}+\hat{\beta}_{1}X_{i}
\end{equation*}
where $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are the estimated values of the intercept and slope parameters, respectively, and $\hat{Y}_{i}$ is the estimated value of $Y$ at a particular value of $X$. This estimated relationship is called a \textbf{sample regression function}.%
\exbox{Okun’s law}{In a classic econometric study, Arthur Okun (1962) attempted to measure the relationship that exists between economic growth and changes in the unemployment rate. Okun argued that a constant expansion in real GNP was needed to maintain a constant unemployment rate. If real GNP grows by a larger amount, the unemployment rate declines; a smaller expansion in real GNP will result in an increase in the unemployment rate.
To estimate the relationship between changes in the
unemployment rate and changes in the level of real GNP, Okun used quarterly U.S.
data (2nd quarter, 1947 to 4th quarter, 1960) to estimate the sample regression equation: $$
\hat{Y}_t=0.30-0.30X_t
$$
where $Y_t$ is defined as the quarterly change in the unemployment rate (expressed in percentage points)
and $X_t$ is the quarterly percentage change in real GNP.
This result suggests that a one-percentage-point increase in the growth rate of real GNP is associated with a decline of 0.3 percentage points in the unemployment rate. Since this result was widely accepted by economists during the 1960s and early 1970s, it became known as {\bf Okun’s law.}}
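To see how the estimated equation in the Okun’s law example can be read, the following sketch evaluates it at two growth rates (the values of $X_t$ are chosen only for illustration and are not observations from Okun’s sample):
\begin{equation*}
\begin{array}{ll}
X_t=1: & \hat{Y}_t=0.30-0.30(1)=0 \\
X_t=3: & \hat{Y}_t=0.30-0.30(3)=-0.60
\end{array}
\end{equation*}
Under these estimates, quarterly real GNP growth of 1\% is predicted to leave the unemployment rate unchanged, while quarterly growth of 3\% is predicted to lower the unemployment rate by 0.6 percentage points.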
Of course, the actual value of $Y_i$ will differ from the estimated value ($\hat{Y}_i$). The difference between the actual and estimated values of the dependent variable is equal to the sample error term $\hat{u}_i$. In mathematical terms, the relationship between the realized value of the dependent variable ($Y_i$) and the independent variable ($X_i$) may be expressed as:
\begin{equation}
Y_i=\hat{\beta}_o+\hat{\beta}_1X_i+\hat{u}_i \label{kjla.bc} \end{equation}
Once the values of the estimated parameters $\hat{\beta}_o$ and $\hat{\beta}_1$ are determined, the sample error term can be computed as:
\begin{equation}
\hat{u}_i=Y_i-\hat{Y}_i \label{sampleerror}
\end{equation}
Equation \ref{sampleerror} indicates that the sample error term equals the difference between the observed value of $Y_i$ and the value predicted by the regression equation ($\hat{Y}_i$). By rearranging the terms in equation \ref{kjla.bc}, the sample error term may also be expressed as:
\begin{equation}
\hat{u}_i=Y_i-\hat{\beta}_o-\hat{\beta}_1X_i \label{sampleerror.alt} \end{equation}
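As a hypothetical illustration using the estimated equation from the Okun’s law example (the observed value below is invented for this sketch), suppose that real GNP grows by 3\% in a quarter ($X_t=3$) and that the unemployment rate is observed to fall by 0.4 percentage points ($Y_t=-0.40$). The fitted value and the sample error term would then be:
\begin{equation*}
\begin{array}{rcl}
\hat{Y}_t & = & 0.30-0.30(3)=-0.60 \\
\hat{u}_t & = & Y_t-\hat{Y}_t=-0.40-(-0.60)=0.20
\end{array}
\end{equation*}
In this case the observed outcome lies 0.2 percentage points above the sample regression line.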
The estimated values of the parameters $\beta _{o}$ and $\beta _{1}$ will generally differ from the true population values since the estimates are constructed using a sample that is a subset of the entire population. The relationship between the sample and population regression functions is depicted in Figure~\ref{prf_srf}. As this diagram illustrates, the sample error term $\hat{u}_{i}$ differs from the population error term $u_{i}$ since the estimated regression equation (the sample regression function) is not equal to the true relationship (the population regression function). If you were to estimate this relationship in several different samples, the estimated slope and intercept terms would generally differ from one sample to the next.
\begin{center}
\FRAME{ftbpFU}{5.4319in}{2.898in}{0pt}{\Qcb{Population and sample regression functions}}{\Qlb{prf_srf}}{fig4-3.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.4319in;height 2.898in;depth 0pt;original-width 5.3748in;original-height 2.8539in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-3.gif’;file-properties “XNPEU”;}} \end{center}
While the sample estimates of the parameters are not equal to the actual parameter values, under certain conditions the estimates of $\beta _{o}$ and $\beta _{1}$ will be unbiased, consistent, and will attain the minimum variance possible for linear unbiased estimators. As we discussed in the preceding chapter, these are quite desirable properties. Let’s discuss a set of ideal conditions that will guarantee that these properties are satisfied.
These ideal conditions constitute a model known as the \textbf{classical regression model}.
\section{Assumptions\label{crmo.ass.bc}}
There are a number of basic assumptions that characterize the classical regression model:
\begin{assumption}
A linear relationship exists between $Y_{i}$ and $X_{i}$. The form of this relationship is given by:
\begin{equation}
Y_{i}=\beta _{o}+\beta _{1}X_{i}+u_{i} \label{lin_param.bc} \end{equation}
\end{assumption}
This condition states that the actual relationship that exists between the dependent and independent variables is linear with an intercept equal to $\beta _{o}$ and a slope equal to $\beta _{1}$. While this linearity assumption appears somewhat restrictive, it is more flexible than it may appear. It is often possible, for example, to convert a nonlinear relationship into a linear form by suitably redefining the variables in the equation.
To see how such a transformation may be used, consider the relationship given by:\footnote{%
As will be discussed in Chapter \ref{linear.chap}, a reciprocal relationship of this sort has often been used in modelling the Phillips curve (an empirical relationship between inflation and unemployment rates).} \begin{equation*}
Y_{i}=\beta _{o}+\beta _{1}\left( \frac{1}{X_{i}}\right) +u_{i} \end{equation*}
This relationship can be converted into a linear form by defining a new variable, $Z_{i}$ as:
\begin{equation*}
Z_{i}=\frac{1}{X_{i}}
\end{equation*}
Using this definition for $Z_{i}$, the original equation can be restated in a linear form as:
\begin{equation*}
Y_{i}=\beta _{o}+\beta _{1}Z_{i}+u_{i}
\end{equation*}
While the relationship between $Y_{i}$ and $X_{i}$ is nonlinear, the relationship between $Y_{i}$ and the transformed variable $Z_{i}$ is linear. The linearity assumption and the use of transformations of this sort are discussed in detail in Chapter \ref{linear.chap}.
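A small numerical sketch may help here. The parameter values $\beta _{o}=1$ and $\beta _{1}=4$ are assumed purely for illustration, and the error term is set aside so that only the deterministic part of the relationship is shown:
\begin{equation*}
\begin{array}{ccc}
X_{i} & Z_{i}=1/X_{i} & \beta _{o}+\beta _{1}Z_{i} \\
1 & 1 & 5 \\
2 & 0.5 & 3 \\
4 & 0.25 & 2
\end{array}
\end{equation*}
Moving from $X_{i}=1$ to $X_{i}=2$ lowers the deterministic component by 2, while moving from $X_{i}=2$ to $X_{i}=4$ lowers it by only 1, so the relationship is not linear in $X_{i}$. Measured against $Z_{i}$, however, the ratio of the two changes is the same in both cases: $(-2)/(-0.5)=4$ and $(-1)/(-0.25)=4$, which is the assumed slope $\beta _{1}$.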
Initially, we will assume that $X_{i}$ is the only observable variable that affects $Y_{i}$. Beginning in Chapter \ref{mult.chap}, however, we will expand the model to allow for more than one independent variable on the right-hand side of the regression equation.
\begin{assumption}
The mean value of the error term, $u_i$, is zero (i.e., $E(u_i)=0$).\label{E(u).0.c3}
\end{assumption}
The estimation procedure that we will be using is based on the assumption that the mean value of the error term is zero. If the mean value of the error term is not zero, our estimates will either overstate or understate the intercept term $\beta _{o}$. In the case illustrated in Figure~\ref{eu_ne_0}, the mean value of the error term is positive. Therefore, the error terms tend to be concentrated above the population regression line. An estimation procedure that relies on the assumption that the mean value of the error term is zero will tend to overstate the intercept term when the mean value of the error terms is positive. As this diagram illustrates, the estimated regression equation will lie above the population regression line in this case. If all of the other conditions of the classical regression model are satisfied, however, the slope coefficient will not be biased.\label{eu.biased.4}
\begin{center}
\FRAME{ftbpFU}{5.0237in}{3.013in}{0pt}{\Qcb{Regression analysis when $E(u_{i})\neq 0$}}{\Qlb{eu_ne_0}}{fig4-4.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.0237in;height 3.013in;depth 0pt;original-width 4.9684in;original-height 2.9689in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-4.gif’;file-properties “XNPEU”;}}
\end{center}
This problem can be eliminated by redefining the regression model. If the expected value of the error term equals a constant, $k$, a transformed model can be estimated of the form:
\begin{equation*}
Y_i=\beta _o^{\prime }+\beta _1X_i+v_i
\end{equation*}
where $\beta _o^{\prime }=\beta _o+k$. In this transformed model, the expected value of the new error term, $v_i$, is zero. Of course, when we estimate the value of $\beta _o^{\prime }$ we can estimate $\beta _o$ only if we know the value of $k$.
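The step from the original model to the transformed model can be written out as follows. This is a sketch in which the new error term is taken to be $v_i=u_i-k$, which is the definition implied by the discussion above rather than one stated explicitly. Adding and subtracting $k$ on the right-hand side of equation \ref{lin_param.bc} gives:
\begin{equation*}
Y_i=(\beta _o+k)+\beta _1X_i+(u_i-k)=\beta _o^{\prime }+\beta _1X_i+v_i
\end{equation*}
so that
\begin{equation*}
E(v_i)=E(u_i)-k=k-k=0
\end{equation*}
The slope parameter $\beta _1$ is unchanged by this transformation; only the intercept absorbs the nonzero mean of the original error term.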
\begin{assumption}
The error terms are identically distributed with a constant variance equal to $\sigma ^2$ for all values of $X_i$.
\end{assumption}
If the error terms have a constant variance at all levels of $X_{i}$, the error process is said to be \textbf{homoskedastic} (also spelled homoscedastic). Thus, this condition requires that the error terms be homoskedastic. If this assumption is violated, the variance of the error terms is greater at some levels of $X_{i}$ than at other levels. When the variance of the error terms varies with $X_{i}$, the error process is said to be \textbf{heteroskedastic}. Figure~\ref{homo_hetero} illustrates the difference between homoskedastic and heteroskedastic error processes. In the diagram on the right-hand side of this figure, the variance of the error term increases as $X_{i}$ increases.
\exbox{Heteroskedasticity}{While econometricians have generally agreed on what is meant by the concept of “heteroskedasticity,” there has been much less agreement on the spelling of this word. Economists have used both “heteroskedasticity” and “heteroscedasticity” to denote the same concept. McCulloch (1985) has provided what is hoped to be the definitive argument on this issue. He notes that the term “heteroskedasticity” is derived from the Greek roots “hetero-”
($\grave{\varepsilon}\tau \varepsilon \rho o$-)
and “skedannumi”
($\sigma \kappa \varepsilon \delta \acute{\alpha}\nu \nu \upsilon \mu \iota $). The spelling issue ultimately depends on whether the Greek letter “$\kappa$” should be translated as a “k” or a “c.” McCulloch argues that “$\kappa$” is translated as a “k” in all English words that have been directly derived from Greek roots in modern times.
Throughout the rest of this text, the “heteroskedasticity” spelling is adopted. If you skim through economic journals, however, you will find that there are still a few diehard supporters of the “heteroscedasticity” spelling.}%
\begin{center}
\FRAME{ftbpFU}{5.047in}{2.8522in}{0pt}{\Qcb{Comparison of homoskedastic and heteroskedastic error processes}}{\Qlb{homo_hetero}}{fig4-5.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 2.8522in;depth 0pt;original-width 5.6455in;original-height 3.1773in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-5.gif’;file-properties “XNPEU”;}}
\end{center}
Consider the relationship that exists between educational attainment ($X$) and individual income ($Y$) in a given economy. As educational attainment rises, so does the average level of income. It is likely, however, that there is a relatively low variance in earnings at low levels of educational attainment. As the level of educational attainment (and income) rises, the variance of income is also likely to rise. The variance of income among high school dropouts is generally relatively small because there are few high-income jobs available to individuals with low levels of educational attainment. Individuals possessing 16 or more years of schooling, however, generally have a relatively high variance in earnings. For example, some individuals with Ph.D. degrees receive high salaries as consultants while other Ph.D. recipients receive substantially lower incomes as college professors (some individuals with Ph.D. degrees may also choose to work as short-order cooks or taxi drivers). This increase in the variance of income as educational attainment increases will result in a scatterplot that is similar to that appearing in the right-hand side diagram of Figure~\ref{homo_hetero}.
When heteroskedasticity is present, some sample observations contain more precise information about the location of the regression equation. In the diagram on the right-hand side of Figure~\ref{homo_hetero}, more precise information about the location of the population regression line is provided by those observations in which the value of $X_{i}$ is relatively low. In a later chapter, we will examine how to estimate the parameters of a regression model when heteroskedasticity is present. Initially, however, we will restrict our discussion to the analysis of error processes that are homoskedastic.
\begin{assumption}
The error terms are independent across observations (i.e., $u_i$ is independent of $u_j$ when $i\neq j$).
\end{assumption}
This assumption indicates that there is no systematic relationship between the error terms at different observations. One implication of this assumption is that the error terms are uncorrelated across observations. A common violation of this assumption is the presence of \textbf{autocorrelation}, a phenomenon that is often present in time-series analysis. Autocorrelation occurs when the current value of the error term ($u_{t}$) is correlated with past error terms ($u_{t-j}$). In this case, $E(u_{t}u_{t-j})\neq 0$ (for $j>0$). This autocorrelation may occur because many types of macroeconomic shocks generate effects over periods of time that do not correspond to the sampling interval. For example, a random shock that increases total spending during one time period will result in higher income and additional rounds of spending that are likely to continue into the next time period. A simple, and common, type of autocorrelation is first-order autocorrelation. In this case, there is a nonzero correlation between $u_{t}$ and $u_{t-1}$.\footnote{%
An autoregressive process of order $p$ is said to occur when the current error term is a function of the previous $p$ values of the error term. In mathematical terms, a $p$th order autoregressive process may be expressed as:
\begin{equation*}
u_{t}=\gamma _{1}u_{t-1}+\gamma _{2}u_{t-2}+\cdots +\gamma _{p}u_{t-p}+\varepsilon _{t}
\end{equation*}
where $u_{t}$ is the error term appearing in the regression equation in period $t$, and $\varepsilon _{t}$ is a random error term that is uncorrelated across time. Autoregressive processes such as this are discussed in Chapter \ref{auto.chap}.}
Figure~\ref{first_ar} illustrates two possible types of first-order autocorrelation. In the diagram on the left, there is a positive correlation between $u_{t}$ and $u_{t-1}$. When there is positive first-order autocorrelation, positive error terms are more likely to be followed by positive error terms while negative error terms tend to be followed by negative error terms. Positive first-order autocorrelation may occur as the result of unobservable random shocks that generate effects over more than one time period. The diagram on the right-hand side of Figure~\ref{first_ar} illustrates a possible set of outcomes when negative first-order autocorrelation is present. If the correlation between $u_{t}$ and $u_{t-1}$ is negative, positive values of $u_{t}$ tend to be followed by negative values, and negative values tend to be followed by positive values. This type of autocorrelation might be found in annual sales equations for durable goods such as automobiles. \textit{Ceteris paribus}, if a relatively large number of cars are sold this year, there is a good chance that there will be fewer customers next year. If car sales are below average in a particular year, there may be an unusually large volume of sales in the succeeding year as customers replace their older vehicles.
\FRAME{ftbpFU}{5.047in}{2.6671in}{0pt}{\Qcb{First-order autocorrelation}}{\Qlb{first_ar}}{fig4-6.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 2.6671in;depth 0pt;original-width 5.604in;original-height 2.9482in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-6.gif’;file-properties “XNPEU”;}}
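For the first-order case, the autoregressive process described in the footnote above reduces to a single lagged term. The following sketch assumes, in addition to what is stated in the text, that $|\gamma _{1}|<1$, that $E(u_{t})=0$, and that $\varepsilon _{t}$ has a constant variance $\sigma _{\varepsilon }^{2}$ and is uncorrelated with past values of the error term, so that the process is stationary:
\begin{equation*}
u_{t}=\gamma _{1}u_{t-1}+\varepsilon _{t}
\end{equation*}
Under these assumptions,
\begin{equation*}
var(u_{t})=\frac{\sigma _{\varepsilon }^{2}}{1-\gamma _{1}^{2}},\qquad cov(u_{t},u_{t-1})=\gamma _{1}\,var(u_{t})
\end{equation*}
so the correlation between $u_{t}$ and $u_{t-1}$ equals $\gamma _{1}$. A positive value of $\gamma _{1}$ corresponds to positive first-order autocorrelation, and a negative value corresponds to negative first-order autocorrelation.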
The error terms may also be correlated across observations when pooled data or panel data is used. For example, suppose that you have data on the earnings of 200 individuals in each of five different years. Suppose that you estimate a relationship between the individuals’ earnings and their educational attainment using all 1000 observations (200 individuals $\times $ 5 observations / individual). It is likely that there are unobserved characteristics that will cause some individuals’ earnings to be consistently above (or below) average in each of the five years. Those individuals with large positive error terms in one year are likely to have large positive error terms in other years. Thus, the error terms will tend to be correlated when the observations correspond to the same individual.
In later chapters, we will examine what happens when error terms are correlated across observations. For now, however, we will assume that each error term is uncorrelated with all other error terms. This assumption, together with the preceding two assumptions, can be summarized by stating that the error terms, $u_i$, are independently and identically distributed with a mean of 0 and a variance of $\sigma ^2$.
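In terms of first and second moments, this summary implies the following conditions (a compact restatement of the three preceding assumptions, not an additional requirement):
\begin{equation*}
\begin{array}{ll}
E(u_i)=0 & \text{for all }i \\
var(u_i)=E(u_i^2)=\sigma ^2 & \text{for all }i \\
cov(u_i,u_j)=E(u_iu_j)=0 & \text{for all }i\neq j
\end{array}
\end{equation*}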
\begin{assumption}
The $X_{i}$ are nonstochastic.\footnote{%
The term “stochastic” means “random.” Thus, a “nonstochastic” process is one that is not characterized by randomness.}\label{E(ux).0.c3} \end{assumption}
To simplify the analysis, we will initially assume that the $X_i$ are fixed (nonstochastic) in repeated samples. In other words, we assume that only the $Y$’s are random; the $X$’s are assumed to be constant. This assumption is used to simplify our initial derivations. In later chapters, we will allow the $X_i$’s to be random as well.
In an ideal situation, the level of $X_{i}$ is controlled by the econometrician when an experiment is conducted. Suppose, for example, that an econometrician wishes to estimate the parameters of a simple demand function of the form:
\begin{equation}
Q_{t}=\beta _{o}+\beta _{1}\text{Price}_{t}+u_{t} \label{lem.dem.c3} \end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & Q_{t}=\text{ quantity demanded at time }t \\ & \text{Price}_{t}\text{ = Price at time }t \\
& u_{t}\text{ = random error term}%
\end{array}%
\end{equation*}
The econometrician may be able to conduct an experiment in which consumers are faced with different prices on different days (as in the lemonade stand example in Chapter \ref{intro.chap}). Under this scenario, the quantities demanded at alternative prices may be observed and used to estimate the parameters of equation \ref{lem.dem.c3}.
More typically, however, econometricians are not able to conduct experiments of this sort. Instead, they generally work with data that has been collected by government or private agencies. When working with data of this sort, both the dependent and independent variables are random variables. In this case, it is quite possible that assumption \ref{E(ux).0.c3} will be violated.
In particular, there are two common situations that result in a violation of assumption \ref{E(ux).0.c3}:
\begin{itemize}
\item $X_i$ is measured with error, or
\item $X_i$ is endogenous.
\end{itemize}
Measurement error may occur as a result of incorrect survey responses, transcription errors, or compilation errors. The effects of (and possible remedies for) measurement error are investigated in more detail in Chapter \ref{spec.chap}.
In many economic applications, an endogenous variable appears on the right-hand side of a regression equation. In these cases, $X_{i}$ is not only a cause of $Y_{i}$, it is also caused by $Y_{i}$. For example, in the Keynesian model, both consumption and disposable personal income are endogenous. Thus, disposable personal income is an endogenous right-hand side variable in the consumption function models discussed in Chapter \ref{intro.chap}.\footnote{%
In Chapter \ref{simul.chap}, the consumption function is estimated using techniques that correct for the endogenous nature of disposable personal income.} In a similar manner, both the observed price and quantity of goods are endogenous in a model of demand and supply.\footnote{In the lemonade stand example considered earlier, the price was an exogenous variable that was controlled by the econometrician. Thus, this example does not constitute a violation of this assumption.} Fortunately, simultaneous equation methods exist that make it possible to estimate equations in which one or more endogenous variables appear on the right-hand side of the regression equation. These techniques are discussed in Chapter \ref{simul.chap}.
One important implication of this assumption is that the covariance between the error term and the independent variable will be zero. In mathematical terms, $cov(X_i,u_j)=0$ for all $i$ and $j$.
\textbf{Proof:}
\begin{equation}
cov(X_{i},u_{j})=E\left[ \left( X_{i}-E(X_{i})\right) \left( u_{j}-E(u_{j})\right) \right] \label{cov.ux.proof}
\end{equation}
Since $X_{i}$ is assumed to be nonstochastic, equation \ref{cov.ux.proof} may be stated as:
\begin{equation*}
cov(X_{i},u_{j})=E\left( X_{i}-E(X_{i})\right) E\left( u_{j}-E(u_{j})\right) \end{equation*}
But, since $X_{i}\,$ is assumed to be nonstochastic, $E(X_{i})$ is equal to $% X_{i}$. Thus,
\begin{equation*}
cov(X_{i},u_{j})=0\times E(u_{j}-E(u_{j}))
\end{equation*}
\begin{equation*}
=0
\end{equation*}
To understand the importance of this result, consider a case in which the covariance between the error term and the independent variable is positive ($cov(u_{i},X_{i})>0$). This indicates that the average value of the error term rises as the level of the independent variable rises. Figure~\ref{assum_viol} illustrates this possibility. An estimated regression equation based on the observed data will tend to overstate the slope of the population regression function in this case. In a similar manner, regression analysis will tend to understate the slope when the covariance between the error term and the independent variable is negative. (This topic is discussed in more detail in Chapter \ref{spec.chap}.)
\begin{center}
\FRAME{ftbpFU}{4.6882in}{2.866in}{0pt}{\Qcb{Violation of assumption \protect \ref{E(ux).0.c3} ($E(u_{i}X_{i})>0$)}}{\Qlb{assum_viol}}{fig4-7.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.6882in;height 2.866in;depth 0pt;original-width 4.6354in;original-height 2.8228in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-7.gif’;file-properties “XNPEU”;}}
\end{center}
In the examples discussed in the next few chapters we will assume that assumption \ref{E(ux).0.c3} is satisfied.
\begin{assumption}
There is some variation in the independent variable ($X_i$).\footnote{% At first glance, this assumption may appear to conflict with the previous assumption that states that $X_i$ is nonstochastic. In fact, however, these two assumptions are quite compatible with each other. Roughly speaking, the assumption of a nonstochastic $X_i$ requires that the values of $X_i$ are fixed in repeated samples. Assuming that there is some variation in $X_i$ simply adds the requirement that more than one fixed value of $X_i$ is used in the analysis.}
\end{assumption}
This assumption requires that $X_{i}$ take on more than one value. The reason for this condition should be intuitively clear. The slope of the population regression function is a measure of how $Y_{i}$ changes when $X_{i}$ varies. If there is no change in $X_{i}$, it is impossible to estimate the slope parameter $\beta _{1}$. This problem is illustrated in Figure~\ref{no_var_x}. As this diagram suggests, any regression line that passes through the mean value of $Y_{i}$ at the single observed value of $X_{i}$ fits the data just as well as any other line that passes through this point. Thus, we must require that there is some variation in $X_{i}$ if regression parameters are to be estimated.
\begin{center}
\FRAME{ftbpFU}{4.7617in}{2.9187in}{0pt}{\Qcb{Estimation of regression equation when there is no variation in $X_{i}$}}{\Qlb{no_var_x}}{fig4-8.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.7617in;height 2.9187in;depth 0pt;original-width 4.708in;original-height 2.8746in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-8.gif’;file-properties “XNPEU”;}}
\end{center}
In practice, violations of this assumption are rarely encountered in the bivariate regression model. There are a few cases, however, where a violation may occur. For example, suppose that a student wishes to analyze the effect of the exchange rate on the volume of international trade for a country possessing a fixed exchange rate. In this case, it is impossible to measure the effect of changes in the official exchange rate on trade if the official exchange rate does not vary during the period for which data is available.
\section{OLS estimation}
Let’s examine how the parameters of a regression equation may be estimated.
The objective is to find the equation of the regression line that best fits the observed data. Each of the sample error terms ($\widehat{u}_{i}$) is a measure of the vertical distance between the observed outcome and the regression line. Alternative estimates of the intercept and slope parameters result in different combinations of these sample error terms. It is desirable to use an estimation procedure that results in sample error terms that are relatively small (in absolute value). The most commonly used estimation technique involves selecting the values of $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ that minimize the sum of squared error terms. In other words, the goal is to select the values of $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ that minimize:\footnote{%
Since the summation will usually be over the range of 1 through $N$, the summation operator will be assumed to cover this range unless otherwise specified. In other words, throughout the rest of this text, the symbol $\sum $ will be used to represent $\sum_{i=1}^{N}$.}
\begin{equation}
\sum \hat{u}_{i}^{2}=\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})^{2} \label{min.ssq.biv}
\end{equation}
The estimator formed by minimizing the expression in equation \ref{min.ssq.biv} is called an \textbf{ordinary least squares (or OLS) estimator}.\footnote{%
An alternative, though less commonly used, procedure is to select the values of the intercept and slope parameters that minimize the sum of the absolute values of the sample error terms. Such an estimator is called a minimum absolute deviation (MAD) estimator.} The derivation of the OLS estimators for $\beta _{o}$ and $\beta _{1}$ requires the use of calculus. A formal derivation of these estimators appears in the mathematical appendix at the end of this chapter. The OLS estimators for $\beta _{o}$ and $\beta _{1}$ are:
\begin{equation}
\hat{\beta}_{o}=\overline{Y}-\hat{\beta}_{1}\overline{X} \label{beta0} \end{equation}
and
\begin{equation}
\hat{\beta}_{1}=\frac{\sum (X_{i}-\overline{X})(Y_{i}-\overline{Y})}{\sum (X_{i}-\overline{X})^{2}} \label{beta1}
\end{equation}
The estimator $\hat{\beta}_{1}$ can be expressed in more compact form if we define $x_{i}$ and $y_{i}$ as the deviations of $X_{i}$ and $Y_{i}$ from their respective sample means (\textit{i.e.}, $x_{i}=X_{i}-\overline{X}$ and $y_{i}=Y_{i}-\overline{Y}$):\footnote{%
Another equivalent expression for $\hat{\beta}_{1}$ is given by: \begin{equation*}
\hat{\beta}_{1}=\frac{\overset{N}{\underset{i=1}{\sum }}X_{i}Y_{i}-N\bar{X}% \bar{Y}}{\overset{N}{\underset{i=1}{\sum }}X_{i}^{2}-N\bar{X}^{2}} \end{equation*}%
}
\begin{equation}
\hat{\beta}_{1}=\frac{\sum x_{i}y_{i}}{\sum x_{i}^{2}} \label{beta1dev} \end{equation}
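As a quick numerical illustration (the three data points used here are hypothetical and are not drawn from any data set used in this text), suppose that $N=3$ and that the observed pairs $(X_{i},Y_{i})$ are $(1,2)$, $(2,3)$, and $(3,5)$. Then $\overline{X}=2$ and $\overline{Y}=10/3$, so the deviations from the sample means are $x_{i}=-1,\,0,\,1$ and $y_{i}=-4/3,\,-1/3,\,5/3$. Equation \ref{beta1dev} gives:
\begin{equation*}
\hat{\beta}_{1}=\frac{\sum x_{i}y_{i}}{\sum x_{i}^{2}}=\frac{(-1)(-4/3)+(0)(-1/3)+(1)(5/3)}{1+0+1}=\frac{3}{2}
\end{equation*}
and equation \ref{beta0} gives:
\begin{equation*}
\hat{\beta}_{o}=\overline{Y}-\hat{\beta}_{1}\overline{X}=\frac{10}{3}-\frac{3}{2}(2)=\frac{1}{3}
\end{equation*}
The formula stated in terms of the original (undeviated) observations yields the same slope, since $\sum X_{i}Y_{i}-N\overline{X}\,\overline{Y}=23-20=3$ and $\sum X_{i}^{2}-N\overline{X}^{2}=14-12=2$.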
\section{Derivation of estimators}
There is another relatively simple method for constructing estimators for the parameters $\beta $$_{o}$ and $\beta _{1}$. Assumptions \ref{E(u).0.c3} and \ref{E(ux).0.c3} require that:
\begin{equation*}
E(u_{i})=0
\end{equation*}
and
\begin{equation*}
cov(X_{i},u_{i})=0
\end{equation*}
If these conditions hold in the entire population, it seems reasonable to require that they hold for our sample estimates. Estimators can be derived for which both the mean value of the sample error term and the sample covariance between $X_{i}$ and $\hat{u}_{i}$ are zero.\footnote{%
This is a special case of the Generalized Method of Moments (GMM) estimator. GMM estimators are constructed by requiring that sample moments equal the corresponding population expectations. For a good (though more advanced) discussion of this topic, see Greene (2000), pp. 474-96.} In mathematical terms, this requires that:
\begin{equation}
\frac{1}{N}\sum \hat{u}_{i}=0 \label{norm1}
\end{equation}
and
\begin{equation}
\frac{1}{N-1}\sum X_{i}\hat{u}_{i}=0 \label{norm2}
\end{equation}
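To see why equation \ref{norm2} amounts to a zero sample covariance, recall that a sample covariance is built from products of deviations from the sample means. Writing $\overline{\hat{u}}$ for the sample mean of the $\hat{u}_{i}$ (a symbol introduced here only for this step), the sample covariance between $X_{i}$ and $\hat{u}_{i}$ can be expanded as:
\begin{equation*}
\frac{1}{N-1}\sum (X_{i}-\overline{X})(\hat{u}_{i}-\overline{\hat{u}})=\frac{1}{N-1}\left( \sum X_{i}\hat{u}_{i}-\overline{X}\sum \hat{u}_{i}-\overline{\hat{u}}\sum (X_{i}-\overline{X})\right)
\end{equation*}
The last term vanishes because deviations from a sample mean always sum to zero, and the middle term vanishes whenever equation \ref{norm1} holds. What remains is $\frac{1}{N-1}\sum X_{i}\hat{u}_{i}$, which is the left-hand side of equation \ref{norm2}.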
By substituting the value of $\hat{u}_{i}$ from \ref{sampleerror.alt} and simplifying, equations \ref{norm1} and \ref{norm2} may be restated as:\label{normal.eq.mark.1}
\begin{equation}
\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})=0 \label{norm1′} \end{equation}
and,
\begin{equation}
\sum X_{i}(Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})=0 \label{norm2′} \end{equation}
Equations \ref{norm1′} and \ref{norm2′} are called \textbf{normal equations}. They provide two linear equations that may be solved for the unknown parameters $\hat{\beta}_o$ and $\hat{\beta}_1$. As should be clear from the derivations above, the first normal equation (equation \ref{norm1′}) requires that the sample mean of the observed error terms equals zero; the second normal equation (equation \ref{norm2′}) requires that the sample covariance between the error terms and the independent variable equals zero.
Roughly speaking, equation \ref{norm1′} requires that the sample regression function is constructed so that it passes through the middle of the scattergram of observed data points.\footnote{%
More specifically, equation \ref{norm1′} requires that the sample regression function pass through the point ($\overline{X},\overline{Y}$). (The proof is left to the reader as an exercise.)} Satisfying equation \ref{norm1′} requires that the positive and negative deviations around the sample regression function must cancel out. The second normal equation (equation \ref{norm2′}) eliminates the possibility of a sample regression function in which the mean value of the error term systematically increases (or decreases) as the level of the independent variable increases.
Equation \ref{norm1}\ may be restated as:
\begin{equation*}
\frac 1N\sum Y_i-\frac 1N\sum \hat{\beta}_o-\frac{\hat{\beta}_1}N\sum X_i=0 \end{equation*}
or,
\begin{equation}
\overline{Y}-\hat{\beta}_o-\hat{\beta}_1\overline{X}=0 \label{beta0.abz} \end{equation}
Thus, the value of $\hat{\beta}_o$ may be computed as:
\begin{equation}
\hat{\beta}_o=\overline{Y}-\hat{\beta}_1\overline{X} \label{beta0′} \end{equation}
By substituting the value of $\hat{\beta}_o$ from equation \ref{beta0′} into equation \ref{norm2′}, we can solve for $\hat{\beta}_1$:
\begin{equation*}
\sum X_i(Y_i-\overline{Y}+\hat{\beta}_1\overline{X}-\hat{\beta}_1X_i)=0 \end{equation*}
Using Property 3 of the summation operator (derived in the appendix to Chapter \ref{stat.chap}), this can be restated as:
\begin{equation*}
\sum X_i(Y_i-\overline{Y})-\hat{\beta}_1\sum X_i(X_i-\overline{X})=0 \end{equation*}
or:
\begin{equation*}
\hat{\beta}_1\sum X_i(X_i-\overline{X})=\sum X_i(Y_i-\overline{Y}) \end{equation*}
Solving for $\hat{\beta}_1$:
\begin{equation*}
\hat{\beta}_1=\frac{\sum X_i(Y_i-\overline{Y})}{\sum X_i(X_i-\overline{X})} \end{equation*}
Using Properties 7 and 8 of the summation operator (derived in the appendix to Chapter \ref{stat.chap}), this becomes:
\begin{equation}
\hat{\beta}_1=\frac{\sum (X_i-\overline{X})(Y_i-\overline{Y})}{\sum (X_i-\overline{X})^2} \label{beta1.sol.bc}
\end{equation}
Expressed in terms of deviations from the sample means, we can restate this as:\label{normal.eq.mark.2}
\begin{equation}
\hat{\beta}_1=\frac{\sum x_iy_i}{\sum x_i^2} \label{beta1.iv.bc} \end{equation}
You have probably noticed that these estimators for $\beta _o$ and $\beta _1$ (equations \ref{beta0′} and \ref{beta1.iv.bc}) are identical to the OLS estimators stated above. Thus, it should be clear that when OLS\ estimates of $\beta _o$ and $\beta _1$ are constructed:
\begin{enumerate}
\item The sample mean of the estimated error term, $\hat{u}_i$, will always equal zero.
\item The sample covariance between $X_{i}$ and $\hat{u}_{i}$ will be zero.\footnote{%
It should be noted, however, that $\sum u_{i}$ and $\sum X_{i}u_{i}$ will not necessarily equal zero in any given sample (since $\hat{u}_{i}\neq u_{i}$).}
\end{enumerate}
\section{Expected value of the estimators\label{expect.sec.bc}}
Since the OLS estimators are functions of random variables, these estimators are themselves random variables. The estimated values of the intercept and slope parameters will be different when different samples are used for estimation purposes. Since the OLS estimators are themselves random variables, it will be helpful to examine the expected value of these estimators. As noted in Chapter \ref{stat.chap}, the expected value serves as a measure of the ``average'' outcome for a random variable. Thus, the expected value of the intercept or slope estimator serves as a measure of the average value of that estimator across repeated samples.
As noted in Chapter \ref{stat.chap}, an estimator is unbiased if the expected value of the estimator equals the population parameter. It can easily be shown that the OLS estimators $\hat{\beta}_o$ and $\hat{\beta}_1$ are unbiased estimators of the population intercept and slope parameters. A proof of the unbiasedness of $\hat{\beta}_1$ follows. (The proof of the unbiasedness of $\hat{\beta}_o$ is left to the reader as an exercise.) \textbf{Proof:}
We have an unbiased estimate of the slope of the population regression function if:
\begin{equation*}
E(\hat{\beta}_1)=\beta _1
\end{equation*}
By equation \ref{beta1.sol.bc}, the estimated slope coefficient may be expressed as:
\begin{equation}
\hat{\beta}_1=\frac{\sum (X_i-\overline{X})(Y_i-\overline{Y})}{\sum (X_i-\overline{X})^2} \label{beta1.new.sol.bc}
\end{equation}
As shown in the mathematical appendix at the end of Chapter \ref{stat.chap}, \begin{equation*}
\sum (X_i-\overline{X})(Y_i-\overline{Y})=\sum (X_i-\overline{X})Y_i \end{equation*}
Thus, equation \ref{beta1.new.sol.bc} may be restated as:
\begin{equation}
\hat{\beta}_1=\frac{\sum (X_i-\overline{X})Y_i}{\sum (X_i-\overline{X})^2} \label{new.beta1.bc}
\end{equation}
To simplify the exposition, define $w_i$ as:
\begin{equation}
w_i=\frac{X_i-\overline{X}}{\sum (X_i-\overline{X})^2} \label{weight.def.bc} \end{equation}
Using this definition, equation \ref{new.beta1.bc} can be restated as: \begin{equation}
\hat{\beta}_1=\sum w_iY_i \label{weight.b1.bc}
\end{equation}
This indicates that the estimated slope coefficient can be written as a weighted sum of the observed $Y_i$. Each observation in this summation is weighted by $w_i$. Two important properties of $w_i$ are used below: \begin{equation}
\sum w_i=0 \label{weight.prop1.bc}
\end{equation}
\begin{equation}
\sum w_iX_i=1 \label{weight.prop2.bc}
\end{equation}
Each of these properties follows directly from the properties of summation discussed in the mathematical appendix appearing at the end of Chapter \ref{stat.chap}. (The proof of these properties is left to the reader as an exercise.)
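For readers who wish to check their work, one way to sketch the argument relies only on the fact that deviations from a sample mean sum to zero, $\sum (X_i-\overline{X})=0$. For the first property:
\begin{equation*}
\sum w_i=\frac{\sum (X_i-\overline{X})}{\sum (X_i-\overline{X})^2}=0
\end{equation*}
For the second property, note that $\sum (X_i-\overline{X})X_i=\sum (X_i-\overline{X})^2+\overline{X}\sum (X_i-\overline{X})=\sum (X_i-\overline{X})^2$, so that:
\begin{equation*}
\sum w_iX_i=\frac{\sum (X_i-\overline{X})X_i}{\sum (X_i-\overline{X})^2}=\frac{\sum (X_i-\overline{X})^2}{\sum (X_i-\overline{X})^2}=1
\end{equation*}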
The expected value of $\hat{\beta}_1$ can be expressed as: \begin{equation}
E(\hat{\beta}_1)=E\left( \sum w_iY_i\right) \label{beta.dev.alt} \end{equation}
Since $Y_i$ is defined as:
\begin{equation}
Y_i=\beta _o+\beta _1X_i+u_i \label{beta.dev.alt1}
\end{equation}
equation \ref{beta.dev.alt} can be restated as:
\begin{equation}
E(\hat{\beta}_1)=E\left( \sum w_i\left( \beta _o+\beta _1X_i+u_i\right) \right) \label{poiu.c3}
\end{equation}
Using the properties of summation appearing in the mathematical appendix at the end of Chapter \ref{stat.chap}, this can also be expressed as: \begin{equation*}
E(\hat{\beta}_1)=E\left( \beta _o\sum w_i+\beta _1\sum w_iX_i+\sum w_iu_i\right)
\end{equation*}
Using equations \ref{weight.prop1.bc} and \ref{weight.prop2.bc}, this simplifies to:
\begin{equation*}
E(\hat{\beta}_1)=E\left( \beta _1+\sum w_iu_i\right)
\end{equation*}
Since it is assumed that the $X_i$ are nonstochastic, each of the weights, $w_i$, is a constant. Using the properties of expectations discussed in Chapter \ref{stat.chap}, the expected value of $\hat{\beta}_1$ simplifies to:
\begin{equation*}
E(\hat{\beta}_1)=\beta _1+\sum w_iE(u_i)
\end{equation*}
Since $E(u_i)=0$,
\begin{equation*}
E(\hat{\beta}_1)=\beta _1
\end{equation*}
Thus, the estimated slope parameter, $\hat{\beta}_1$ is unbiased. Using a similar approach, it can also be shown that the OLS\ estimator for the intercept term, $\beta _o,$ is also unbiased. The proof of this proposition is left to the reader as an exercise.
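A sketch of the argument for the intercept, under the same assumptions used above, runs as follows. Averaging equation \ref{beta.dev.alt1} over the sample gives $\overline{Y}=\beta _o+\beta _1\overline{X}+\overline{u}$, where $\overline{u}=\frac 1N\sum u_i$ (notation introduced here for convenience). Substituting this into $\hat{\beta}_o=\overline{Y}-\hat{\beta}_1\overline{X}$ yields:
\begin{equation*}
\hat{\beta}_o=\beta _o+(\beta _1-\hat{\beta}_1)\overline{X}+\overline{u}
\end{equation*}
Since $\overline{X}$ is nonstochastic, $E(\hat{\beta}_1)=\beta _1$, and $E(\overline{u})=\frac 1N\sum E(u_i)=0$, taking expectations gives:
\begin{equation*}
E(\hat{\beta}_o)=\beta _o+\left( \beta _1-E(\hat{\beta}_1)\right) \overline{X}+E(\overline{u})=\beta _o
\end{equation*}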
\section{Variance and covariance of the estimators}
The estimators $\hat{\beta}_o$ and $\hat{\beta}_1$ allow us to construct an estimate of the population regression function. As noted above, these intercept and slope estimators are both unbiased. This tells us that these estimators, on average, neither understate nor overstate the population parameters ($\beta _o$ and $\beta _1$). In any given experiment, however, these estimates may differ substantially from the true parameter values. To determine how ``reliable'' an estimator is, it is helpful to know its sample variance. Roughly speaking, a smaller variance indicates that the estimator has a higher probability of falling within a relatively small distance of the true value. As will be shown in Chapter \ref{biv.hyp.chap}, this sample variance can be used to test hypotheses concerning the values of the intercept and slope parameters.
The sample covariance is a measure of the relationship that exists between errors in estimating the intercept and slope parameters. A negative covariance between $\hat{\beta}_o$ and $\hat{\beta}_1$ indicates that an overestimate of one parameter tends to coincide with an underestimate of the other parameter.
A discussion of the sample variances and covariance appears below. The derivation of these results appears in the mathematical appendix at the end of this chapter.
\subsection{Variance of $\hat{\protect\beta}_o$}
The variance of the estimator $\hat{\beta}_o$ provides a measure of the dispersion of the estimator around its expected value (= $\beta _o$ since the OLS\ estimators are unbiased). A larger variance indicates that the estimator $\hat{\beta}_o$ tends to have a larger dispersion around its population mean ($\beta _o$). A smaller variance indicates that the estimated intercept term tends to be more tightly clustered around its mean.
The variance of $\hat{\beta}_o$ is defined as:
\begin{equation*}
var(\hat{\beta}_o)=E(\hat{\beta}_o-\beta _o)^2
\end{equation*}
As shown in the mathematical appendix at the end of this chapter, the variance of $\hat{\beta}_o$ equals:
\begin{equation}
var(\hat{\beta}_o)=\sigma ^2\left( \frac{\overline{X}^2}{\sum x_i^2}+\frac 1N\right) \label{var.beta0.biv}
\end{equation}
In practice, however, $\sigma ^2$ is not known. An estimator for this is provided by:\footnote{%
In this expression, the denominator is $N-2$ rather than $N$ since the degrees of freedom for this estimator is $N-2$. As noted in Chapter \ref{stat.chap}, the degrees of freedom for an estimator equals the number of observations ($N$) minus the number of parameters that must be estimated to construct the estimator. Since the construction of this estimator requires estimates of both the slope and intercept parameters, the degrees of freedom is $N-2$.}
\begin{equation}
\hat{\sigma}^2=\frac 1{N-2}\sum \hat{u}_i^2 \label{est.error.var.biv} \end{equation}
Thus the estimated variance of $\hat{\beta}_o$ is provided by: \begin{equation}
\widehat{var}(\hat{\beta}_o)=\hat{\sigma}_{\hat{\beta}_o}^2=\hat{\sigma}^2\left( \frac{\overline{X}^2}{\sum x_i^2}+\frac 1N\right) \label{var.bo.c3} \end{equation}
As the size of the sample increases, $\frac 1N$ declines. You should also observe that an increase in the size of the sample will generally reduce $\overline{X}^2/\sum x_i^2$ as well. (As the number of observations increases, the sum of squared deviations from the sample mean, $\sum x_i^2$, will generally increase.) Thus, an examination of equation \ref{var.bo.c3} indicates that the variance of the estimator $\hat{\beta}_o$ declines as the size of the sample increases. In particular, as the sample size approaches infinity, the variance of this estimator approaches zero.
Equation \ref{var.bo.c3} also indicates that the variance of this estimator is smaller when the sample variance of $X_i$ is greater (since $\sum x_i^2$ is larger in this case). Assumption 3.6 states that the variance of $X_i$ must be nonzero. As noted above, it is impossible to estimate the slope and intercept when there is no variation in $X_i$. When there is only a small amount of variation in $X_i$, estimates of regression parameters will be less reliable. Since the regression equation explains how $Y$ tends to change as $X$ changes, more reliable parameter estimates can be determined when there is more variation in $X$.
As the estimated variance of the error terms ($\hat \sigma ^2$) increases, the variance of this estimator rises. When the variance of the error terms is larger, the observed outcomes will, in general, be less tightly bunched around the regression line. This increase in the variance in the error term reduces the precision of the estimated intercept term.
The \textbf{standard error} of the estimator $\hat{\beta}_{o}$ is defined as:\footnote{%
Notice that the term ``standard error'' is just a simpler name for the estimated standard deviation of the estimator.}
\begin{equation*}
s.e.(\hat{\beta}_{o})=\sqrt{\widehat{var}(\hat{\beta}_{o})} \end{equation*}
The standard error of the estimator is used extensively in hypothesis testing (as you will see in Chapter \ref{biv.hyp.chap}).
\subsection{Variance of $\hat{\protect\beta}_1$} The variance of the estimator $\hat{\beta}_1$ is given by: \begin{equation*}
var(\hat{\beta}_1)=\sigma _{\hat{\beta}_1}^2=\frac{\sigma ^2}{\sum x_i^2} \end{equation*}
As noted above, $\sigma ^2$ is generally unknown. Thus, the estimated variance of $\hat{\beta}_1$ is:
\begin{equation*}
\widehat{var}(\hat{\beta}_1)=\frac{\hat{\sigma}^2}{\sum x_i^2} \end{equation*}
where $\hat{\sigma}^2$ is defined as in equation \ref{est.error.var.biv}.
As in the case above, the variance of $\hat \beta _1$: \begin{itemize}
\item tends to decline as the size of the sample increases (since $\sum x_i^2$ increases);
\item approaches zero as the size of the sample approaches infinity (since $\sum x_i^2$ approaches infinity);
\item is smaller when the sample variance of $X_i$ is larger, \textit{ceteris paribus} (since this tends to increase $\sum x_i^2$); and
\item is larger when the estimated variance of the error terms ($\hat{\sigma}^2$) is larger.
\end{itemize}
The \textbf{standard error} of the estimator $\hat \beta _1$ is defined as: \begin{equation*}
s.e.(\hat \beta _1)=\sqrt{\widehat{var}(\hat \beta _1)} \end{equation*}
\subsection{Covariance between $\hat{\protect\beta}_o$ and $\hat{\protect\beta}_1$}
The covariance between $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ provides a measure of the relationship that exists between errors in estimating $\beta _{o}$ and $\beta _{1}$. Since $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are unbiased estimators, the covariance between $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ can be expressed as:
\begin{equation}
cov(\hat{\beta}_{o},\hat{\beta}_{1})=E\left[ \left( \hat{\beta}_{o}-\beta _{o}\right) \left( \hat{\beta}_{1}-\beta _{1}\right) \right] \label{cov.bo.b1.biv}
\end{equation}
As shown in the mathematical appendix at the end of this chapter, this covariance equals:
\begin{equation}
cov(\hat{\beta}_{o},\hat{\beta}_{1})=\frac{-\overline{X}\sigma ^{2}}{\sum x_{i}^{2}} \label{cov.bo.b1.2.biv}
\end{equation}
An inspection of equation \ref{cov.bo.b1.2.biv} indicates that the covariance between the estimated intercept and slope parameters is negative as long as $\overline{X}$ is positive (note that $\sigma ^{2}$ and $\sum x_{i}^{2}$ are always positive). This indicates that if the estimated intercept ($\hat{\beta}_{o}$) overstates the true intercept ($\beta _{o}$), then the estimated slope parameter ($\hat{\beta}_{1}$) will tend to understate the true slope. Similarly, an underestimate of the intercept parameter will tend to be associated with an overestimate of the slope parameter.
Since $\sigma ^{2}$ is generally unknown (as discussed above), the estimated covariance is measured as:
\begin{equation*}
\widehat{cov}(\hat{\beta}_{o},\hat{\beta}_{1})=\frac{-\overline{X}\hat{\sigma}^{2}}{\sum x_{i}^{2}}
\end{equation*}
where $\hat{\sigma}^{2}$ (as defined in equation \ref{est.error.var.biv}) is used in place of the unknown variance.
In most practical applications, however, the estimated covariance between parameters is less important than the estimated variance of these parameters. As you will see in Chapters \ref{biv.hyp.chap} and \ref{hyp.mult.chap}, the estimated variance and standard errors of the estimated intercept and slope parameters are used extensively for hypothesis testing purposes. The estimated covariance is less commonly used for this purpose.
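As a hedged numerical illustration of the variance and covariance formulas in this section (the data are hypothetical and were chosen only to keep the arithmetic simple), consider a sample of $N=3$ observations $(X_{i},Y_{i})$ equal to $(1,2)$, $(2,3)$, and $(3,5)$. For these data, $\overline{X}=2$, $\sum x_{i}^{2}=2$, and the OLS estimates are $\hat{\beta}_{o}=1/3$ and $\hat{\beta}_{1}=3/2$. The fitted values are $11/6$, $10/3$, and $29/6$, so the sample error terms are $\hat{u}_{i}=1/6,\,-1/3,\,1/6$. Applying equation \ref{est.error.var.biv}:
\begin{equation*}
\hat{\sigma}^{2}=\frac{1}{3-2}\left( \frac{1}{36}+\frac{1}{9}+\frac{1}{36}\right) =\frac{1}{6}
\end{equation*}
The estimated variances and covariance are then:
\begin{equation*}
\widehat{var}(\hat{\beta}_{o})=\frac{1}{6}\left( \frac{4}{2}+\frac{1}{3}\right) =\frac{7}{18},\qquad \widehat{var}(\hat{\beta}_{1})=\frac{1/6}{2}=\frac{1}{12},\qquad \widehat{cov}(\hat{\beta}_{o},\hat{\beta}_{1})=\frac{-2(1/6)}{2}=-\frac{1}{6}
\end{equation*}
The corresponding standard errors are $s.e.(\hat{\beta}_{o})=\sqrt{7/18}\approx 0.624$ and $s.e.(\hat{\beta}_{1})=\sqrt{1/12}\approx 0.289$. Because $\overline{X}$ is positive in this sample, the estimated covariance is negative, as the discussion above suggests.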
\section{Properties of the OLS estimators} \subsection{Linearity}
As noted in Chapter \ref{stat.chap}, an estimator is said to be linear if it can be expressed as a linear function of an observable random variable.
Let’s examine whether the OLS\ slope estimator $\hat{\beta}_1$ is linear.
If $\hat{\beta}_1$ is a linear estimator, it can be expressed in the form: \begin{equation}
\hat{\beta}_1=\sum w_iY_i \label{lin.est.biv} \end{equation}
As shown in Section \ref{expect.sec.bc}, such a linear relationship exists when $w_i$ is defined as:
\begin{equation*}
w_i=\frac{X_i-\overline{X}}{\sum \left( X_i-\overline{X}\right) ^2} \end{equation*}
Thus, $\hat{\beta}_1$ can be expressed as a linear function of the $Y_i$. In a similar manner, it can be shown that the OLS estimator for $\beta _o$ is linear as well. The proof is left to the reader as an exercise.
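One way to sketch this argument (using the weights $w_i$ defined above) is to substitute $\overline{Y}=\frac 1N\sum Y_i$ and $\hat{\beta}_1=\sum w_iY_i$ into $\hat{\beta}_o=\overline{Y}-\hat{\beta}_1\overline{X}$:
\begin{equation*}
\hat{\beta}_o=\frac 1N\sum Y_i-\overline{X}\sum w_iY_i=\sum \left( \frac 1N-\overline{X}w_i\right) Y_i
\end{equation*}
Because the $X_i$ are assumed to be nonstochastic, each term $\frac 1N-\overline{X}w_i$ is a constant, so $\hat{\beta}_o$ is also a weighted sum of the observed $Y_i$.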
\subsection{Unbiasedness}
As noted in Section \ref{expect.sec.bc}, the OLS estimators $\hat{\beta}_o$ and $\hat{\beta}_1$ are unbiased estimators. Thus, the expected values of the estimators for the intercept and slope parameters equal the corresponding population parameters. In mathematical terms, this means that:
\begin{equation*}
E(\hat{\beta}_o)=\beta _o
\end{equation*}
and
\begin{equation*}
E(\hat{\beta}_1)=\beta _1
\end{equation*}
\subsection{The Gauss-Markov Theorem}
One of the most important results in econometrics is the \textbf{Gauss-Markov Theorem}. This theorem states that when the conditions of the classical regression model are satisfied, the OLS estimators $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are best linear unbiased estimators (or \textbf{B.L.U.E}.). This means that any other linear unbiased estimators will have a variance that is at least as large as the variance of the OLS estimators. In other words, the OLS estimators attain the lowest possible variance that may be achieved by a linear unbiased estimator. Since this proof is rather long, it is contained in the mathematical appendix at the end of this chapter.
The implications of this theorem are quite important. There are an infinite number of alternative estimators of the slope and intercept parameters in a regression model. For example, one could estimate the parameters of a regression equation by randomly selecting any two data points and fitting a regression line through these points. This procedure is equivalent to an OLS estimator in which the information from all except two observations is discarded. Thus, this estimator is linear and unbiased. Of all of the possible linear and unbiased estimators of these parameters, however, none achieves a variance that is less than that of the OLS\ estimators.
\subsection{Efficiency of the OLS\ estimators} If all of the conditions of the classical regression model are satisfied and the error terms are normally distributed, then it can be shown that the OLS estimators are efficient.\footnote{%
A proof of this result may be found in a more advanced text. See, for example, Greene (2000), pp.~246-247.} In this case, they attain the lowest variance that is possible for any unbiased estimator (linear or nonlinear).
Thus, the OLS estimators are said to be \textbf{minimum variance unbiased estimators} when the error terms are normally distributed. Roughly speaking, this result indicates that, under the conditions of the classical regression model, no other unbiased estimators can perform better than the OLS estimators. The normality assumption is discussed in more detail in Chapter \ref{biv.hyp.chap}.
\subsection{Consistency of the OLS\ estimators} Since the OLS estimators are unbiased and the variances of both $\hat \beta _o$ and $\hat \beta _1$ tend to zero as the size of the sample increases, the OLS\ estimators are consistent. In practice, this means that the variance of these estimators tends to decline as the size of the sample increases. As the size of the sample approaches infinity, the distribution of the estimator collapses to the actual parameter value.
\section{Example: A consumption function} Let’s examine how we can estimate the parameters of a simple Keynesian consumption function. In its simplest form, the consumption function is: \begin{equation*}
\text{C}_{t}=\beta _{o}+\beta _{1}\text{YD}_{t}+u_{t} \end{equation*}
\begin{center}
$
\begin{array}{ll}
\text{where:} & \text{C}_{t}=\text{ level of consumption in year }t \\
& \text{YD}_{t}=\text{ level of disposable income in year }t \\
& u_{t}=\text{ random error term in year }t
\end{array}
$
\end{center}
{\
\begin{table}[p]
\begin{center}
\begin{minipage}{4in}
\renewcommand{\footnoterule}{}
\begin{center}
\caption{Consumption Function Data \label{t3.1}} \vspace{.1in}
\begin{tabular}{rrr}
{\bf Year} & {\bf C \footnote{Real consumption expenditures (in billions of chained 1996 dollars).}} & {\bf Yd \footnote{Real disposable personal income (in billions of chained 1996 dollars). Source: Bureau of Economic Analysis, U.S. Department of Commerce, http://www.bea.doc.gov.}} \\ \hline
1965 & 1897.6 & 2131.0 \\
1966 & 2006.1 & 2244.6\\
1967 & 2066.2 & 2340.5\\
1968 & 2184.2 & 2448.2\\
1969 & 2264.8 & 2524.3\\
1970 & 2317.5 & 2630.0\\
1971 & 2405.2 & 2745.3\\
1972 & 2550.5 & 2874.3\\
1973 & 2675.9 & 3072.3\\
1974 & 2653.7 & 3051.9\\
1975 & 2710.9 & 3108.5\\
1976 & 2868.9 & 3243.5\\
1977 & 2992.1 & 3360.7\\
1978 & 3124.7 & 3527.5\\
1979 & 3203.2 & 3628.6\\
1980 & 3193.0 & 3658.0\\
1981 & 3236.0 & 3741.1\\
1982 & 3275.5 & 3791.7\\
1983 & 3454.3 & 3906.9\\
1984 & 3640.6 & 4207.6\\
1985 & 3820.9 & 4347.8\\
1986 & 3981.2 & 4486.6\\
1987 & 4113.4 & 4582.5\\
1988 & 4279.5 & 4784.1\\
1989 & 4393.7 & 4906.5\\
1990 & 4474.5 & 5014.2\\
1991 & 4466.6 & 5033.0\\
1992 & 4594.5 & 5189.3\\
1993 & 4748.9 & 5261.3\\
1994 & 4928.1 & 5397.2\\
1995 & 5075.6 & 5539.1\\
1996 & 5237.5 & 5677.7\\
1997 & 5423.9 & 5854.5\\
1998 & 5678.7 & 6134.1\\
1999 & 5978.8 & 6331.0\\
2000 & 6294.4 & 6510.6\\ \hline
\end{tabular}
\end{center}
\end{minipage}
\end{center}
\end{table}
}
Table \ref{t3.1} contains a listing of time-series data on real consumption expenditures and real disposable income in the U.S. for the years 1965 through 2000.\footnote{%
This data is a subset of that contained in the file ``cons1.dat'' on the data disk that accompanies this text.} To estimate the regression parameters, we must compute a number of intermediate results such as the sample means and sum of squared deviations for these variables. Table \ref{t3.2} illustrates this process. Using the results from this table, we can compute:
\begin{table}[tbp]
\begin{center}
\begin{minipage}{4in}
\renewcommand{\footnoterule}{}
\begin{center}
\caption{Consumption Function Data \label{t3.2}} \vspace{.1in}
\small
\begin{tabular}{|crrrrrrr|} \hline
\bf Obs. & \boldmath $Y_t (=C_t) $ & \boldmath $y_t$\footnote{$y_t$ is defined as $y_t=Y_t- \overline{Y}$} & \boldmath $y_t^2 $ & \boldmath $X_t (=YD_t)$ & \boldmath $x_t$\footnote{$x_t$ is defined as $x_t=X_t-\overline{X} $} & \boldmath $x_t^2 $ & \boldmath $x_ty_t $\\ \hline 1 & 1897.6 & -1774.9 & 3150378.5 & 2131.0 & -1960.3 & 3842689.0 & 3479356.9 \\ 2 & 2006.1 & -1666.4 & 2776990.8 & 2244.6 & -1846.7 & 3410218.8 & 3077360.3\\ 3 & 2066.2 & -1606.3 & 2580297.9 & 2340.5 & -1750.8 & 3065222.8 & 2812327.8\\ 4 & 2184.2 & -1488.3 & 2215127.8 & 2448.2 & -1643.1 & 2699704.6 & 2445442.9\\ 5 & 2264.8 & -1407.7 & 1981705.3 & 2524.3 & -1567.0 & 2455419.4 & 2205882.5\\ 6 & 2317.5 & -1355.0 & 1836107.8 & 2630.0 & -1461.3 & 2135332.7 & 1980076.0\\ 7 & 2405.2 & -1267.3 & 1606126.7 & 2745.3 & -1346.0 & 1811656.2 & 1705798.8\\ 8 & 2550.5 & -1122.0 & 1258952.6 & 2874.3 & -1217.0 & 1481034.9 & 1365486.3\\ 9 & 2675.9 & -996.6 & 993272.5 & 3072.3 & -1019.0 & 1038315.7 & 1015544.4\\ 10 & 2653.7 & -1018.8 & 1038015.7 & 3051.9 & -1039.4 & 1080306.2 & 1058949.8\\ 11 & 2710.9 & -961.6 & 924733.3 & 3108.5 & -982.8 & 965852.2 & 945069.1\\ 12 & 2868.9 & -803.6 & 645822.1 & 3243.5 & -847.8 & 718727.2 & 681300.1\\ 13 & 2992.1 & -680.4 & 462985.7 & 3360.7 & -730.6 & 533743.9 & 497107.4\\ 14 & 3124.7 & -547.8 & 300118.3 & 3527.5 & -563.8 & 317845.4 & 308854.7\\ 15 & 3203.2 & -469.3 & 220271.2 & 3628.6 & -462.7 & 214070.7 & 217148.8\\ 16 & 3193.0 & -479.5 & 229949.6 & 3658.0 & -433.3 & 187729.6 & 207769.9\\ 17 & 3236.0 & -436.5 & 190558.9 & 3741.1 & -350.2 & 122624.5 & 152863.3\\ 18 & 3275.5 & -397.0 & 157633.3 & 3791.7 & -299.6 & 89746.8 & 118941.5\\ 19 & 3454.3 & -218.2 & 47624.6 & 3906.9 & -184.4 & 33995.2 & 40236.9\\ 20 & 3640.6 & -31.9 & 1019.6 & 4207.6 & 116.3 & 13530.9 & -3714.2\\ 21 & 3820.9 & 148.4 & 22013.5 & 4347.8 & 256.5 & 65803.7 & 38060.1\\ 22 & 3981.2 & 308.7 & 95276.8 & 4486.6 & 395.3 & 156279.7 & 122023.9\\ 23 & 4113.4 & 440.9 & 194365.9 & 4582.5 & 491.2 & 241299.3 & 216564.9\\ 
24 & 4279.5 & 607.0 & 368411.9 & 4784.1 & 692.8 & 480002.6 & 420521.9\\ 25 & 4393.7 & 721.2 & 520085.4 & 4906.5 & 815.2 & 664587.3 & 587913.4\\ 26 & 4474.5 & 802.0 & 643155.0 & 5014.2 & 922.9 & 851785.4 & 740155.4\\ 27 & 4466.6 & 794.1 & 630546.3 & 5033.0 & 941.7 & 886840.7 & 747792.8\\ 28 & 4594.5 & 922.0 & 850027.7 & 5189.3 & 1098.0 & 1205652.8 & 1012342.9\\ 29 & 4748.9 & 1076.4 & 1158571.2 & 5261.3 & 1170.0 & 1368952.0 & 1259376.2\\ 30 & 4928.1 & 1255.6 & 1576454.6 & 5397.2 & 1305.9 & 1705432.9 & 1639676.0\\ 31 & 5075.6 & 1403.1 & 1968603.9 & 5539.1 & 1447.8 & 2096189.2 & 2031395.1\\ 32 & 5237.5 & 1565.0 & 2449129.4 & 5677.7 & 1586.4 & 2516735.5 & 2482702.3\\ 33 & 5423.9 & 1751.4 & 3067294.9 & 5854.5 & 1763.2 & 3108952.6 & 3088053.5\\ 34 & 5678.7 & 2006.2 & 4024715.8 & 6134.1 & 2042.8 & 4173122.6 & 4098247.5\\ 35 & 5978.8 & 2306.3 & 5318878.8 & 6331.0 & 2239.7 & 5016355.6 & 5165402.9\\ 36 & 6294.4 & 2621.9 & 6874199.4 & 6510.6 & 2419.3 & 5853120.0 & 6343147.0\\ \hline Sums: &132211.1 & 0.0 & 52379422.4 & 147286.0 & 0.0 & 56608878.4 & 54305179.2\\ &$\bar{Y}= 3672.5 $ & & & $\bar{X}= 4091.3 $ & & & \\ \hline \end{tabular}
\normalsize
\caption{Estimation of consumption function parameters \label{t3.3}} \end{center}
\end{minipage}
\end{center}
\end{table}
\begin{equation*}
\hat{\beta}_{1}=\frac{\sum x_{i}y_{i}}{\sum x_{i}^{2}} \end{equation*}
\begin{equation*}
=\frac{54305179.2}{56608878.4}
\end{equation*}
\begin{equation*}
=0.9593
\end{equation*}
and
\begin{equation*}
\hat{\beta}_{o}=\overline{Y}-\hat{\beta}_{1}\overline{X} \end{equation*}
\begin{equation*}
=3672.5-0.9593(4091.3)
\end{equation*}
\begin{equation*}
=-252.3
\end{equation*}
Thus, the estimated consumption function for this time period is given by: \begin{equation}
\text{\^{C}}_{t}=-252.3+0.9593\text{YD}_{t} \label{est.cons.func.bc} \end{equation}
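As a quick arithmetic check on these estimates (using only the rounded values reported above), recall that an OLS regression line passes through the point of sample means. Evaluating equation \ref{est.cons.func.bc} at $\overline{X}=4091.3$ gives:
\begin{equation*}
-252.3+0.9593(4091.3)\approx 3672.5
\end{equation*}
which agrees with the sample mean of consumption, $\overline{Y}=3672.5$, up to rounding.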
A graph of this estimated consumption function appears in Figure~\ref{est_cons_graph}. The estimated intercept term, $\hat{\beta}_{o}$, is negative.
As illustrated in Figure~\ref{est_cons_graph}, $\hat{\beta}_{o}$ is the estimated intercept on the vertical axis. According to this equation, the level of consumption would be -252.3 billion dollars when disposable income is zero. Does this mean that there will be a negative level of consumption spending if disposable income were to fall to zero? Not necessarily. The fitted regression line is an approximation of the true relationship between consumption expenditures and disposable income. This line provides the best linear representation of the observed relationship between these variables.
All of these observations, however, occur at levels of disposable income that are substantially greater than zero. While the relationship may be approximately linear in the observed range of outcomes, it may be nonlinear at levels of disposable income that have not been observed.
\FRAME{ftbpFU}{5.047in}{3.1548in}{0pt}{\Qcb{Estimated consumption function}}{\Qlb{est_cons_graph}}{fig4-9.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 3.1548in;depth 0pt;original-width 5.6662in;original-height 3.531in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-9.gif’;file-properties “XNPEU”;}}
It should be noted that the magnitude of the intercept term is only rarely of theoretical interest to economists. This does not, however, suggest that the intercept term is unimportant or should be excluded from regression equations. Including an intercept term in a regression equation allows the fitted equation to fit the observed outcomes more closely.
While most econometric software packages provide an option for estimating regression equations without an intercept term, this option should be used only in exceptional cases in which there is a compelling theoretical reason to omit the intercept (and thereby constrain the regression equation to pass through the origin).\footnote{%
Even in these cases, many econometricians would argue that it is better to include the intercept in the equation and perform statistical tests to determine whether the assumption of an intercept equal to zero is consistent with observed outcomes. Tests for this purpose are discussed in Chapter \ref{biv.hyp.chap}.}
While the estimated intercept parameter only rarely has theoretical significance, economists are generally more interested in the estimated value of the slope parameter. As noted above, the estimated slope parameter serves as a measure of the change in the predicted level of the dependent variable that results from a one-unit change in the level of the independent variable. In this case,
\begin{equation*}
\hat{\beta}_{1}=\frac{\Delta \text{\^{C}}_{t}}{\Delta \text{YD}_{t}} \end{equation*}
As noted above, this estimated slope parameter is equivalent to the Keynesian marginal propensity to consume. Thus, in this case, the estimated MPC (= $\hat{\beta}_{1}$) equals 0.9593. This result suggests that a \$1 billion increase in disposable income will increase predicted consumption spending by \$0.9593 billion.
\section{Coefficient of determination}
Once you have estimated the parameters of a regression model, you will probably be interested in determining how well your model fits the observed data. One commonly used measure of “goodness of fit” is the \textbf{coefficient of determination (R$^2$)}. Let’s examine this concept.
Any measure of “goodness of fit” must be based upon the relative magnitude of the estimated error terms $\hat{u}_i$. If the regression line fits the data well, the error terms will be small in magnitude. In this case, a large share of the variation in the dependent variable is explained by the variation in the independent variable. The error terms will be relatively large if the regression line explains a small proportion of the variation in the dependent variable. The coefficient of determination (or R$^2$) is a measure of the proportion of the total variation in the dependent variable that may be explained by variation in the independent variable through the regression relationship.
Let’s examine how R$^{2}$ may be computed. Equation \ref{sampleerror} may be rewritten as:
\begin{equation*}
Y_{i}=\hat{Y}_{i}+\hat{u}_{i}
\end{equation*}
Subtracting the sample mean from both sides of this expression results in: \begin{equation*}
Y_{i}-\overline{Y}=\hat{Y}_{i}-\overline{Y}+\hat{u}_{i} \end{equation*}
If we square both sides of this expression, we have: \begin{equation*}
\left( Y_{i}-\overline{Y}\right) ^{2}=y_{i}^{2}=\left( \hat{Y}_{i}-\overline{Y}\right) ^{2}+\hat{u}_{i}^{2}+2\left( \hat{Y}_{i}-\overline{Y}\right) \hat{u}_{i}
\end{equation*}
Therefore the sum of the squared deviations of $Y_{i}$ from its mean ($\sum y_{i}^{2}$) equals:
\begin{equation}
\sum y_{i}^{2}=\sum \left( \hat{Y}_{i}-\overline{Y}\right) ^{2}+\sum \hat{u}_{i}^{2}+2\sum \left( \hat{Y}_{i}-\overline{Y}\right) \hat{u}_{i} \label{SST}
\end{equation}
Since $\hat{Y}_{i}=\hat{\beta}_{o}+\hat{\beta}_{1}X_{i}$ and $\overline{Y}=\hat{\beta}_{o}+\hat{\beta}_{1}\overline{X}$, the third summation on the right-hand side of equation \ref{SST} can be expressed as:
\begin{equation*}
2\sum \left( \hat{Y}_{i}-\overline{Y}\right) \hat{u}_{i}=2\sum \hat{\beta}_{1}(X_{i}-\overline{X})\hat{u}_{i}
\end{equation*}
\begin{equation*}
=2\hat{\beta}_{1}\sum (X_{i}-\overline{X})\hat{u}_{i}
\end{equation*}
\begin{equation*}
=2\hat{\beta}_{1}\left( \sum X_{i}\hat{u}_{i}-\overline{X}\sum \hat{u}_{i}\right)
\end{equation*}
The normal equations (equations \ref{norm1} and \ref{norm2}) imply that $\sum \hat{u}_{i}=0$ and $\sum X_{i}\hat{u}_{i}=0$, so this expression equals zero.
Thus, $\sum y_{i}^{2}$ can be expressed as:
\begin{equation}
\sum y_{i}^{2}=\sum \left( \hat{Y}_{i}-\overline{Y}\right) ^{2}+\sum \hat{u}_{i}^{2} \label{SST1}
\end{equation}%
In equation \ref{SST1}, the sum of squared deviations of $Y_{i}$ from its sample mean ($\sum y_{i}^{2}$) is called the \textbf{total sum of squares} or \textbf{TSS}. As indicated by Figure~\ref{decom_ybar}, each of the terms in this summation involves the squared value of the vertical distance between the observed value of $Y_{i}$ and $\overline{Y}$. The first term on the right-hand side of equation \ref{SST1} is the sum of the squared differences between the predicted value of $Y_{i}$ and $\overline{Y}$. Each of the terms in this summation represents the portion of the deviation of $Y_{i}$ from its mean that can be explained by the regression line. As Figure~\ref{decom_ybar} illustrates, this explained deviation is equal to the vertical difference between $\hat{Y}_{i}$ and $\overline{Y}$. Thus, the first summation on the right-hand side of equation \ref{SST1} is referred to as the \textbf{regression sum of squares} or \textbf{RSS}. The second summation on the right-hand side of equation \ref{SST1} is called the \textbf{error sum of squares} (or \textbf{ESS}).\footnote{Some statistics and econometrics texts use a different notation in which the terms RSS and ESS are defined in the reverse manner. In this alternative notation, the regression sum of squares is called the explained sum of squares (ESS) while the error sum of squares is called the residual sum of squares (RSS). It is somewhat unfortunate that a standardized notation has not yet been adopted for these concepts. In reading econometric or statistical papers or texts that refer to the RSS and ESS, a careful reader should always determine from the context which convention is adopted for these terms.} Each of these squared error terms represents the portion of the deviation of $Y_{i}$ from its mean that cannot be explained by the regression relationship.
The squared value of each of these error terms equals the squared value of the vertical difference between the observed $Y_{i}$ and the value predicted by the regression equation ($\hat{u}_{i}=Y_{i}-\hat{Y}_{i}$).
\begin{center}
\FRAME{ftbpFU}{4.4996in}{2.9291in}{0pt}{\Qcb{Decomposition of $Y_{i}-\bar{Y}$}}{\Qlb{decom_ybar}}{fig4-10.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.4996in;height 2.9291in;depth 0pt;original-width 4.4477in;original-height 2.885in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-10.gif’;file-properties “XNPEU”;}}
\end{center}
These relationships can be summarized as: \begin{equation*}
\text{TSS}=\sum y_{i}{}^{2}
\end{equation*}%
\begin{equation*}
\text{RSS}=\sum \left( \hat{Y}_{i}-\overline{Y}\right) ^{2} \end{equation*}%
\begin{equation*}
\text{ESS}=\sum \hat{u}_{i}^{2}
\end{equation*}%
Using these definitions, equation \ref{SST1} may be restated as: \begin{equation}
\text{TSS = RSS + ESS} \label{SST2}
\end{equation}%
As Figure~\ref{decom_ybar} indicates, the total deviation of $Y_{i}$ from its mean can always be decomposed into the sum of the explained and the unexplained deviation.
The coefficient of determination (R$^{2}$) is defined as: \begin{equation}
\text{R}^{2}\text{ = }\frac{\text{RSS}}{\text{TSS}} \label{Rsquared} \end{equation}
It follows from equation \ref{SST2} that:
\begin{equation*}
\frac{\text{TSS}}{\text{TSS}}=\frac{\text{RSS}}{\text{TSS}}+\frac{\text{ESS}}{\text{TSS}}
\end{equation*}
or
\begin{equation*}
1=\frac{\text{RSS}}{\text{TSS}}+\frac{\text{ESS}}{\text{TSS}} \end{equation*}
Thus, we can note that R$^{2}$ also equals: \begin{equation*}
\text{R}^{2}\text{ = 1 }-\text{ }\frac{\text{ESS}}{\text{TSS}} \end{equation*}
From these definitions, we can easily determine two properties of R$^2$ (the proof is left to the reader):
\begin{enumerate}
\item R$^2$ is always greater than or equal to zero.
\item R$^2$ is always less than or equal to one.
\end{enumerate}
As noted above, the coefficient of determination is a measure of the proportion of the variation in the dependent variable that can be explained by the variation in the independent variable. If all of the data falls exactly along the regression line (as occurs in Figure~\ref{r2_eq_1}), then all of the variation in $Y_{i}$ is explained by variation in the level of $X_{i}$. In this case, RSS = TSS and the coefficient of determination (R$^{2}$) equals one. An R$^{2}$ equal to one implies that variation in the independent variable explains all of the variation in the dependent variable. On the other hand, if there is no relationship between $Y_{i}$ and $X_{i}$ (as illustrated in Figure~\ref{r2_eq_0}), all of the variation in $Y_{i}$ is due to random components and TSS = ESS (and RSS = 0). When this occurs, the coefficient of determination equals zero. An R$^{2}$ value of zero implies that variation in the independent variable explains none of the variation in the dependent variable.
\begin{center}
\FRAME{ftbpFU}{4.9389in}{3.2949in}{0pt}{\Qcb{R$^{2}=1$}}{\Qlb{r2_eq_1}}{fig4-11.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.9389in;height 3.2949in;depth 0pt;original-width 4.8853in;original-height 3.25in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-11.gif’;file-properties “XNPEU”;}}
\FRAME{ftbpFU}{5.047in}{3.1012in}{0pt}{\Qcb{R$^{2}=0$}}{\Qlb{r2_eq_0}}{fig4-12.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 3.1012in;depth 0pt;original-width 5.271in;original-height 3.2292in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-12.gif’;file-properties “XNPEU”;}}
\end{center}
An observant reader may notice that this discussion of R$^2$ seems similar to our earlier discussion of the correlation coefficient ($\rho $) in Chapter \ref{stat.chap}. There is a good reason for this similarity. In fact, R$^2$ is simply the square of the sample correlation coefficient relating $Y$ and $\hat{Y}$. In other words,
\begin{equation*}
\text{R}^2=\hat{\rho}_{Y,\hat{Y}}^2
\end{equation*}
A proof of this proposition appears in the mathematical appendix at the end of this chapter.
\subsection{Alternative method of computing R$^2$} In practice, it is often convenient to compute R$^2$ using a variation of the formula above. From equation \ref{Rsquared}, we have: \begin{equation*}
\text{R}^2\text{ = }\frac{\text{RSS}}{\text{TSS}} \end{equation*}
Expressed in deviation form, this can be stated as: \begin{equation} \label{Rsq1}
\text{R}^2=\frac{\sum (\hat Y_i-\overline{Y})^2}{\sum y_i^2} \end{equation}
Using the definitions of $\hat Y_i$ and $\overline{Y}$ we can rewrite equation \ref{Rsq1} as:
\begin{equation} \label{Rsq2}
\text{R}^2=\frac{\hat \beta _1^2\sum x_i^2}{\sum y_i^2} \end{equation}
The proof is left to the reader.
\subsection{Computing R$^2$: The consumption function} We can use the data from table \ref{t3.3} to compute the R$^{2}$ for our estimate of the consumption function. Using equation \ref{Rsq2}, we have: \begin{equation*}
\text{R}^{2}=\frac{(0.9593)^{2}(56608878.4)}{52379422.4} \end{equation*}
\begin{equation*}
=0.995
\end{equation*}
An R$^{2}$ of 0.995 indicates that 99.5\% of the variation in consumption expenditures during this period can be explained by the changes in disposable personal income that occurred during these years. Thus, it appears that this estimated consumption function fits the observed data quite well.
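As a rough cross-check on this figure, note that $\hat{\beta}_{1}=\sum x_{i}y_{i}/\sum x_{i}^{2}$ implies $\hat{\beta}_{1}^{2}\sum x_{i}^{2}=\hat{\beta}_{1}\sum x_{i}y_{i}$, so the same R$^{2}$ can also be computed from the cross-product sum in Table \ref{t3.3}. The sketch below uses the rounded slope estimate, so the final digits are only approximate:
\begin{equation*}
\text{R}^{2}=\frac{\hat{\beta}_{1}\sum x_{i}y_{i}}{\sum y_{i}^{2}}=\frac{(0.9593)(54305179.2)}{52379422.4}
\end{equation*}
\begin{equation*}
\approx 0.995
\end{equation*}
Both routes give the same value to three decimal places, as they should.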
\subsection{Example II: Infant mortality}
Econometric analysis is often used to investigate important social and economic issues. Suppose, for example, that a researcher wishes to investigate the effect of nutrition on infant mortality. Using cross-sectional data from 120 countries,\footnote{Source: \textit{World Development Report 1991}, (NY: Oxford University Press, 1991), Table 28, pp. 258-9. The data used for this estimation is found in Table \ref{mortal.dat} in Appendix \ref{data.appendix}. This data also appears in the file “mortal1.dat” on the data disk that accompanies this text.} the following equation can be estimated:
\begin{equation*}
\widehat{\text{Mortality}}_i=314.51-0.088\ \text{Calories}_i \end{equation*}
\begin{center}
$
\begin{array}{cc}
\text{where:} & \text{Mortality}_i\text{ = infant mortality rate per 1,000 live births in 1965} \\
& \text{Calories}_i\text{ = daily calorie supply per capita in 1965}
\end{array}
$
\end{center}
This equation suggests that an increase in average food consumption of 100 calories per day would reduce expected infant mortality by approximately 9 infants per 1,000 live births ($0.088\times 100=8.8$). The R$^{2}$ for this equation is 0.59. This suggests that approximately 59\% of the variation in infant mortality rates across countries may be explained by differences in nutrition during this time period.
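To illustrate how the estimated equation generates a prediction, consider a hypothetical country (not one drawn from the data set) with a daily calorie supply of 2,500 per capita. Under that assumed value, the predicted infant mortality rate would be:
\begin{equation*}
\widehat{\text{Mortality}}=314.51-0.088(2500)=314.51-220=94.51
\end{equation*}
or roughly 95 infant deaths per 1,000 live births. The value of 2,500 is chosen purely for illustration.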
\subsection{R$^{2}$ and the intercept}
Earlier in this chapter, it was argued that an intercept term should always be included in a regression equation unless there is a compelling theoretical reason to omit the intercept. Another factor that you should be aware of is that the conventional measure of R$^{2}$ (as defined in equation \ref{Rsquared}) is no longer appropriate when the regression equation is estimated without a constant term. The reason for this is that:
\begin{equation*}
\text{TSS}\neq \text{RSS}+\text{ESS}
\end{equation*}
when a constant term is omitted from the regression equation. Thus, it is inappropriate to consider R$^{2}$ (=$\frac{\text{RSS}}{\text{TSS}}$) as a measure of the proportion of the variation of the dependent variable that is explained by the regression equation.
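A brief sketch of why the decomposition fails: suppose the equation is estimated without an intercept, so that $\hat{Y}_{i}=\hat{\beta}_{1}X_{i}$. Least squares then yields only one first-order condition, $\sum X_{i}\hat{u}_{i}=0$, and the residuals are no longer required to sum to zero. The cross-product term that vanished in the derivation of equation \ref{SST1} becomes:
\begin{equation*}
2\sum \left( \hat{Y}_{i}-\overline{Y}\right) \hat{u}_{i}=2\hat{\beta}_{1}\sum X_{i}\hat{u}_{i}-2\overline{Y}\sum \hat{u}_{i}
\end{equation*}
\begin{equation*}
=-2\overline{Y}\sum \hat{u}_{i}
\end{equation*}
which is generally nonzero. As a result, RSS/TSS need not equal $1-\text{ESS}/\text{TSS}$; the former can exceed one and the latter can even be negative.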
The simplest way to avoid this problem is to include a constant term in the regression equation. If regression results are reported for an equation in which the constant term is omitted, do not report the conventionally computed R$^{2}$ as part of your results.
\subsection{Cautions in interpreting R$^2$\label{R2cautions}}
Beginning econometric students (and even some practicing econometricians) often put a bit too much emphasis on the magnitude of R$^2$. While a larger R$^2$ is generally desirable, there are a few points that you should keep in mind:
\begin{itemize}
\item R$^{2}$ is not a statistic that can be directly used for hypothesis testing.\footnote{As will be shown in Chapter \ref{biv.hyp.chap}, however, the R$^{2}$ statistic can be transformed into an $F$-statistic that can be used to test the “goodness of fit” of a model (as long as information on the number of observations and the number of variables in the equation is available in addition to the R$^{2}$).} One of the major purposes of econometric analysis is to test economic hypotheses involving relationships among the variables. These tests involve determining whether the sign and magnitude of the coefficients fall within the boundaries predicted by economic theory. For example, testing the hypothesis that the MPC equals 0.9 does not depend upon the R$^{2}$ value. (The procedure for testing hypotheses of this sort is discussed in Chapter \ref{biv.hyp.chap}.)
\item If there is a large random component in the process generating the data, R$^2$ will be expected to be low even if you have correctly specified the model. For example, earnings equations that relate the level of earnings to the level of educational attainment (and other variables) are often estimated using cross-sectional data. These equations will generally have an R$^2$ that lies somewhere between 0.10 and 0.25. This does not necessarily mean that the equations are incorrectly specified. It is quite possible that there are many unobservable factors (such as differences in motivation, ability, and tastes) that affect an individual’s earnings. If these unobservable factors explain 85\% of the variation in earnings, the best R$^2$ that can be attained is 0.15. As noted above, we are generally more concerned with the statistical significance of the estimated parameters $\hat{\beta}_o$ and $\hat{\beta}_1$ than we are with the magnitude of R$^2$.
\item R$^2$ is a measure of the degree of linear association between the dependent and independent variables. It is not a measure of causation. A high R$^2$ may occur even if a model is incorrectly specified. In Chapter \ref{intro.chap}, we discussed a regression equation in which the number of annual deaths in the U.S. was the dependent variable and the number of secondary school teachers was the independent variable. In this regression, the R$^2$ was 0.82. A simplistic interpretation of this result suggests that 82\% of the variation in deaths is explained by changes in the number of secondary school teachers. As noted in Chapter \ref{intro.chap}, however, it is likely that there is no causal relationship between these variables. Instead, both of these variables have increased over time in response to changes in the U.S. population. Statistical procedures can only measure correlation, not causation.\footnote{Econometricians sometimes use a “causality test” in an attempt to investigate causal relationships among time-series variables. This test will be discussed in Chapter 15. For now, we can simply note that these causality tests are based upon an examination of the correlations that occur between past outcomes of one time series and current values of another.}
\item A large R$^{2}$ will often be found in macroeconomic time-series models as a result of the strong trend and autocorrelation components that are present in most macroeconomic time-series data. (It is likely that these factors are partial explanations for the high R$^{2}$ found in the consumption function example above.) This topic will be discussed in more detail in a later chapter.
\end{itemize}
\section{Forecasting\label{forecast.biv.sec}} Once an econometric model is estimated, it may be used for forecasting purposes. For example, suppose that we believe that disposable income will be \$7500 billion in 2006. From the estimated consumption function above, we have:
\begin{equation*}
\text{\^{C}}_{t}=-252.3+0.9593\text{YD}_{t} \end{equation*}
Thus, we can predict that the level of consumption expenditure will be: \begin{equation*}
\hat{C}_{t}=-252.3+0.9593(\$7500\text{ billion}) \end{equation*}
\begin{equation*}
=\$6942.45\text{ billion}
\end{equation*}
Forecasts based upon regression analysis will be unbiased when the conditions of the classical regression model are satisfied. (The proof of this proposition is left to the reader.) When regression analysis is used to generate forecasts, the predicted outcome will not, in general, equal the actual outcome. This occurs because: \begin{itemize}
\item Our estimated sample regression function will not generally be equal to the population regression function (as discussed in Section \ref{SRF}).
Since our estimated slope and intercept parameters are not equal to the population parameters, this introduces a source of error into our forecasts.
\item There is a random component in the actual process generating the data.
Even if we knew the population slope and intercept parameters we would not be able to forecast the error term $u_i$.
\end{itemize}
To determine the reliability of our forecasts, we need a measure of the variance of the prediction ($\sigma _{p}^{2}$). This is given by the formula:
\begin{equation*}
\sigma _{p}^{2}=\sigma ^{2}\left[ 1+\frac{1}{N}+\frac{\left( X_{N+1}-\overline{X}\right) ^{2}}{\sum (X_{i}-\overline{X})^{2}}\right]
\end{equation*}
Since $\sigma ^{2}$ is generally unknown, we can estimate the variance of the prediction by replacing $\sigma ^{2}$ with its estimated value:\footnote{In this discussion, it is assumed that the goal is to predict a specific value of the dependent variable, $Y_{i}$. An alternative goal is to predict the conditional expectation of $Y$ for a given level of $X_{i}$. The predicted value of $E(Y|X_{i})$ is also given by the sample regression function:
\begin{equation*}
\widehat{E(Y|X_{i})}=\hat{\beta}_{o}+\hat{\beta}_{1}X_{i}
\end{equation*}
Equation \ref{forecast_error_var.bc} provides the variance of the forecast error in predicting a specific value of the dependent variable. The estimated variance of the forecast error in predicting $E(Y|X)$ is given by:
\begin{equation*}
\hat{\sigma}_{E(Y|X)}^{2}=\hat{\sigma}^{2}\left[ \frac{1}{N}+\frac{\left( X_{N+1}-\overline{X}\right) ^{2}}{\sum \left( X_{i}-\overline{X}\right) ^{2}}\right]
\end{equation*}
Note that the forecast error has a lower variance in predicting $E(Y|X_{i})$ than in predicting $Y_{i}$. This is because the error term $u_{i}$ affects $Y_{i}$ but does not affect $E(Y|X_{i})$.}
\begin{equation}
\hat{\sigma}_{p}^{2}=\hat{\sigma}^{2}\left[ 1+\frac{1}{N}+\frac{\left( X_{N+1}-\overline{X}\right) ^{2}}{\sum (X_{i}-\overline{X})^{2}}\right] \label{forecast_error_var.bc}
\end{equation}
A derivation of this variance is contained in the mathematical appendix at the end of this chapter.
Once the variance of the prediction is determined, it is possible to compute confidence intervals that allow us to estimate the accuracy of the forecasts. This topic will be discussed in Chapter \ref{biv.hyp.chap}. For now, we should note that our forecasts are more reliable (\textit{i.e.}, $\sigma _p^2$ is smaller) when:
\begin{itemize}
\item $\sigma ^2$ is smaller;
\item there are more observations; and
\item the observed value of $X_{N+1}$ is relatively close to the sample mean for $X$ (\textit{i.e.}, $\left( X_{N+1}-\overline{X}\right) ^2$ is small).
\end{itemize}
\textit{Ceteris paribus, }the data points tend to be more tightly clustered around the regression line when $\sigma ^2$ is smaller. Therefore, we would expect a more reliable forecast when the variance of the underlying error process is smaller. An increase in the number of observations allows us to (on average) construct more reliable estimates of the regression parameters.
This will tend to improve the accuracy of our forecasts. Thus, these first two points are relatively obvious. The third point, however, requires a bit more interpretation.
The sample regression function will always pass through the point ($\overline{X},\overline{Y}$). In large samples, this point will tend to be fairly close to the corresponding point on the population regression function. Errors in estimating the slope parameter ($\hat{\beta}_{1}$) will not result in a significant source of prediction error in the neighborhood of the sample means. As we move further from this point, however, errors in estimating the slope parameter will result in progressively larger errors in our forecasts. Thus, the estimates become less reliable when we try to predict the value of $Y$ for levels of $X$ that lie at a greater distance from $\overline{X}$. This phenomenon is illustrated in Figure~\ref{for_acc}.
In this diagram, you can observe that a small error in estimating the slope will generally result in progressively larger errors as the level of $X$ moves further from the sample mean ($\overline{X}$).\footnote{% To simplify the exposition, in this diagram the sample regression function intersects the population regression function at the point corresponding to the sample means for $X$ and $Y$.}
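To get a rough sense of the magnitudes involved, the bracketed term in equation \ref{forecast_error_var.bc} can be evaluated for the consumption function forecast discussed earlier. The sketch below takes $N=36$, $\overline{X}=4091.3$, and $\sum (X_{i}-\overline{X})^{2}=56608878.4$ from Table \ref{t3.3}, together with the assumed forecast value $X_{N+1}=7500$:
\begin{equation*}
1+\frac{1}{36}+\frac{(7500-4091.3)^{2}}{56608878.4}\approx 1+0.028+0.205=1.233
\end{equation*}
For comparison, a forecast made at $X_{N+1}=\overline{X}$ would have a bracketed term of only $1+\frac{1}{36}\approx 1.028$. Under these figures, the prediction variance at a disposable income of \$7500 billion is about 20\% larger than it would be at the sample mean ($1.233/1.028\approx 1.20$), whatever the value of $\hat{\sigma}^{2}$.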
\begin{center}
\FRAME{ftbpFU}{4.6882in}{3.1912in}{0pt}{\Qcb{Forecast accuracy}}{\Qlb{for_acc}}{fig4-13.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.6882in;height 3.1912in;depth 0pt;original-width 4.6354in;original-height 3.1462in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-13.gif’;file-properties “XNPEU”;}}
\end{center}
\section{Summary}
In this chapter, you have examined how a regression model may be used to analyze the relationship between a dependent variable and a single independent variable. The assumptions of the classical regression model were analyzed. These assumptions provide a set of “ideal conditions” that simplify regression analysis. In later chapters, we will examine the implications of relaxing each of these assumptions.
The estimators for the intercept and slope parameters were derived in this chapter. If the assumptions of the classical regression model are satisfied, these estimators are linear, unbiased, consistent, and BLUE. If it is also assumed that the error terms are normally distributed, the OLS estimators are efficient.
The variances of the intercept and slope estimators have also been derived in this chapter. These variances provide a measure of the reliability of the parameter estimates and are used in hypothesis tests involving the magnitude or sign of these coefficients. The covariance between the intercept and slope estimators provides a measure of the relationship that exists between errors in estimating each of these parameters.
A measure of the overall “fit” of the regression relationship is provided by the coefficient of determination (R$^2$). R$^2$ provides a measure of the proportion of the variation in the dependent variable that may be explained by the regression relationship. It was noted, however, that R$^2$ is sometimes overemphasized by beginning econometricians.
The final portion of this chapter provided a discussion of the use of regression analysis for forecasting purposes. The variance of the forecast error was examined.
A sound understanding of the basic bivariate regression model is needed before you move on to Chapter \ref{biv.hyp.chap}. In this next chapter, we will examine how to perform hypothesis tests and to construct confidence intervals for our estimates and forecasts.
\section{Key Concepts}
bivariate regression model
$E(Y|X_i)$
population regression function
sample regression function
classical regression model
homoskedastic errors
heteroskedastic errors
autocorrelation
ordinary least squares (OLS) estimation
normal equations
standard error of estimators
variance of $\hat \beta _o$
variance of $\hat \beta _1$
covariance between $\hat \beta _o$ and $\hat \beta _1$
linearity of estimators
unbiasedness of estimators
BLUE
consistency
efficiency
minimum variance unbiased estimators
total sum of squares (TSS)
error sum of squares (ESS)
regression sum of squares (RSS)
coefficient of determination (R$^2$)
variance of the prediction ($\sigma _p^2$) \newpage\
\section{Exercises and Problems}
\begin{enumerate}
\item In Figure~\ref{problem_graph}, determine whether there are any obvious violations of the assumptions of the classical regression model. Identify any problems that are apparent.
\FRAME{ftbpFU}{5.047in}{4.5005in}{0pt}{\Qcb{Graphs for exercise 1.}}{\Qlb{problem_graph}}{fig4-14.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 4.5005in;depth 0pt;original-width 6.896in;original-height 6.1462in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig4-14.gif’;file-properties “XNPEU”;}}
\item F. J. Anscombe suggests that an examination of scatterplots can often provide useful information about the nature of a relationship that may not be obvious if a researcher examines only regression output. This point is illustrated through the use of four regressions computed from data constructed by Anscombe. This data appears in the file “anscombe.dat.”
\begin{enumerate}
\item Use this data to estimate each of the following regression equations: \begin{equation*}
\text{Y1}_{i}=\beta _{o}+\beta _{1}\text{X1}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\text{Y2}_{i}=\beta _{o}+\beta _{1}\text{X1}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\text{Y3}_{i}=\beta _{o}+\beta _{1}\text{X1}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\text{Y4}_{i}=\beta _{o}+\beta _{1}\text{X4}_{i}+u_{i}
\end{equation*}
\item Use your regression software package to plot the scattergram corresponding to each of the above relationships.
\item Do any of the assumptions of the classical regression model appear to be violated in any of these regressions? Explain.
\item Do any of the data points appear to have a relatively large influence on the outcomes in any of these regressions? In which cases? If so, is it likely that such an effect would occur if a substantially larger sample were used?
\end{enumerate}
\item Show that the sample regression function will always pass through the point ($\overline{X},\overline{Y}$). (Hint: Use equation \ref{beta0.abz}.)
\item Use the properties of summations appearing in the mathematical appendix at the end of Chapter \ref{stat.chap} to show that equations \ref{weight.prop1.bc} and \ref{weight.prop2.bc} are correct.
\item Show that the OLS estimator $\hat{\beta}_{o}$ is linear. In other words, show that it can be written as: \begin{equation*}
\sum k_{i}Y_{i}
\end{equation*}
for some $k_{i}$. Determine the values of $k_{i}$ that satisfy this relationship.
\item Show that the OLS\ estimator $\hat{\beta}_{o}$ is unbiased.
\item The author collected data from his utility bills on the relationship between average daily temperature measured in degrees Fahrenheit (TEMP) and average daily natural gas consumption (THERM) at his house in Oswego, N.Y. The file “heat.dat” contains the data for this analysis.
\begin{enumerate}
\item Use this data to estimate the parameters of the relationship:
\begin{equation*}
\text{THERM}_{i}=\beta _{o}+\beta _{1}\text{TEMP}_{i}+u_{i} \end{equation*}
\item Provide an interpretation of the estimated intercept.
\item Provide an interpretation of the estimated slope coefficient.
\end{enumerate}
\item Suppose that an econometrician estimates a demand curve for an imperfectly competitive firm’s product as: \begin{equation*}
\hat{Q}_{di}=20{,}420-500\,\text{Price}_{i}
\end{equation*}
\begin{enumerate}
\item What is the economic significance of the estimated intercept parameter in this equation?
\item What is the economic significance of the estimated slope parameter?
\item According to this estimated demand curve, what quantity would be demanded if the price is \$10?
\item According to this estimated demand curve, what quantity would be demanded if the price is \$15?
\item What other variables might affect quantity demanded?
\end{enumerate}
\item Use a spreadsheet package and the data in the file “cons1.dat” to replicate the computation of the parameters of equation \ref{est.cons.func.bc}.
\item The economics department at SUNY-Oswego was interested in determining the relationship that exists between student evaluations of faculty performance and the course grades that students receive. To investigate this issue, data was collected on two variables during the fall semester of 1995: an index derived from student evaluations of the faculty (EVAL) and the average course grade (GPA) assigned to students. A subset of this data appears in the file “eval.dat.” The EVAL variable is measured on a 5-point scale (5 = highest, 1 = lowest) and the course GPA is measured on the traditional 4-point scale (F=0, D=1, C=2, B=3, A=4).
\begin{enumerate}
\item Examine the relationship between student evaluations of the faculty and student GPA by estimating the equation: \begin{equation*}
\text{EVAL}_{i}=\beta _{o}+\beta _{1}\text{GPA}_{i}+u_{i} \end{equation*}
using an OLS\ estimation procedure. (This may be done by calculator or by using a computer software package.) \item What does the estimated coefficient $\hat{\beta}_{1}$ measure? Is the estimated sign of this parameter consistent with your expectations?
\item If you wished to construct a more complete model of evaluations, what other factors might you include?
\end{enumerate}
\item An econometrician in the market research department of a widget manufacturing firm estimates the relationship: \begin{equation*}
\hat{Q}_{i}=50+8.25\text{ADS}_{i} \end{equation*}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \hat{Q}_{i}=\text{widget sales in week }i \\
& \text{ADS}_{i}=\text{number of newspaper ads placed by the firm in week }i
\end{array}%
\end{equation*}
\begin{enumerate}
\item What does this equation tell you about the additional sales resulting from an additional newspaper ad?
\item If the firm wishes to achieve a sales target of 83 units, how many ads should it place each week?
\end{enumerate}
\item An econometrician collects data from individuals on their annual income and total tax payments and uses this data to estimate the equation: \begin{equation*}
\text{Tax}_{i}\text{ = }\beta _{o}+\beta _{1}\text{Income}_{i}+u_{i} \end{equation*}
\begin{enumerate}
\item What is the economic significance of the intercept parameter ($\beta _{o}$)?
\item Suppose that $\beta _{o}$ is negative. What does this imply about the tax system?
\item What is the economic significance of the slope parameter ($\beta _{1}$)?
\end{enumerate}
\item Use equation \ref{SST1} to show that 0 $\leq $ R$^{2}$ $\leq $ 1.
\item Use equation \ref{Rsq1} and the definitions of $\hat{Y}_{i}$ and $\overline{Y}$ to show that R$^{2}$ equals $\hat{\beta}_{1}^{2}\sum x_{i}^{2}/\sum y_{i}^{2}$.
\item The economics club at Cliometrica State raises most of its funds through flower sales. To estimate the demand for flowers, club members have experimented with a number of different prices and have collected the data in table \ref{flower.1} below.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|cc|}
\hline
\textbf{Price} & \textbf{Quantity} \\ & \textbf{demanded} \\ \hline 0.20 & 200 \\
0.20 & 190 \\
0.20 & 160 \\
0.30 & 170 \\
0.30 & 180 \\
0.30 & 140 \\
0.40 & 150 \\
0.40 & 170 \\
0.40 & 135 \\
0.50 & 180 \\
0.50 & 130 \\
0.50 & 170 \\
0.60 & 190 \\
0.60 & 90 \\
0.60 & 110 \\
0.70 & 80 \\
0.70 & 105 \\
0.70 & 100 \\
0.80 & 80 \\
0.80 & 60 \\
0.80 & 55 \\
0.90 & 90 \\
0.90 & 70 \\
0.90 & 55 \\
1.00 & 50 \\
1.00 & 60 \\
1.00 & 55 \\
1.10 & 30 \\
1.10 & 25 \\
1.10 & 25 \\
1.20 & 15 \\
1.20 & 20 \\
1.20 & 20 \\ \hline
\end{tabular}%
\end{center}
\caption{Flower sale data } \label{flower.1}
\end{table}
(A copy of this data appears in the file “flower.dat” on the data disk accompanying this text.) \begin{enumerate}
\item Use regression analysis to estimate the parameters of a linear demand curve of the form:
\begin{equation*}
Q_{dt}=\beta _{o}+\beta _{1}\text{Price}_{t}+u_{t} \end{equation*}
\item Compute the variances for the estimators $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$.
\item Construct an estimate of $\sigma ^{2}$.
\item Determine the R$^{2}$ for this equation.
\item In a more complete specification, what other independent variables should be included?
\item What level of sales would be predicted if the price were set at \$0.55? What is the variance of the prediction at this point?
\end{enumerate}
\item Use the data from the file “lemonade.dat” to: \begin{enumerate}
\item verify the regression results reported in equation \ref{demand1} (\textit{i.e.,} determine the estimated values of $\beta _{o}$ and $\beta _{1}$).
\item compute the standard errors for the estimated intercept and slope coefficients.
\item Compute R$^{2}$.
\end{enumerate}
\item Consider the following regression model:
\begin{equation*}
\text{Edspend}_{i}=\beta _{o}+\beta _{1}\text{Income}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\begin{array}{clll}
\text{where:} & \text{Edspend}_{i}\text{ } & = & \text{per student expenditures in elementary and secondary } \\
& & & \text{schools in state }i \\
& \text{Income}_{i}\text{ } & = & \text{\textit{per capita} disposable personal income in state }i\text{ in 2000}
\end{array}%
\end{equation*}
\begin{enumerate}
\item What might be expected about the sign of $\beta _{1}$? Explain.
\item Estimate the parameters of this equation using the data contained in the file \textquotedblleft edspend.dat.\textquotedblright\ (A description of the data appears in Table \ref{edspend.dat} on p. \pageref{edspend.dat}.) \item Interpret the estimated R$^{2}$ for this equation. What does this statistic indicate about this regression relationship?
\end{enumerate}
\item Since 1997, New York State has published annual \textquotedblleft report cards\textquotedblright\ (available on the internet) that provide information on a variety of measures of the \textquotedblleft success\textquotedblright\ of elementary and secondary school systems. In examining the data for 1996, Riede (1997), a reporter for the Syracuse Newspapers, observed that a relationship appeared to exist between academic performance and the level of student poverty for the 124 schools in the central New York area. To evaluate this relationship, a regression model was formulated as:
\begin{equation}
\text{RD}_{i}=\beta _{o}+\beta _{1}\text{LUN}_{i}+u_{i} \label{free.lunch}
\end{equation}%
\begin{equation*}
\begin{array}{lll}
\text{where:} & \text{RD}_{i}= & \text{proportion of 3rd grade students in school }i\text{ achieving} \\
& & \text{a \textquotedblleft mastery\textquotedblright\ level} \\
& \text{LUN}_{i}= & \text{proportion of students in school }i\text{ eligible for free} \\
& & \text{or reduced-price lunches}
\end{array}%
\end{equation*}%
The estimated equation, however, is not presented in the article.
\begin{enumerate}
\item Why might such a relationship exist? What sign might you predict for $\beta _{1}$?
\item Use the data in the file \textquotedblleft schools.dat\textquotedblright\ to estimate the parameters of equation \ref{free.lunch}.
\item What is the R$^{2}$ for this model? What does this suggest about the \textquotedblleft fit\textquotedblright\ of this equation?
\end{enumerate}
\item The file \textquotedblleft fac-sal.dat\textquotedblright\ contains data on the salaries and years of work experience (measured as years since completion of the Ph.D. degree) for 32 economists employed by the University of Michigan for the 1983-4 academic year. (Source: Frank (1984), p. 560.) \begin{enumerate}
\item Use this data to estimate the parameters of the regression model given by:
\begin{equation*}
\text{Salary}_{i}=\beta _{o}+\beta _{1}\text{experience}_{i}+u_{i} \end{equation*}
\item What is the economic meaning of the intercept parameter?
\item What is the economic meaning of the slope parameter?
\item What is the value of R$^{2}$ for this regression? What does this value indicate?
\end{enumerate}
\item The file “hwi.dat” contains data on the help-wanted index and the unemployment rate for 601 monthly observations for the years 1951 through the beginning of 2001.
\begin{enumerate}
\item Consider the regression model given by:\footnote{This model was estimated using 24 quarterly observations by Gujarati (1968).}
\begin{equation}
\text{HWI}_{t}=\beta _{o}+\beta _{1}\text{UN}_{t}+u_{t}\text{ } \label{hwi.br}
\end{equation}
Does the intercept term $\beta _{o}$ have a meaningful economic interpretation? What is the interpretation of $\beta _{1}$?
\item Use a regression software package to estimate the parameters of equation \ref{hwi.br}. What do these results suggest?
\item What is the value of R$^{2}$ for this regression? What does this value indicate?
\end{enumerate}
\item One of the major controversies in the 2000 Presidential election was the effect of the \textquotedblleft butterfly ballot\textquotedblright\ used in Palm Beach County. It was claimed that this ballot resulted in a substantial number of votes being mistakenly cast for Pat Buchanan instead of Al Gore. Consider the equation given by: \begin{equation}
\text{Buchanan}_{i}=\beta _{o}+\beta _{1}\text{Total}_{i}+u_{i} \label{buchanan}
\end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{Buchanan}_{i}=\text{number of votes cast for Pat Buchanan in Florida county }i \\
& \text{Total}_{i}=\text{total number of votes cast for all candidates in Florida county }i \\
& u_{i}=\text{random error term in county }i
\end{array}%
\end{equation*}
\begin{enumerate}
\item If Pat Buchanan received the same proportion of the vote in each county, what would $\beta _{1}$ measure?
\item Use a spreadsheet software package and the data contained in the file “florida.dat” to plot the observed relationship between Buchanan and total votes. Does the point corresponding to Palm Beach county appear to stand out from the rest?
\item Use the spreadsheet package or a statistical software package to estimate the parameters of equation \ref{buchanan}.
\end{enumerate}
\item The file \textquotedblleft cars.dat\textquotedblright\ contains information on a variety of characteristics of automobiles sold in 2002.
Consider the equation given by:
\begin{equation*}
\text{MSRP}_{i}=\beta _{o}+\beta _{1}\text{Horse}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{MSRP}_{i}=\text{ manufacturer’s suggested retail price for car model }i\text{ in 2002} \\
& \text{Horse}_{i}=\text{ horsepower for car model }i \\
& u_{i}=\text{ random error term for observation }i
\end{array}%
\end{equation*}
\begin{enumerate}
\item What does $\beta _{1}$ measure in this equation? Do you expect the sign of this coefficient to be positive or negative? Explain.
\item Use the data in the file \textquotedblleft cars.dat\textquotedblright\ (this data is described in Table \ref{cars.dat} on p. \pageref{cars.dat}) to estimate the parameters of this equation. Are the results consistent with your expectations?
\end{enumerate}
\item Okun’s law suggests that a constant rate of economic growth is needed to maintain a constant unemployment rate. (For more details, see Okun (1962)). This law is based, in part, on the sample regression equation (Okun (1962), p. 99):
\begin{equation*}
\hat{Y}_{t}=0.30-0.30X_{t} \end{equation*}%
where $Y_{t}$ is defined as the change in the unemployment rate (measured in percentage points) and $X_{t}$ is the quarterly percentage change in real GNP.
\begin{enumerate}
\item What is the economic meaning of the estimated intercept term in this equation?
\item What is the economic meaning of the estimated slope?
\end{enumerate}
\item As noted in the previous question, Arthur Okun suggested that the rate of growth in real GNP affects the unemployment rate according to the relationship:
\begin{equation}
Y_{t}=\beta _{o}+\beta _{1}X_{t}+u_{t} \label{okuns.prob.bc} \end{equation}%
where $Y_{t}$ is defined as the change in the unemployment rate (measured in percentage points) and $X_{t}$ is the quarterly percentage change in real GNP.
\begin{enumerate}
\item The file \textquotedblleft okun.dat\textquotedblright\ contains 217 observations on quarterly unemployment rates and real GNP (1st quarter 1948 through 1st quarter 2002). Use this data and a computer spreadsheet program or an econometrics software package to create the variables $X_{t}$ and $Y_{t}$ appearing in equation \ref{okuns.prob.bc}.
(This file is described in Table \ref{okuns.law.dat} on p. \pageref{okuns.law.dat} in Appendix \ref{data.appendix}.) The appropriate transformations are:
\begin{equation*}
Y_{t}=\text{UN}_{t}-\text{UN}_{t-1} \end{equation*}%
\begin{equation*}
\text{where UN}_{t}\text{ = unemployment rate in period }t\text{, and } \end{equation*}%
\begin{equation*}
X_{t}=\frac{\text{GNP}_{t}-\text{GNP}_{t-1}}{\text{GNP}_{t-1}}\times 100 \end{equation*}%
\begin{equation*}
\text{where GNP}_{t}\text{ = real GNP in period }t \end{equation*}%
(Note that there are only 216 observations for $Y_{t}$ and $X_{t}$.) Use an OLS regression procedure to estimate the parameters of equation \ref{okuns.prob.bc}.
\item Does your estimated slope coefficient appear similar to that estimated by Okun?
\item What is the R$^{2}$ for this estimated equation?
\end{enumerate}
\item A forecast is said to be unbiased if the expected value of the forecast equals the expected value of the variable being forecast. Show that regression analysis results in unbiased forecasts. (Hint: Show that $E(\hat{Y}_{N+1})=E(Y|X_{N+1})$, where $\hat{Y}_{N+1}=\hat{\beta}_{o}+\hat{\beta}_{1}X_{N+1}$.)
\end{enumerate}
\newpage\
\section{Mathematical Appendix}
\subsection{Derivation of OLS estimators}
The OLS estimates of the parameters $\beta _{o}$ and $\beta _{1}$ satisfy the following condition:
\begin{equation*}
\underset{\hat{\beta}_{o},\hat{\beta}_{1}}{\text{minimize}}\text{: }\sum \hat{u}_{i}^{2}
\end{equation*}%
This is equivalent to finding the values of $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ that:
\begin{equation*}
\underset{\hat{\beta}_{o},\hat{\beta}_{1}}{\text{minimize}}:f(\hat{\beta}_{o},\hat{\beta}_{1})=\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})^{2}
\end{equation*}%
The first-order conditions for this minimization problem are:
\begin{equation}
\frac{\partial f}{\partial \hat{\beta}_{o}}=-2\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})=0 \label{partial0}
\end{equation}%
\begin{equation}
\frac{\partial f}{\partial \hat{\beta}_{1}}=-2\sum X_{i}(Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})=0 \label{partial1}
\end{equation}%
These equations provide two linear equations that may be solved for the values of $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$. Equations \ref{partial0} and \ref{partial1} may be rewritten as: \begin{equation}
\sum (Y_{i}-\hat{\beta}_{o}-\hat{\beta}_{1}X_{i})=0 \label{partial0′} \end{equation}
and
\begin{equation} \label{partial1′} \sum X_i(Y_i-\hat \beta _o-\hat \beta _1X_i)=0 \end{equation}
These equations are identical to the normal equations \ref{norm1′} and \ref{norm2′} discussed in the main body of this chapter. As shown on pp. \pageref{normal.eq.mark.1}-\pageref{normal.eq.mark.2}, these equations can be solved for $\hat{\beta}_o$ and $\hat{\beta}_1$ as:
\begin{equation}
\hat{\beta}_o=\overline{Y}-\hat{\beta}_1\overline{X} \label{b0}
\end{equation}
and
\begin{equation}
\hat{\beta}_1=\frac{\sum (X_i-\overline{X})(Y_i-\overline{Y})}{\sum (X_i-\overline{X})^2} \label{b1}
\end{equation}
\subsection{Derivation of the variances and covariance of estimators} \subsubsection{Variance of $\hat \protect\beta _1$} Since $\hat \beta _1$ is an unbiased estimator, the variance of $\hat \beta _1$ equals:
\begin{equation*}
var(\hat \beta _1)=E(\hat \beta _1-\beta _1)^2 \end{equation*}
To determine the variance of $\hat{\beta}_1$, it is convenient to use equation \ref{weight.b1.bc} to express $\hat{\beta}_1$ as: \begin{equation}
\hat{\beta}_1=\sum w_iY_i \label{another.w.b1.bc} \end{equation}
\begin{equation*}
\text{where: }w_i=\frac{X_i-\overline{X}}{\sum \left( X_i-\overline{X}\right) ^2}
\end{equation*}
or in deviation form:
\begin{equation*}
w_i=\frac{x_i}{\sum x_i^2} \end{equation*}
Since:
\begin{equation*}
Y_i=\beta _o+\beta _1X_i+u_i \end{equation*}
equation \ref{another.w.b1.bc} can be restated as: \begin{equation*}
\hat{\beta}_1=\sum w_i\left( \beta _o+\beta _1X_i+u_i\right) \end{equation*}
Simplifying,
\begin{equation}
\hat{\beta}_1=\beta _o\sum w_i+\beta _1\sum w_iX_i+\sum w_iu_i \label{another.w2.bc}
\end{equation}
Since $\sum w_i=0$ and $\sum w_iX_i=1$ (the proof is left to the reader), equation \ref{another.w2.bc} becomes: \begin{equation*}
\hat{\beta}_1-\beta _1=\sum w_iu_i \end{equation*}
Using this result, the variance of $\hat{\beta}_1$ may be expressed as: \begin{equation*}
E(\hat{\beta}_1-\beta _1)^2=E(\sum_{i=1}^Nw_iu_i)^2 \end{equation*}
\begin{equation*}
=\sum_{i=1}^N\sum_{j=1}^Nw_iw_jE(u_iu_j) \end{equation*}
By Assumption 3.4, $E(u_iu_j)=0$ when $i\neq j$. Thus, \begin{equation*}
E(\hat{\beta}_1-\beta _1)^2=\sum_{i=1}^Nw_i^2E(u_i^2) \end{equation*}
By Assumption 3.3,
\begin{equation*}
E(\hat{\beta}_1-\beta _1)^2=\sum_{i=1}^Nw_i^2\sigma ^2 \end{equation*}
\begin{equation*}
=\sigma ^2\sum_{i=1}^Nw_i^2 \end{equation*}
But,
\begin{equation*}
\sum_{i=1}^Nw_i^2=\sum \left( \frac{x_i}{\sum x_i^2}\right) ^2 \end{equation*}
\begin{equation*}
=\frac{\sum x_i^2}{\left( \sum x_i^2\right) ^2} \end{equation*}
\begin{equation*}
=\frac 1{\sum x_i^2}
\end{equation*}
Thus,
\begin{equation*}
var(\hat \beta _1)=\sigma ^2\sum w_i^2 \end{equation*}
\begin{equation*}
=\frac{\sigma ^2}{\sum x_i^2} \end{equation*}
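The derivation above relied on two properties of the weights, $\sum w_i=0$ and $\sum w_iX_i=1$. A brief sketch of why they hold is given here as a supplementary check; it uses only the definition $w_i=x_i/\sum x_i^2$, the identity $X_i=x_i+\overline{X}$, and the fact that deviations from the sample mean sum to zero:
\begin{equation*}
\sum w_i=\frac{\sum x_i}{\sum x_i^2}=\frac{\sum (X_i-\overline{X})}{\sum x_i^2}=\frac{N\overline{X}-N\overline{X}}{\sum x_i^2}=0
\end{equation*}
\begin{equation*}
\sum w_iX_i=\frac{\sum x_i(x_i+\overline{X})}{\sum x_i^2}=\frac{\sum x_i^2+\overline{X}\sum x_i}{\sum x_i^2}=\frac{\sum x_i^2}{\sum x_i^2}=1
\end{equation*}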
\subsubsection{Variance of $\hat \protect\beta _o$}
The variance of $\hat{\beta}_o$ can be determined in a similar manner:\footnote{This proof is based upon the method utilized by J. Johnston, \textit{Econometric Methods}, 3rd ed. (N.Y.: McGraw-Hill, 1984), pp. 29-30.}
\begin{equation}
var(\hat{\beta}_o)=E(\hat{\beta}_o-\beta _o)^2 \label{varb0} \end{equation}
From equation \ref{b0}, \begin{equation*}
\hat{\beta}_o=\overline{Y}-\hat{\beta}_1\overline{X} \end{equation*}
Since $Y_i=\beta _o+\beta _1X_i+u_i$, we can compute $\frac 1N\sum Y_i=\overline{Y}$ as:
\begin{equation*}
\overline{Y}=\beta _o+\beta _1\overline{X}+\overline{u}_i
\end{equation*}
where $\overline{u}_i$ is the sample mean for the unobserved true error term. Thus, we can write $\hat{\beta}_o$ as:
\begin{equation*}
\hat{\beta}_o=\beta _o+\beta _1\overline{X}+\overline{u}_i-\hat{\beta}_1\overline{X}
\end{equation*}
\begin{equation*}
=\beta _o-\overline{X}(\hat{\beta}_1-\beta _1)+\overline{u}_i \end{equation*}
Thus, we can express $\hat{\beta}_o-\beta _o$ as: \begin{equation*}
\hat{\beta}_o-\beta _o=-\overline{X}(\hat{\beta}_1-\beta _1)+\overline{u}_i \end{equation*}
Therefore Equation \ref{varb0} can be restated as: \begin{equation*}
var(\hat{\beta}_o)=E\left( \hat{\beta}_o-\beta _o\right) ^2 \end{equation*}
\begin{equation*}
=E\left[ -\overline{X}(\hat{\beta}_1-\beta _1)+\overline{u}_i\right] ^2 \end{equation*}
\begin{equation*}
=E\left[ \overline{X}^2(\hat{\beta}_1-\beta _1)^2+\overline{u}_i^2-2\overline{X}(\hat{\beta}_1-\beta _1)\overline{u}_i\right]
\end{equation*}
\begin{equation*}
=\overline{X}^2E(\hat{\beta}_1-\beta _1)^2+E(\overline{u}_i^2)-2\overline{X}E\left[ (\hat{\beta}_1-\beta _1)\overline{u}_i\right]
\end{equation*}
\begin{equation*}
=\overline{X}^2var(\hat{\beta}_1)+E(\overline{u}_i^2)-2\overline{X}E\left[ (\hat{\beta}_1-\beta _1)\overline{u}_i\right]
\end{equation*}
To simplify this expression, let’s expand the second and third terms: \begin{equation*}
E(\overline{u}_i^2)=E(\frac 1N\sum u_i)^2 \end{equation*}
\begin{equation*}
=\frac 1{N^2}\sum E(u_i^2) \end{equation*}
\begin{equation*}
=\frac 1{N^2}\sum \sigma ^2 \end{equation*}
\begin{equation*}
=\frac 1{N^2}(N\sigma ^2) \end{equation*}
\begin{equation*}
=\frac{\sigma ^2}N
\end{equation*}
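The second line of the expansion of $E(\overline{u}_i^2)$ drops the cross-products between different error terms. As a sketch of that omitted step (added here for clarity, using only $E(u_iu_j)=0$ for $i\neq j$), the square of the sum can be written as a double summation:
\begin{equation*}
E\left( \frac 1N\sum u_i\right) ^2=\frac 1{N^2}E\left( \sum_{i=1}^N\sum_{j=1}^Nu_iu_j\right) =\frac 1{N^2}\left[ \sum_{i=1}^NE(u_i^2)+\sum_{i\neq j}E(u_iu_j)\right]
\end{equation*}
\begin{equation*}
=\frac 1{N^2}\sum_{i=1}^NE(u_i^2)
\end{equation*}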
Since
\begin{equation*}
\hat{\beta}_1-\beta _1=\sum w_iu_i \end{equation*}
where $w_i$ is defined above,
\begin{equation*}
2\overline{X}E\left[ (\hat{\beta}_1-\beta _1)\overline{u}_i\right] =2\overline{X}E\left[ \left( \sum w_iu_i\right) \left( \frac 1N\sum u_i\right) \right]
\end{equation*}
\begin{equation*}
=\frac{2\overline{X}}NE\left[ \left( \sum w_iu_i\right) \left( \sum u_i\right) \right]
\end{equation*}
Since $E(u_iu_j)=0$ when $i\neq j$, this can be rewritten as:
\begin{equation*}
=\frac{2\overline{X}}N\left( \sum_{i=1}^Nw_iE(u_i^2)\right)
\end{equation*}
\begin{equation*}
=\frac{2\overline{X}\sigma ^2}N\sum_iw_i=0
\end{equation*}
where the final equality holds because $\sum w_i=0$.
Therefore,
\begin{equation*}
var(\hat{\beta}_{o})=\overline{X}^{2}var(\hat{\beta}_{1})+\frac{\sigma ^{2}}{% N}
\end{equation*}
\begin{equation*}
=\overline{X}^{2}\frac{\sigma ^{2}}{\sum x_{i}^{2}}+\frac{\sigma ^{2}}{N} \end{equation*}
\begin{equation*}
=\sigma ^{2}\left( \frac{\overline{X}^{2}}{\sum x_{i}^{2}}+\frac{1}{N}\right) \end{equation*}
\subsubsection{Covariance between $\hat \protect\beta _o$ and $\hat \protect\beta _1$}
The covariance between $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ is defined as:
\begin{equation*}
cov(\hat{\beta}_{o},\hat{\beta}_{1})=E\left[ (\hat{\beta}_{o}-\beta _{o})(\hat{\beta}_{1}-\beta _{1})\right]
\end{equation*}
As shown in the derivation of the variance of $\hat{\beta}_{o}$,
\begin{equation*}
\hat{\beta}_{o}-\beta _{o}=-\overline{X}(\hat{\beta}_{1}-\beta _{1})+\overline{u}_i
\end{equation*}
The covariance can then be stated as:
\begin{equation*}
cov(\hat{\beta}_{o},\hat{\beta}_{1})=E\left\{ \left[ -\overline{X}(\hat{\beta}_{1}-\beta _{1})+\overline{u}_i\right] (\hat{\beta}_{1}-\beta _{1})\right\}
\end{equation*}
\begin{equation*}
=\left( -\overline{X}\right) E(\hat{\beta}_{1}-\beta _{1})^{2}+E\left[ (\hat{\beta}_{1}-\beta _{1})\overline{u}_i\right]
\end{equation*}
The second term equals zero, as shown in the derivation of the variance of $\hat{\beta}_{o}$. Thus,
\begin{equation*}
cov(\hat{\beta}_{o},\hat{\beta}_{1})=\left( -\overline{X}\right) var(\hat{\beta}_{1})
\end{equation*}
\begin{equation*}
=\frac{-\overline{X}\sigma ^{2}}{\sum x_{i}^{2}}
\end{equation*}
\subsection{Proof of Gauss-Markov Theorem for $\hat \protect\beta _1$}
The Gauss-Markov theorem states that when the conditions of the classical regression model are satisfied, the OLS estimator, $\hat \beta _1$, attains the lowest variance possible for a linear unbiased estimator. To prove this theorem, we will specify an arbitrary alternative estimator $\tilde \beta _1$ that is also linear and unbiased. The goal is to show that the variance of $\tilde \beta _1$ is always greater than or equal to the variance of $\hat \beta _1$.
\textbf{Proof:}
Let $\tilde{\beta}_1$ be a linear unbiased estimator for the unknown parameter $\beta _1$. Since $\tilde{\beta}_1$ is linear, it can be written as a weighted summation of the $Y_i$. Without loss of generality, we can define $\tilde{\beta}_1$ as: \begin{equation*}
\tilde{\beta}_1=\sum k_iY_i \end{equation*}
Since $Y_i=\beta _o+\beta _1X_i+u_i$, this expression can be restated as: \begin{equation*}
\tilde{\beta}_1=\sum k_i\left( \beta _o+\beta _1X_i+u_i\right) \end{equation*}
This can also be expressed as: \begin{equation}
\tilde{\beta}_1=\beta _o\sum k_i+\beta _1\sum k_iX_i+\sum k_iu_i \label{gm.1.ic}
\end{equation}
Taking the expected value of both sides of this expression results in: \begin{equation}
E(\tilde{\beta}_1)=\beta _o\sum k_i+\beta _1\sum k_iX_i \label{gm.2.ic} \end{equation}
(since $E(u_i)=0$).
For $\tilde \beta _1$ to be unbiased, $E(\tilde \beta _1)=\beta _1$. An inspection of equation \ref{gm.2.ic} indicates that this will occur if: \begin{equation} \label{gm1} \sum k_i=0
\end{equation}
and
\begin{equation} \label{gm2} \sum k_iX_i=1
\end{equation}
Thus, for any linear, unbiased estimator $\tilde \beta _1$, equations \ref{gm1} and \ref{gm2} must be satisfied.
The variance of $\tilde{\beta}_1$ is given by: \begin{equation*}
var(\tilde{\beta}_1)=E[\tilde{\beta}_1-E(\tilde{\beta}_1)]^2 \end{equation*}
Using equations \ref{gm.1.ic} and \ref{gm.2.ic}, this becomes: \begin{equation*}
var(\tilde{\beta}_1)=E(\sum k_iu_i)^2 \end{equation*}
Since the $k_i$ are constant, $E(u_iu_j)=0$ when $i\neq j$, and $E(u_i^2)=\sigma ^2$, the variance of $\tilde{\beta}_1$ simplifies to:
\begin{equation*}
var(\tilde{\beta}_1)=\sigma ^2\sum_{i=1}^Nk_i^2
\end{equation*}
\begin{equation*}
=\sigma ^2\sum \left[ \left( k_i-w_i\right) +w_i\right] ^2 \end{equation*}
\begin{equation*}
\text{where: }w_i=\frac{x_i}{\sum x_i^2} \end{equation*}
Thus,
\begin{equation*}
var(\tilde{\beta}_1)=\sigma ^2\sum \left( k_i-w_i\right) ^2+\sigma ^2\sum w_i^2+2\sigma ^2\sum \left( k_i-w_i\right) w_i \end{equation*}
\begin{equation*}
=\sigma ^2\sum \left( k_i-w_i\right) ^2+var(\hat{\beta}_1)+2\sigma ^2\sum \left( k_i-w_i\right) w_i \end{equation*}
Let’s consider the last term in this summation. Substituting in the definition of $w_i$, this term becomes: \begin{equation*}
2\sigma ^2\sum \left( k_i-\frac{x_i}{\sum x_i^2}\right) \left( \frac{x_i}{\sum x_i^2}\right)
\end{equation*}
\begin{equation*}
=2\sigma ^2\left( \frac{\sum k_ix_i}{\sum x_i^2}-\frac{\sum x_i^2}{\left( \sum x_i^2\right) ^2}\right) \end{equation*}
\begin{equation*}
=\frac{2\sigma ^2}{\sum x_i^2}\left( \sum k_ix_i-1\right) \end{equation*}
\begin{equation*}
=\frac{2\sigma ^2}{\sum x_i^2}\left( \sum k_i(X_i-\overline{X})-1\right) \end{equation*}
\begin{equation*}
=\frac{2\sigma ^2}{\sum x_i^2}\left( \sum k_iX_i-\overline{X}\sum k_i-1\right)
\end{equation*}
For $\tilde{\beta}_1$ to be unbiased, however, $\sum k_iX_i=1$, and $\sum k_i=0$. Thus, this term becomes: \begin{equation*}
\frac{2\sigma ^2}{\sum x_i^2}(1-1)=0 \end{equation*}
So, the variance of $\tilde{\beta}_1$ is: \begin{equation*}
var(\tilde{\beta}_1)=\sigma ^2\sum \left( k_i-w_i\right) ^2+var(\hat{\beta}_1)
\end{equation*}
Since the first term in this expression is always greater than or equal to zero, the variance of $\tilde{\beta}_1$ will always be greater than or equal to the variance of the OLS estimator $\hat{\beta}_1$.
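As an illustration of this result (a hypothetical example that is not part of the proof), consider an alternative estimator that uses only the first and last observations, assuming $X_N\neq X_1$:
\begin{equation*}
\tilde{\beta}_1=\frac{Y_N-Y_1}{X_N-X_1}
\end{equation*}
This estimator is linear, with weights $k_1=-1/(X_N-X_1)$, $k_N=1/(X_N-X_1)$, and $k_i=0$ for all other observations. These weights satisfy both unbiasedness conditions:
\begin{equation*}
\sum k_i=\frac{-1+1}{X_N-X_1}=0\text{ \ and \ }\sum k_iX_i=\frac{X_N-X_1}{X_N-X_1}=1
\end{equation*}
Its variance is:
\begin{equation*}
var(\tilde{\beta}_1)=\sigma ^2\sum k_i^2=\frac{2\sigma ^2}{(X_N-X_1)^2}
\end{equation*}
Since $\sum x_i^2\geq x_1^2+x_N^2\geq \frac 12(x_N-x_1)^2=\frac 12(X_N-X_1)^2$, it follows that:
\begin{equation*}
var(\hat{\beta}_1)=\frac{\sigma ^2}{\sum x_i^2}\leq \frac{2\sigma ^2}{(X_N-X_1)^2}=var(\tilde{\beta}_1)
\end{equation*}
which is consistent with the theorem.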
\subsection{The relationship between R$^2$ and $\hat \protect\rho _{Y,\hat Y} $}
In the main body of this chapter, it was claimed that R$^2$ equals $\hat \rho _{Y,\hat Y}^2$, the square of the sample correlation coefficient between $Y$ and $\hat Y$. This section contains a proof of this proposition.
\textbf{Proof:}
Using the definition in equation \ref{scorcoef} in Chapter \ref{stat.chap}, the sample correlation coefficient between $Y$ and $\hat{Y}$ equals: \begin{equation*}
\hat{\rho}_{Y,\hat{Y}}=\frac{\hat{\sigma}_{Y,\hat{Y}}}{\hat{\sigma}_{Y}\hat{\sigma}_{\hat{Y}}}
\end{equation*}
If we substitute the formulas for the estimated covariance and standard deviations, this becomes: \begin{equation}
\hat{\rho}_{Y,\hat{Y}}=\frac{\sum (Y_{i}-\overline{Y})(\hat{Y}_{i}-\overline{Y})}{\sqrt{\sum (Y_{i}-\overline{Y})^{2}\sum (\hat{Y}_{i}-\overline{Y})^{2}}} \label{rhoyyhat}
\end{equation}
Using one of the properties of summations (Property 8 in the appendix to Chapter \ref{stat.chap}), the numerator of this expression can be rewritten as:
\begin{equation*}
\sum (Y_{i}-\overline{Y})(\hat{Y}_{i}-\overline{Y})=\sum (\hat{Y}_{i}-\overline{Y})Y_{i}
\end{equation*}
A bit of algebraic manipulation is required: \begin{equation*}
\sum (\hat{Y}_{i}-\overline{Y})Y_{i}=\sum (\hat{Y}_{i}-\overline{Y})(\hat{Y}_{i}+\hat{u}_{i})
\end{equation*}
\begin{equation*}
=\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i}+\sum (\hat{Y}_{i}-\overline{Y})\hat{u}_{i}
\end{equation*}
\begin{equation*}
=\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i}+\sum \hat{Y}_{i}\hat{u}_{i}-\overline{Y}\sum \hat{u}_{i}
\end{equation*}
\begin{equation*}
=\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i}+\sum (\hat{\beta}_{o}+\hat{\beta}_{1}X_{i})\hat{u}_{i}-\overline{Y}\sum \hat{u}_{i}
\end{equation*}
\begin{equation*}
=\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i}+\hat{\beta}_{o}\sum \hat{u}_{i}+\hat{\beta}_{1}\sum X_{i}\hat{u}_{i}-\overline{Y}\sum \hat{u}_{i}
\end{equation*}
As noted above, however, under the OLS estimation technique, $\sum \hat{u}_{i}=0$ and $\sum X_{i}\hat{u}_{i}=0$. Thus, the numerator of equation \ref{rhoyyhat} can be restated as:
\begin{equation*}
\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i} \end{equation*}
or
\begin{equation*}
\sum (\hat{Y}_{i}-\overline{Y})^{2} \end{equation*}
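The equivalence of these two expressions may not be immediately obvious. A short sketch of the omitted step, relying only on $\sum \hat{u}_{i}=0$, is as follows. Since $\hat{Y}_{i}=Y_{i}-\hat{u}_{i}$, the fitted values have the same sum as the observed values:
\begin{equation*}
\sum \hat{Y}_{i}=\sum Y_{i}-\sum \hat{u}_{i}=N\overline{Y}
\end{equation*}
so that:
\begin{equation*}
\sum (\hat{Y}_{i}-\overline{Y})\overline{Y}=\overline{Y}\left( \sum \hat{Y}_{i}-N\overline{Y}\right) =0
\end{equation*}
Subtracting this zero term leaves the value of the sum unchanged:
\begin{equation*}
\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i}=\sum (\hat{Y}_{i}-\overline{Y})\hat{Y}_{i}-\sum (\hat{Y}_{i}-\overline{Y})\overline{Y}=\sum (\hat{Y}_{i}-\overline{Y})^{2}
\end{equation*}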
Therefore, equation \ref{rhoyyhat} can be restated as: \begin{equation*}
\hat{\rho}_{Y,\hat{Y}}=\frac{\sum (\hat{Y}_{i}-\overline{Y})^{2}}{\sqrt{\sum (Y_{i}-\overline{Y})^{2}\sum (\hat{Y}_{i}-\overline{Y})^{2}}} \end{equation*}
Using the definitions for RSS and TSS, we can rewrite this as: \begin{equation*}
\hat{\rho}_{Y,\hat{Y}}=\frac{\text{RSS}}{\sqrt{\text{TSS}}\sqrt{\text{RSS}}} \end{equation*}
\begin{equation*}
=\sqrt{\frac{\text{RSS}}{\text{TSS}}} \end{equation*}
Thus,
\begin{equation*}
\hat{\rho}_{Y,\hat{Y}}^{2}=\frac{\text{RSS}}{\text{TSS}}=\text{R}^{2} \end{equation*}
\subsection{Derivation of variance of the prediction ($\protect\sigma _p^2$)}
The variance of the prediction is defined as:\footnote{This proof is based upon the method used in Pindyck and Rubinfeld, \textit{Econometric Models and Economic Forecasts}, 4th ed. (N.Y.: McGraw-Hill, 1998), pp. 207-209.}
\begin{equation}
\sigma _{p}^{2}=E(\hat{Y}_{N+1}-Y_{N+1})^{2} \label{varpred} \end{equation}
Since:
\begin{equation*}
\hat{Y}_{N+1}=\hat{\beta}_{o}+\hat{\beta}_{1}X_{N+1} \end{equation*}
and
\begin{equation*}
Y_{N+1}=\beta _{o}+\beta _{1}X_{N+1}+u_{N+1} \end{equation*}
equation \ref{varpred} may be restated as: \begin{equation*}
\sigma _{p}^{2}=E\left[ (\hat{\beta}_{o}-\beta _{o})+(\hat{\beta}_{1}-\beta _{1})X_{N+1}-u_{N+1}\right] ^{2} \end{equation*}
\begin{equation}
=E(\hat{\beta}_{o}-\beta _{o})^{2}+X_{N+1}^{2}E(\hat{\beta}_{1}-\beta _{1})^{2}+E(u_{N+1}^{2}) \label{varpred1} \end{equation}
\begin{equation*}
+2X_{N+1}E\left[ (\hat{\beta}_{o}-\beta _{o})(\hat{\beta}_{1}-\beta _{1})\right]
\end{equation*}
\begin{equation*}
-2E\left[ (\hat{\beta}_{o}-\beta _{o})u_{N+1}\right]
\end{equation*}
\begin{equation*}
-2X_{N+1}E\left[ (\hat{\beta}_{1}-\beta _{1})u_{N+1}\right]
\end{equation*}
Since $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are constructed using a linear combination of the first $N$ error terms, they are uncorrelated with the error term $u_{N+1}$. ($\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are random variables since they are equal to a linear combination of the $u_{i}$, but $E(u_{i}u_{j})=0$ when $i\neq j$.) Thus, the last two terms in the expansion of equation \ref{varpred1} equal zero.
Thus equation \ref{varpred1} can be restated as:
\begin{equation*}
\sigma _{p}^{2}=var(\hat{\beta}_{o})+X_{N+1}^{2}var(\hat{\beta}_{1})+2X_{N+1}cov(\hat{\beta}_{o},\hat{\beta}_{1})+var(u_{N+1})
\end{equation*}%
\begin{equation*}
=\sigma ^{2}\left( \frac{\overline{X}^{2}}{\sum x_{i}^{2}}+\frac{1}{N}\right) +X_{N+1}^{2}\frac{\sigma ^{2}}{\sum x_{i}^{2}}+2X_{N+1}\left( \frac{-\overline{X}\sigma ^{2}}{\sum x_{i}^{2}}\right) +\sigma ^{2}
\end{equation*}%
\begin{equation*}
=\sigma ^{2}\left( \frac{\overline{X}^{2}}{\sum x_{i}^{2}}+\frac{1}{N}+\frac{X_{N+1}^{2}}{\sum x_{i}^{2}}+\frac{-2X_{N+1}\overline{X}}{\sum x_{i}^{2}}+1\right)
\end{equation*}%
Combining all of the terms with the same denominator, this becomes:
\begin{equation*}
\sigma _{p}^{2}=\sigma ^{2}\left( 1+\frac{1}{N}+\frac{X_{N+1}^{2}-2X_{N+1}\overline{X}+\overline{X}^{2}}{\sum x_{i}^{2}}\right)
\end{equation*}%
\begin{equation*}
=\sigma ^{2}\left[ 1+\frac{1}{N}+\frac{\left( X_{N+1}-\overline{X}\right) ^{2}}{\sum (X_{i}-\overline{X})^{2}}\right] \end{equation*}
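To see how this formula behaves, consider a purely hypothetical sample (the numbers below are chosen for illustration and do not come from any data set in this text) with $N=25$, $\overline{X}=10$, and $\sum (X_{i}-\overline{X})^{2}=200$. For a prediction at the sample mean, $X_{N+1}=10$:
\begin{equation*}
\sigma _{p}^{2}=\sigma ^{2}\left[ 1+\frac{1}{25}+\frac{(10-10)^{2}}{200}\right] =1.04\sigma ^{2}
\end{equation*}
For a prediction further from the mean, at $X_{N+1}=20$:
\begin{equation*}
\sigma _{p}^{2}=\sigma ^{2}\left[ 1+\frac{1}{25}+\frac{(20-10)^{2}}{200}\right] =1.54\sigma ^{2}
\end{equation*}
In this example, the variance of the prediction is smallest when $X_{N+1}=\overline{X}$ and rises as $X_{N+1}$ moves away from the sample mean, as the final term of the formula suggests.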