\chapter{Estimators and Distributions\label{estimators.chap}} When econometricians attempt to quantify the relationships among sets of economic variables, they are not generally able to provide the exact values of the unknown parameters. Since there is always an element of randomness in the behavior of these variables, the true value and the estimated value of these parameters will, in general, differ. For example, estimates of the marginal propensity to consume will be different when alternative time periods are used for estimation purposes. In this chapter, you will examine how estimates of population parameters can be derived using sample data.
Since the estimated values, however, are not generally equal to the population parameters, it is necessary to examine the reliability of these estimated values. This requires a knowledge of the distribution of the estimators. Thus, this chapter also contains a brief examination of some of the probability density functions that are commonly used by econometricians.
\section{Estimators\label{estimators.sec}}
The mean and variance are parameters that are useful in characterizing the distribution of a random variable. The simple correlation coefficient is a parameter that provides us with a measure of the degree of linear association between two random variables. When we analyze the relationships that exist among economic variables, population parameters such as these can provide us with useful information about the relationships existing among a set of random variables. Unfortunately, these parameters are not known \textit{a priori}. Instead, they must be estimated from observable data.
Estimates of population parameters are constructed through the use of an \textbf{estimator}, a rule (or procedure) for constructing an estimate of a parameter using sample information. In the case of a random variable ($X$), the sample mean ($\overline{X}$) serves as an estimator for the population mean ($\mu _X$). The sample mean is defined as:
\begin{equation*}
\overline{X}=\frac 1N\sum_{i=1}^NX_i
\end{equation*}
where $N$ is the size of the sample used to construct the estimate.
Suppose that you are trying to estimate the mean income in the U.S. population. If you were to select 500 people randomly and ask them their incomes, you could construct an estimate of the population mean by computing the sample mean (using the formula above). If you were to repeat this study with a different sample of 500 individuals, you would probably find a different sample mean. This is because the sample mean is itself a random variable. Since estimators are functions of random variables, the estimators themselves are also random variables. A probability density function characterizes the distribution of each estimator. In Chapters \ref{biv.hyp.chap} and \ref{hyp.mult.chap}, you will examine the probability density functions for the estimated intercept and slope parameters in a regression model.
There are a number of desirable properties that an estimator may possess.
These properties include:
\begin{itemize}
\item linearity,
\item unbiasedness,
\item consistency, and
\item efficiency.
\end{itemize}
Let’s discuss each of these properties.
\subsection{Properties of Estimators}
\subsubsection{Linearity}
An estimator is said to be \textbf{linear} if it can be expressed as a linear function of observable random variables. Linear estimators can generally be computed with lower computational costs than nonlinear estimators. When regression analysis was performed using simple mechanical or electrical adding machines, this property was particularly important. The development of high speed and low cost computing technology has reduced the importance of the linearity property somewhat. It is still desirable, however, to rely on simpler linear estimation procedures when they perform as well as more complex nonlinear estimators.
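To make the definition concrete, consider the sample mean introduced above. As a simple sketch, it can be written as a weighted sum of the observations in which every weight $w_{i}$ (notation introduced here only for illustration) equals $1/N$:
\begin{equation*}
\overline{X}=\sum_{i=1}^{N}w_{i}X_{i}=\frac{1}{N}X_{1}+\frac{1}{N}X_{2}+\cdots +\frac{1}{N}X_{N}\text{, where }w_{i}=\frac{1}{N}.
\end{equation*}
Since the weights do not depend on the observed values, the sample mean is a linear estimator. By contrast, an estimator that involves squares or square roots of the observations is not linear.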
\subsubsection{Unbiasedness}
An estimator is \textbf{unbiased} if the expected value of the estimator equals the population parameter. Suppose that you have constructed an estimator, $\widehat{\theta }$, to provide an estimate of a population parameter $\theta .$ In mathematical terms, this estimator is unbiased if: \begin{equation*}
E(\widehat{\theta })=\theta .
\end{equation*}
If you were to use this estimator an infinite number of times, the average value of the parameter estimates would equal the true parameter value. Of course, any given estimate could be above or below the actual parameter value. In the case of an unbiased estimator, however, there is no tendency to either overestimate or underestimate the population parameter.
If an estimator is \textbf{biased}, the average value of the estimator will be either greater than or less than the true population parameter. Thus, an estimator is biased if:
\begin{equation*}
E(\widehat{\theta })=\theta +k,\text{ where }k\neq 0.
\end{equation*}%
The diagram on the left-hand side of Figure~\ref{bias_graph} illustrates the PDF for an unbiased estimator. A biased estimator is illustrated in the diagram on the right-hand side of this figure. In this case, the expected value of the estimator ($E(\hat{\theta})$) exceeds the value of the population parameter ($\theta $).
\begin{center}
\FRAME{ftbpFU}{5.0462in}{1.9969in}{0pt}{\Qcb{Biased and unbiased estimators}}{\Qlb{bias_graph}}{fig3-1.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.0462in;height 1.9969in;depth 0pt;original-width 5.3125in;original-height 2.0833in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-1.gif’;file-properties “XNPEU”;}}
\end{center}
Let’s consider an analogy. Suppose that you visit a local rifle range and shoot at a traditional target formed of concentric circles. If your aim is unbiased, your shots will be randomly distributed about the center. If, however, your shots tend to cluster on one side of the center, your aim is biased. If an estimator is biased, the estimates will tend to be concentrated on one side of the true value.
\subsubsection{Consistency}
As noted above, an estimator is constructed using a sample of a given size.
An estimator is said to be \textbf{consistent} if the estimator converges to the value of the population parameter as the size of the sample rises. This concept is illustrated in Figure~\ref{cons_graph}. Suppose that $\widehat{\theta }_{N}$ is a consistent estimator for the population parameter $\theta $ when a sample of size $N$ is utilized. As the size of the sample approaches infinity, the distribution of this estimator converges to the single point $\theta $.\footnote{%
A more formal definition of consistency states that an estimator is consistent if:
\begin{equation*}
\underset{N\rightarrow \infty }{\text{lim}}\text{Prob}(\left| \hat{\theta}_{N}-\theta \right| >\epsilon )=0\text{, for any }\epsilon >0.
\end{equation*}%
This condition states that, as the sample size increases, the probability of an estimate that differs from the true value by more than $\epsilon $ approaches zero. Since this holds for all positive values of $\epsilon $, this condition essentially requires that the distribution of the estimator converges to a single point (equal to the value of the population parameter) as the size of the sample approaches infinity. In other words, for any positive value of $\epsilon $ (no matter how small), the probability that the estimated value ($\widehat{\theta }_{N}$) falls in the interval between $\theta -\epsilon $ and $\theta +\epsilon $ converges to 100\% as $N$ tends toward infinity.}
\begin{center}
\FRAME{ftbpFU}{3.6625in}{3.1168in}{0pt}{\Qcb{Consistent estimator}}{\Qlb{cons_graph}}{fig3-2.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 3.6625in;height 3.1168in;depth 0pt;original-width 3.6149in;original-height 3.0727in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-2.gif’;file-properties “XNPEU”;}}
\end{center}
Be sure to keep in mind the distinction between the concepts of unbiasedness and consistency. An estimator is unbiased if the average value in an infinite \textit{number of samples} equals the population parameter. An estimator is consistent if it converges to the true value as the \textit{size of the sample} approaches infinity. Unbiased estimators may be inconsistent. Consistent estimators may be biased.
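Two simple estimators of the population mean $\mu _{X}$ may help to fix these ideas. They are offered only as an illustrative sketch; the symbols $\tilde{\mu}_{A}$ and $\tilde{\mu}_{B}$ are introduced here solely for this purpose, and the $X_{i}$ are assumed to be independent draws from a population with mean $\mu _{X}$ and finite variance $\sigma _{X}^{2}>0$. First, suppose that only the first observation is used, no matter how large the sample is:
\begin{equation*}
\tilde{\mu}_{A}=X_{1}\text{, so that }E(\tilde{\mu}_{A})=\mu _{X}\text{ and }var(\tilde{\mu}_{A})=\sigma _{X}^{2}\text{ for every }N.
\end{equation*}
This estimator is unbiased, but its distribution does not tighten around $\mu _{X}$ as $N$ grows, so it is inconsistent. Second, consider:
\begin{equation*}
\tilde{\mu}_{B}=\overline{X}+\frac{1}{N}\text{, so that }E(\tilde{\mu}_{B})=\mu _{X}+\frac{1}{N}.
\end{equation*}
This estimator is biased in every finite sample (with a bias of $1/N$), but the bias vanishes as $N$ approaches infinity. Whenever $\overline{X}$ is a consistent estimator of $\mu _{X}$, $\tilde{\mu}_{B}$ is consistent as well.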
While econometricians would prefer estimators that are both unbiased and consistent, it is not always possible to achieve both of these goals. As Figure~\ref{con_bias} illustrates, a consistent estimator that is biased may perform better than an unbiased estimator with a larger variance. In this example, a larger proportion of the estimates is expected to fall within a narrow interval around the population parameter under the consistent estimator. When dealing with large samples, the property of consistency is generally viewed as being more important than that of unbiasedness. Since the consistency property only tells us what happens as the size of the sample becomes very large, it is often called a “large-sample property.”
\begin{center}
\FRAME{ftbpFU}{3.9868in}{1.7979in}{0pt}{\Qcb{Consistency vs. bias}}{\Qlb{con_bias}}{fig3-3.gif}{\special{language “Scientific Word”;type “GRAPHIC”;display “USEDEF”;valid_file “F”;width 3.9868in;height 1.7979in;depth 0pt;original-width 3.9271in;original-height 1.7608in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-3.gif’;file-properties “XNPEU”;}}
\end{center}
\subsubsection{Efficiency}
An unbiased estimator is said to be \textbf{efficient} if the variance of the estimator is less than or equal to the variance of any other unbiased estimator that can be formed using a sample of a given size. Suppose $\hat{\theta}_{N}$ is an unbiased estimator of the parameter $\theta $ that is formed using a sample of $N$ observations. $\hat{\theta}_{N}$ is efficient if:
\begin{equation*}
var(\hat{\theta}_{N})\leq var(\tilde{\theta}_{N})\text{, where }\tilde{\theta}_{N}\text{ is any other unbiased estimator,}
\end{equation*}%
where the variance of the estimator is defined as:
\begin{equation*}
var(\hat{\theta}_{N})=E\left( \hat{\theta}_{N}-E(\hat{\theta}_{N})\right) ^{2}
\end{equation*}%
This concept is illustrated in Figure~\ref{eff_graph}. For obvious reasons, an efficient estimator is also sometimes called a\textbf{\ minimum variance unbiased estimator}.
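A small example may help to show how variances of unbiased estimators are compared. It is only a sketch: it assumes two independent observations, $X_{1}$ and $X_{2}$, drawn from a population with mean $\mu _{X}$ and variance $\sigma _{X}^{2}$, and the weight $w$ and the symbol $\tilde{\mu}_{w}$ are introduced here purely for illustration. Consider the family of weighted averages:
\begin{equation*}
\tilde{\mu}_{w}=wX_{1}+(1-w)X_{2}
\end{equation*}
Each member of this family is unbiased, since:
\begin{equation*}
E(\tilde{\mu}_{w})=w\mu _{X}+(1-w)\mu _{X}=\mu _{X}
\end{equation*}
The variance, however, depends on the weight:
\begin{equation*}
var(\tilde{\mu}_{w})=\left[ w^{2}+(1-w)^{2}\right] \sigma _{X}^{2}
\end{equation*}
With $w=\frac{3}{4}$ the variance equals $\frac{5}{8}\sigma _{X}^{2}$, while with $w=\frac{1}{2}$ (the sample mean) it equals $\frac{1}{2}\sigma _{X}^{2}$. The expression $w^{2}+(1-w)^{2}$ is minimized at $w=\frac{1}{2}$, so the sample mean has the smallest variance within this family. Note that this comparison covers only weighted averages of the two observations; it does not by itself establish efficiency relative to every unbiased estimator.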
\FRAME{ftbpFU}{5.047in}{2.0237in}{0pt}{\Qcb{Efficient and inefficient estimators}}{\Qlb{eff_graph}}{fig3-4.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 2.0237in;depth 0pt;original-width 5.2399in;original-height 2.0833in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-4.gif’;file-properties “XNPEU”;}}
Some texts define the concept of efficiency in terms of the \textbf{mean square error} of an estimator. The mean square error (MSE) of an estimator ($\hat{\theta}$) is defined as:
\begin{equation*}
\text{MSE}(\hat{\theta})=E(\hat{\theta}-\theta )^{2} \end{equation*}
Under this alternative definition, an estimator is mean square error efficient if its MSE is less than or equal to that of any other estimator. This differs from the definition of efficiency appearing above in that the MSE equals the sum of the variance $\left( E\left( \hat{\theta}-E(\hat{\theta})\right) ^{2}\right) $ and the square of the bias $\left( \left( E(\hat{\theta})-\theta \right) ^{2}\right) $ of the estimator.\footnote{%
A proof may be found in a more advanced text. See, for example, Pindyck and Rubinfeld (1998), p. 30.} The concept of mean square error efficiency allows for a tradeoff between bias and variance. A biased estimator may have a smaller variance than any unbiased estimator, so a small bias may be worth accepting in exchange for a sufficiently large reduction in variance.
The mean squared error of an unbiased estimator is simply the variance of the estimator (since the bias equals zero in this case).
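The decomposition of the MSE into variance and squared bias can be sketched in a few lines. This outline is not a substitute for a formal proof; it only indicates where the two terms come from. Start by adding and subtracting $E(\hat{\theta})$:
\begin{equation*}
\hat{\theta}-\theta =\left( \hat{\theta}-E(\hat{\theta})\right) +\left( E(\hat{\theta})-\theta \right)
\end{equation*}
Squaring both sides and taking expected values (noting that $E(\hat{\theta})-\theta $ is a constant):
\begin{equation*}
\text{MSE}(\hat{\theta})=E\left( \hat{\theta}-E(\hat{\theta})\right) ^{2}+2\left( E(\hat{\theta})-\theta \right) E\left( \hat{\theta}-E(\hat{\theta})\right) +\left( E(\hat{\theta})-\theta \right) ^{2}
\end{equation*}
The middle term equals zero because $E\left( \hat{\theta}-E(\hat{\theta})\right) =E(\hat{\theta})-E(\hat{\theta})=0$. Thus:
\begin{equation*}
\text{MSE}(\hat{\theta})=var(\hat{\theta})+\left( E(\hat{\theta})-\theta \right) ^{2}
\end{equation*}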
\subsection{Sample mean}
As noted above, the \textbf{sample mean}, $\overline{X}$, is defined as: \begin{equation*}
\overline{X}=\frac{1}{N}\sum_{i=1}^{N}X_{i}
\end{equation*}
The sample mean is an unbiased, consistent, and efficient estimator of the population mean.\footnote{%
Proofs of the unbiasedness and consistency properties appear in the mathematical appendix at the end of this chapter. The proof of the efficiency property may be found in more advanced texts.} Since the estimator is unbiased, the average value of $\overline{X}$ across repeated samples tends toward $\mu _{X}$ as the number of samples increases. Since the estimator is consistent, the probability that $\overline{X}$ lies beyond a given distance from $\mu _{X}$ declines toward zero as the size of the sample increases. The efficiency property states that the variance of the sample mean is less than or equal to the variance of any other unbiased estimator of the population mean.
The variance of the sample mean equals:\footnote{%
To see this, note that:
\begin{equation*}
var(\bar{X})=var(\frac{1}{N}\sum_{i=1}^{N}X_{i})
\end{equation*}
Using property 2 of the variance of random variables (discussed in Chapter \ref{stat.chap}), this can be rewritten as:
\begin{equation*}
var(\bar{X})=\left( \frac{1}{N}\right) ^{2}var(\sum_{i=1}^{N}X_{i}) \end{equation*}
Since the $X_{i}$’s are assumed to be independent, this can be expressed as: \begin{equation*}
var(\bar{X})=\left( \frac{1}{N}\right) ^{2}\sum_{i=1}^{N}var(X_{i}) \end{equation*}
\begin{equation*}
=\left( \frac{1}{N}\right) ^{2}\sum_{i=1}^{N}\sigma _{X}^{2} \end{equation*}
\begin{equation*}
=\left( \frac{1}{N}\right) ^{2}\left( N\sigma _{X}^{2}\right) \end{equation*}
\begin{equation*}
=\frac{\sigma _{X}^{2}}{N}
\end{equation*}%
}
\begin{equation*}
var(\overline{X})=\frac{\sigma _{X}^{2}}{N}
\end{equation*}
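The formula above implies that the sample mean becomes more precise as the sample grows. As a rough numerical sketch (the value $\sigma _{X}^{2}=100$ is hypothetical and chosen only for convenience):
\begin{equation*}
N=25:\ var(\overline{X})=\frac{100}{25}=4,\qquad N=100:\ var(\overline{X})=\frac{100}{100}=1
\end{equation*}
Quadrupling the sample size therefore cuts the variance of the sample mean to one fourth of its former value (and halves its standard deviation, from 2 to 1). The same formula also suggests why the sample mean is consistent. Chebyshev's inequality, a standard result that is not derived in this chapter, states that for any $\epsilon >0$:
\begin{equation*}
\text{Prob}(\left| \overline{X}-\mu _{X}\right| >\epsilon )\leq \frac{var(\overline{X})}{\epsilon ^{2}}=\frac{\sigma _{X}^{2}}{N\epsilon ^{2}}
\end{equation*}
For a fixed $\epsilon $ and a finite variance, the right-hand side tends to zero as $N$ approaches infinity.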
\subsection{Sample variance}
The \textbf{sample variance} for a random variable $X$ is defined as: \begin{equation} \label{sam.var.stc}
\hat \sigma _X^2=\frac 1{N-1}\sum_{i=1}^N(X_i-\overline{X})^2 \end{equation}
This can also be computed in an alternative form as: \begin{equation} \label{sam.var1.stc}
\hat \sigma _X^2=\frac 1{N-1}\left[ \left( \sum_{i=1}^NX_i^2\right) -N\overline{X}^2\right]
\end{equation}
Note that the divisor is $N-1$ instead of $N$. As shown in the mathematical appendix, this estimator is unbiased. In this case, $N-1$ is the \textbf{degrees of freedom} for the estimator of the variance. The degrees of freedom for any estimator is, in general, equal to the number of observations minus the number of unknown parameters that must be estimated before the estimator can be computed. In this example, it is necessary to compute the sample mean (an estimate of the population mean) before the sample variance may be computed (using either equation \ref{sam.var.stc} or equation \ref{sam.var1.stc}). Since the estimation of the variance requires the prior estimation of one parameter (the population mean, estimated by $\overline{X}$), the degrees of freedom for this estimator equals $N-1$.
Roughly speaking, the degrees of freedom for any estimator is a measure of the number of independent pieces of information that are used in constructing the estimator. Suppose that you wished to construct the sample variance of some variable using equation \ref{sam.var.stc}. If there are $N$ observations and you know the sample mean and any $N-1$ observations, the remaining observation can be determined from this information. Consider a simple example in which there are three observations and you know that $X_1=2$, $X_2=4$, and $\overline{X}=4$. To see how this information can be used to determine the value of the third observation, note that, by the definition of the sample mean:
\begin{equation*}
\overline{X}=\frac{\sum X_i}N
\end{equation*}
Substituting in the known values of $X_1$, $X_2$ and $\overline{X}$: \begin{equation*}
4=\frac{2+4+X_3}3
\end{equation*}
Simplifying,
\begin{equation*}
3(4)=6+X_3
\end{equation*}
Thus,
\begin{equation*}
X_3=6
\end{equation*}
The sum of squared deviations from the sample mean in equation \ref{sam.var.stc} is divided by $N-1$ instead of $N$ because there are only $N-1$ independent deviations from the sample mean. Thus, there are only $N-1$ degrees of freedom for the sample variance estimator.
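The restriction behind this count can be seen directly: the deviations from the sample mean always sum to zero. As a brief sketch:
\begin{equation*}
\sum_{i=1}^{N}(X_{i}-\overline{X})=\sum_{i=1}^{N}X_{i}-N\overline{X}=N\overline{X}-N\overline{X}=0
\end{equation*}
In the three-observation example above ($X_{1}=2$, $X_{2}=4$, $X_{3}=6$, and $\overline{X}=4$), the deviations are $-2$, $0$, and $2$. Once the first two deviations are known, the third is forced to equal $2$ so that the sum is zero.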
As in the case of the sample mean, the sample variance is a consistent and efficient estimator of $\sigma _X^2$.
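The equivalence of the two forms of the sample variance in equations \ref{sam.var.stc} and \ref{sam.var1.stc} can be checked by expanding the squared deviations. The following steps are a sketch of that algebra:
\begin{equation*}
\sum_{i=1}^{N}(X_{i}-\overline{X})^{2}=\sum_{i=1}^{N}X_{i}^{2}-2\overline{X}\sum_{i=1}^{N}X_{i}+N\overline{X}^{2}
\end{equation*}
\begin{equation*}
=\sum_{i=1}^{N}X_{i}^{2}-2N\overline{X}^{2}+N\overline{X}^{2}
\end{equation*}
\begin{equation*}
=\sum_{i=1}^{N}X_{i}^{2}-N\overline{X}^{2}
\end{equation*}
Using the three observations from the degrees-of-freedom example (2, 4, and 6, with $\overline{X}=4$), the first form gives $(-2)^{2}+0^{2}+2^{2}=8$ and the second gives $(4+16+36)-3(16)=8$. Either way, the sample variance is $8/2=4$.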
\subsection{Sample standard deviation}
Since the standard deviation is defined as the square root of the variance, it seems reasonable to define the \textbf{sample standard deviation} ($\hat{\sigma}_{X}$) as the square root of the sample variance. Thus:
\begin{equation*}
\hat{\sigma}_{X}=\sqrt{\hat{\sigma}_{X}^{2}}
\end{equation*}
\begin{equation*}
=\sqrt{\left( \frac{1}{N-1}\right) \sum_{i=1}^{N}\left( X_{i}-\overline{X}\right) ^{2}}
\end{equation*}
\subsection{Sample covariance and correlation}
The \textbf{sample covariance} is defined as:
\begin{equation*}
\widehat{cov}(X,Y)=\frac{1}{N-1}\sum_{i=1}^{N}(X_{i}-\overline{X})(Y_{i}-\overline{Y})
\end{equation*}
An equivalent form of the covariance is:
\begin{equation*}
\widehat{cov}(X,Y)=\frac 1{N-1}\left[ \left( \sum_{i=1}^NX_iY_i\right) -N(\overline{X}\overline{Y})\right]
\end{equation*}
This estimator is unbiased, consistent, and efficient.\footnote{%
Proofs of these properties may be found in more advanced texts.} The degrees of freedom for the covariance estimator is, once again, equal to $N-1$.
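The equivalence of the two covariance formulas above follows from expanding the product of deviations. A sketch of the algebra, using $\sum X_{i}=N\overline{X}$ and $\sum Y_{i}=N\overline{Y}$:
\begin{equation*}
\sum_{i=1}^{N}(X_{i}-\overline{X})(Y_{i}-\overline{Y})=\sum_{i=1}^{N}X_{i}Y_{i}-\overline{Y}\sum_{i=1}^{N}X_{i}-\overline{X}\sum_{i=1}^{N}Y_{i}+N\overline{X}\,\overline{Y}
\end{equation*}
\begin{equation*}
=\sum_{i=1}^{N}X_{i}Y_{i}-N\overline{X}\,\overline{Y}
\end{equation*}
Setting $Y_{i}=X_{i}$ in these expressions yields the sample variance of $X$, so the sample variance can be viewed as the sample covariance of a variable with itself.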
The \textbf{sample correlation coefficient} for two random variables $X$ and $Y$ can be computed by dividing the sample covariance by the sample standard deviations of $X$ and $Y$. In mathematical terms:
\begin{equation} \label{scorcoef}
\hat \rho _{XY}=\frac{\widehat{cov}(X,Y)}{\hat \sigma _X\hat \sigma _Y}
\end{equation}
\subsection{Example: Computation of descriptive statistics}
At the start of an econometric study, econometricians often examine estimates of the means, variances, standard deviations, covariances, and correlations of the variables used in the study. The estimated values of these parameters are referred to as \textbf{descriptive statistics} since they help to describe the marginal distributions of the variables that are being studied and the relationships that may exist among these variables. Let’s examine how these statistics are calculated.
\begin{table}[p]
\begin{center}
\begin{minipage}{4.5in}
\renewcommand{\footnoterule}{}
\begin{center}
\caption{Descriptive statistics data: yields on short-term and long-term Treasury securities \label{d.stat.tab.st}} \vspace{.1in}
\begin{tabular}{lccccccc} \hline
\bf Month & \boldmath $X_i$\footnote{Yield on 3-month Treasury securities in 1994. Source: {\it Economic Report of the President, 1995} Table B-72.} & \boldmath $x_i$\footnote{$x_i=X_i-\overline{X}$} & \boldmath $x_i^2$ & \boldmath $Y_i$\footnote{Yield on 10-year Treasury securities in 1994. Source: {\it Economic Report of the President, 1995} Table B-72.} & \boldmath $y_i$\footnote{$y_i=Y_i-\overline{Y}$} & \boldmath $y_i^2$ & \boldmath $x_iy_i$ \\ \hline
Jan & 3.02 & $-1.25$ & 1.5625 & 5.75 & $-1.33$ & 1.7689 & 1.6625 \\
Feb & 3.21 & $-1.06$ & 1.1236 & 5.97 & $-1.11$ & 1.2321 & 1.1766 \\
Mar & 3.52 & $-0.75$ & 0.5625 & 6.48 & $-0.60$ & 0.3600 & 0.4500 \\
Apr & 3.74 & $-0.53$ & 0.2809 & 6.97 & $-0.11$ & 0.0121 & 0.0583 \\
May & 4.19 & $-0.08$ & 0.0064 & 7.18 & 0.10 & 0.0100 & $-0.0080$ \\
Jun & 4.18 & $-0.09$ & 0.0081 & 7.10 & 0.02 & 0.0004 & $-0.0018$ \\
Jul & 4.39 & 0.12 & 0.0144 & 7.30 & 0.22 & 0.0484 & 0.0264 \\
Aug & 4.50 & 0.23 & 0.0529 & 7.24 & 0.16 & 0.0256 & 0.0368 \\
Sep & 4.64 & 0.37 & 0.1369 & 7.46 & 0.38 & 0.1444 & 0.1406 \\
Oct & 4.96 & 0.69 & 0.4761 & 7.74 & 0.66 & 0.4356 & 0.4554 \\
Nov & 5.25 & 0.98 & 0.9604 & 7.96 & 0.88 & 0.7744 & 0.8624 \\
Dec & 5.64 & 1.37 & 1.8769 & 7.81 & 0.73 & 0.5329 & 1.0001 \\ \hline
Sums: & $51.24$ & & $7.0616$ & $84.96$ & & $5.3448$ & $5.8593$ \\ \hline
\end{tabular}
\end{center}
\end{minipage}
\end{center}
\end{table}
Table \ref{d.stat.tab.st} contains information on the yields on 3-month and 10-year Treasury securities during 1994.\footnote{%
The data used in this analysis can be found in the file “yields.dat” on the data disk that accompanies this text.} To simplify notation, the yield on 3-month Treasury bills is denoted by the variable $X$; the yield on 10-year Treasury bonds is denoted by the variable $Y$. Let’s examine the computation of the descriptive statistics for these variables.
The sample means of $X$ and $Y$ are computed as:
\begin{equation*}
\overline{X}=\frac{1}{N}\sum_{i=1}^{N}X_{i}
\end{equation*}
\begin{equation*}
=\frac{51.24\%}{12}
\end{equation*}
\begin{equation*}
=4.27\%
\end{equation*}
and
\begin{equation*}
\overline{Y}=\frac{1}{N}\sum_{i=1}^{N}Y_{i}
\end{equation*}
\begin{equation*}
=\frac{84.96\%}{12}
\end{equation*}
\begin{equation*}
=7.08\%
\end{equation*}
A comparison of the sample means for these two variables indicates that the yield on 10-year Treasury bonds was higher, on average, than the yield on 3-month Treasury bills.
The sample variance for $X$ and $Y$ can be determined as: \begin{equation*}
\hat \sigma _X^2=\frac 1{N-1}\sum_{i=1}^N\left( X_i-\overline{X}\right) ^2 \end{equation*}
\begin{equation*}
=\frac 1{11}\left( 7.0616\right)
\end{equation*}
\begin{equation*}
=0.64
\end{equation*}
and
\begin{equation*}
\hat \sigma _Y^2=\frac 1{N-1}\sum_{i=1}^N\left( Y_i-\overline{Y}\right) ^2 \end{equation*}
\begin{equation*}
=\frac 1{11}\left( 5.3448\right)
\end{equation*}
\begin{equation*}
=0.49
\end{equation*}
Thus, during this time period, the variance of the yield on 3-month Treasury bills was greater than the variance of the yield on 10-year Treasury bonds.
The sample standard deviations for $X$ and $Y$ are equal to the square roots of the variances for these variables. Thus,
\begin{equation*}
\hat \sigma _X=\sqrt{0.64}
\end{equation*}
\begin{equation*}
=0.80
\end{equation*}
and
\begin{equation*}
\hat \sigma _Y=\sqrt{0.49}
\end{equation*}
\begin{equation*}
=0.70
\end{equation*}
The sample covariance between $X$ and $Y$ equals:
\begin{equation*}
\widehat{cov}(X,Y)=\frac 1{N-1}\sum_{i=1}^N(X_i-\overline{X})(Y_i-\overline{Y% })
\end{equation*}
\begin{equation*}
=\frac 1{11}\left( 5.8593\right)
\end{equation*}
\begin{equation*}
=0.53
\end{equation*}
The sample correlation coefficient is computed as:
\begin{equation*}
\hat \rho _{XY}=\frac{\widehat{cov}(X,Y)}{\hat \sigma _X\hat \sigma _Y} \end{equation*}
\begin{equation*}
=\frac{0.53}{\left( 0.80\right) \left( 0.70\right) } \end{equation*}
\begin{equation*}
=0.95
\end{equation*}
The relatively large, and positive, sample correlation coefficient indicates that there is a strong linear association between the yields on these two types of securities. During this period, increases in the yield on 3-month Treasury bills were generally associated with increases in the yield on 10-year Treasury bonds.
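As a check on the rounding in the calculation above, the correlation coefficient can also be computed directly from the column sums in Table \ref{d.stat.tab.st}. Because the factor $1/(N-1)$ appears in both the sample covariance and the product of the sample standard deviations, it cancels, leaving (in the deviation notation of the table):
\begin{equation*}
\hat{\rho}_{XY}=\frac{\sum x_{i}y_{i}}{\sqrt{\left( \sum x_{i}^{2}\right) \left( \sum y_{i}^{2}\right) }}
\end{equation*}
\begin{equation*}
=\frac{5.8593}{\sqrt{(7.0616)(5.3448)}}\approx \frac{5.8593}{6.1435}\approx 0.954
\end{equation*}
Using the rounded intermediate values instead gives $0.53/0.56\approx 0.946$. Both figures round to 0.95, so the rounding of intermediate results has little effect in this example.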
\section{Some Important Distributions}
This section of the chapter is designed to provide a brief overview of the distribution functions that are most commonly used in econometric analysis.
A more detailed discussion of how these distribution functions are actually applied in econometric analysis appears in Chapters \ref{biv.hyp.chap} and \ref{hyp.mult.chap} (and subsequent chapters).
\subsection{Normal distribution}
The normal distribution is one of the most commonly used distributions in econometric analysis.\footnote{%
For those who are mathematically inclined, the formula for the PDF of a normal distribution is:
\begin{equation*}
f(x)=\frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{1}{2}\left( \frac{x-\mu }{\sigma }\right) ^{2}},
\end{equation*}%
where $e=2.718\ldots $, $\pi =3.14159\ldots $, and $\mu $ and $\sigma $ are the mean and standard deviation, respectively, of the distribution.} Figure~\ref{pdf_normal} contains a graph of a normal density function. One reason for the popularity of the normal density function is the existence of a large number of \textbf{central limit theorems} that indicate that the PDFs for a wide variety of random variables tend toward a normal PDF as the size of the sample increases. In introductory statistics courses, the central limit theorem is used to justify treating the sample mean ($\overline{X}$) as a normally distributed variable even though the distribution of the underlying variable $X_{i}$ is not necessarily normal. As long as the $X_{i}$ represent independent drawings from a population with a constant mean and variance, the central limit theorem indicates that the distribution of the (suitably standardized) sample mean converges to a normal distribution as the size of the sample increases.
\FRAME{ftbpFU}{5.047in}{2.1759in}{0pt}{\Qcb{PDF for a normally distributed random variable}}{\Qlb{pdf_normal}}{fig3-5.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 2.1759in;depth 0pt;original-width 5.9378in;original-height 2.5417in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename
‘graphs/Fig3-5.gif’;file-properties “XNPEU”;}}
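For readers who would like to see the central limit theorem written out, one standard version (the Lindeberg--L\'{e}vy theorem) can be sketched as follows. It assumes that the $X_{i}$ are independent and identically distributed with mean $\mu _{X}$ and finite variance $\sigma _{X}^{2}>0$; the symbol $Z_{N}$ is introduced here only for this statement. Define the standardized sample mean as:
\begin{equation*}
Z_{N}=\frac{\overline{X}-\mu _{X}}{\sigma _{X}/\sqrt{N}}
\end{equation*}
Under these assumptions, the distribution of $Z_{N}$ converges to that of a $N(0,1)$ variable as $N$ approaches infinity. The standardization matters: the distribution of $\overline{X}$ itself collapses toward the point $\mu _{X}$ as $N$ grows, while $Z_{N}$ has a mean of zero and a variance of one for every $N$.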
As noted in Chapter \ref{intro.chap}, one of the purposes of econometric analysis is to estimate the parameters of regression equations. In Chapter \ref{biv.reg.chap}, it will be shown that the estimators for these regression parameters are random variables that are functions of the observed outcomes. Central limit theorems are used by econometricians to argue that the distributions of these parameter estimators converge to normal distributions as the sample size rises. This provides a rationale for using the normal density function to conduct hypothesis tests when the sample is sufficiently large.
When econometricians work with small samples, the central limit theorem cannot be used to justify the use of the normal distribution function.\footnote{%
A careful reader will note the somewhat loose use of the term “sufficiently large.” Unfortunately, no simple rules exist that determine the sample size that is required for the distribution function to be approximately a normal density function. This depends on the underlying density function for the observed random variables. Econometricians generally agree that an appeal to the central limit theorem is reasonable for hypothesis testing purposes when there are 200 or more observations; it is also generally agreed that such an appeal is unreasonable when there are fewer than 30 observations.
When sample sizes fall between these two values, there is less agreement about whether the samples are “large enough” to justify an appeal to the central limit theorem.} In such cases, econometricians frequently begin with the assumption that the random error term in a regression equation follows a normal distribution. As will be shown in later chapters, the assumption of normally distributed random error terms guarantees that the estimators of the intercept and slope parameters will also follow a normal distribution.\footnote{%
Tests developed by Kiefer and Salmon (1983) and Jarque and Bera (1987) may be used to test for the normality of random error terms in cases where there is some reason to doubt the validity of this assumption. The Jarque-Bera test is discussed in Section \ref{Jarque_Bera} on pp. \pageref{Jarque_Bera}--\pageref{Jarque_Bera.end}.} Since the normal distribution is so important in econometric practice (and will be extensively referred to in later chapters), a review of some of the important features of the normal density function is in order.
Suppose that a variable ($X$) is normally distributed with a mean of $\mu _X$ and a variance of $\sigma _X^2$. This is expressed mathematically as:
\begin{equation*}
X\sim N(\mu _X,\sigma _X^2)
\end{equation*}
In this equation, the symbol “$\sim $” means “is distributed as.” This equation is read as: “$X$ is distributed normally with a mean of $\mu _X$ and a variance of $\sigma _X^2$.”
An interesting, and useful, property of normal distributions is that any weighted sum of jointly normally distributed random variables (for example, independent normal variables) will also be a normally distributed random variable. For example, if $X$ and $Y$ are independent, normally distributed random variables, a new variable equal to $3X-5Y$ will also be normally distributed.
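The mean and variance of such a weighted sum follow from the usual rules for expected values and variances. As a sketch, let $W=3X-5Y$ (the symbol $W$ is introduced here for illustration) and assume that $X$ and $Y$ are jointly normal:
\begin{equation*}
E(W)=3\mu _{X}-5\mu _{Y}
\end{equation*}
\begin{equation*}
var(W)=9\sigma _{X}^{2}+25\sigma _{Y}^{2}-30\,cov(X,Y)
\end{equation*}
If $X$ and $Y$ are independent, the covariance term is zero. With the hypothetical values $\mu _{X}=2$, $\mu _{Y}=1$, $\sigma _{X}^{2}=1$, and $\sigma _{Y}^{2}=4$, and with independence assumed, this gives $E(W)=1$ and $var(W)=109$, so that $W\sim N(1,109)$.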
\subsubsection{Standard normal density — N(0,1)}
A \textbf{standard normal density function} is a special case of a normal distribution in which the population mean equals zero and the population variance (and standard deviation) equals one. In other words, a random variable $Z$ is distributed as a standard normal variable if $Z\sim N(0,1)$.
Any normal variable, $X$, can be transformed into a standard normal variable by using the equation:\footnote{%
A proof of this property is left to the reader as one of the end-of-chapter problems.}
\begin{equation}
Z=\frac{X-\mu _{X}}{\sigma _{X}} \label{Z-score_1.sc} \end{equation}
\begin{center}
\FRAME{ftbpFU}{5.047in}{2.1681in}{0pt}{\Qcb{PDF for a standard normal variable}}{\Qlb{pdf_std_normal}}{fig3-6.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 2.1681in;depth 0pt;original-width 5.9586in;original-height 2.5417in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-6.gif’;file-properties “XNPEU”;}} \end{center}
Figure~\ref{pdf_std_normal} illustrates the PDF for a standard normal variable. When working with normal variables, this transformation is often desirable since the CDF for standard normal variables has been computed and appears in tables at the end of virtually all statistics and econometrics texts (including this text). This CDF is used to perform hypothesis tests on normal variates. Readers who have recently completed an introductory statistics course will probably recall computing many $Z$-values using the formula in equation \ref{Z-score_1.sc}.
In later chapters, it will be shown that, under certain assumptions, the estimated intercept and slope parameters in a regression model follow a normal distribution. The distributions of many other estimators and test statistics in regression models also approach a normal distribution as the size of the sample increases. Thus, the normal distribution is extensively used in econometric analysis.
\subsubsection{Application: Using the standard normal table} Let’s examine how the standard normal table may be used by econometricians.
Suppose that the lifetime of a particular model of computer hard drive can be represented by a random variable, $X$, that measures the number of days before a hard-drive failure occurs. In this example, assume that $X$ is distributed normally with a mean of 2800 days and a variance of 1600 (measured in days squared). The standard deviation of $X$, $\sigma _{X}$, is the square root of the variance; thus:
\begin{equation*}
\sigma _{X}=\sqrt{1600}
\end{equation*}
\begin{equation*}
=40
\end{equation*}
Under these assumptions, a standard normal variate, $Z$, can be created using the transformation:
\begin{equation}
Z=\frac{X-\mu _{X}}{\sigma _{X}} \label{z.trans.q.sc} \end{equation}
\begin{equation*}
=\frac{X-2800}{40}
\end{equation*}
This standardized normal variable, $Z$, has a mean equal to zero and a variance (and standard deviation) equal to one.
Suppose the econometrician wishes to determine the probability that a hard drive will last for 2860 or more days. Using equation \ref{z.trans.q.sc}, the value of $Z$ corresponding to $X=2860$ is:
\begin{equation*}
Z=\frac{2860-2800}{40}
\end{equation*}
\begin{equation*}
=1.5
\end{equation*}
Table \ref{n-table} in Appendix \ref{stat.tab.app} (at the end of this text) contains a listing of the CDF for the standard normal density function.
\footnote{%
This table actually contains only the upper half of the CDF. Since the standard normal density function is symmetric about the origin, the probability of observing a value that is less than any negative value of $Z$ can be found by using the relationship:
\begin{equation*}
\Phi (-Z)=1-\Phi (Z)
\end{equation*}%
} According to this table:
\begin{equation*}
\text{Prob(}Z\leq 1.5)=0.9332
\end{equation*}
The probability of having a hard drive last for more than 2860 days can be determined as:
\begin{equation*}
\text{Prob(}X>2860)=\text{Prob(}Z>1.5)
\end{equation*}
\begin{equation*}
=1-\text{Prob(}Z\leq 1.5)
\end{equation*}
\begin{equation*}
=1-0.9332
\end{equation*}
\begin{equation*}
=0.0668
\end{equation*}
Thus, in the population, there is a 6.68\% probability of observing a hard drive that will last for 2860 or more days.
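The symmetry relationship noted in the footnote above can be applied in the same way. As a brief illustration (the cutoff of 2740 days is chosen here only for the sake of the example, and $\Phi $ denotes the standard normal CDF), the value of $Z$ corresponding to $X=2740$ is $(2740-2800)/40=-1.5$, so that:
\begin{equation*}
\text{Prob}(X<2740)=\Phi (-1.5)=1-\Phi (1.5)=1-0.9332=0.0668
\end{equation*}
Under the same assumptions, the probability that a hard drive lasts between 2740 and 2860 days is:
\begin{equation*}
\text{Prob}(2740\leq X\leq 2860)=\Phi (1.5)-\Phi (-1.5)=0.9332-0.0668=0.8664
\end{equation*}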
\subsection{$\protect\chi ^2$-distribution}
If a random variable is equal to the sum of the squares of $N$ independent standard normal variables, then the variable is distributed according to a $\chi ^{2}$-distribution\footnote{%
This distribution is also expressed as a “chi-squared” distribution. In particular, it is often seen in this form in HTML documents appearing on the internet since the Greek letter $\chi $ (pronounced “chi” – with a hard “c”) does not appear in the most commonly used English character sets.} with $N$ degrees of freedom. The notation:
\begin{equation*}
Y\sim \chi ^{2}(N)
\end{equation*}
indicates that the random variable $Y$ follows a $\chi ^{2}$-distribution with $N$ degrees of freedom.
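Written out, the definition above takes the following form, where $Z_{1},\ldots ,Z_{N}$ (notation introduced here only for illustration) denote the $N$ independent standard normal variables:
\begin{equation*}
Y=\sum_{i=1}^{N}Z_{i}^{2}\sim \chi ^{2}(N)
\end{equation*}
Since each $Z_{i}$ has a mean of zero and a variance of one, $E(Z_{i}^{2})=1$. The mean of $Y$ therefore equals its degrees of freedom:
\begin{equation*}
E(Y)=\sum_{i=1}^{N}E(Z_{i}^{2})=N
\end{equation*}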
\begin{center}
\FRAME{ftbpFU}{4.4892in}{3.013in}{0pt}{\Qcb{$\protect\chi ^{2}$ distribution}}{\Qlb{chi2_graph}}{fig3-7.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.4892in;height 3.013in;depth 0pt;original-width 4.4373in;original-height 2.9689in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-7.gif’;file-properties “XNPEU”;}}
\end{center}
Since the $\chi ^{2}$-distribution represents the distribution of a variable consisting of the sum of squares of random variables, it can take on only nonnegative values. Thus, a $\chi ^{2}$-random variable will always be greater than or equal to zero. As Figure~\ref{chi2_graph} indicates, the $\chi ^{2}$-distribution is skewed to the right. As $N$ increases, however, the skewness of the $\chi ^{2}$-distribution decreases. In fact, the $\chi ^{2}$-distribution tends towards a normal distribution as $N$ approaches infinity.
An interesting property of the $\chi ^{2}$-distribution is that the sum of two independent $\chi ^{2}$-variables will also be distributed as a $\chi ^{2}$-variable. For example, suppose that $X$ and $Y$ are independent $\chi ^{2}$-variates with degrees of freedom of $N_{1}$ and $N_{2}$ respectively.
Then, the sum:
\begin{equation*}
Z=X+Y
\end{equation*}
is distributed as a $\chi ^{2}$-variate with $N_{1}+N_{2}$ degrees of freedom.
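This property follows directly from the definition of the $\chi ^{2}$-distribution. As a sketch, suppose that $X$ and $Y$ are constructed from $N_{1}+N_{2}$ mutually independent standard normal variables $U_{i}$ (a symbol introduced here only for illustration):
\begin{equation*}
X=\sum_{i=1}^{N_{1}}U_{i}^{2}\qquad \text{and}\qquad Y=\sum_{i=N_{1}+1}^{N_{1}+N_{2}}U_{i}^{2}
\end{equation*}
Adding the two variables gives:
\begin{equation*}
X+Y=\sum_{i=1}^{N_{1}+N_{2}}U_{i}^{2}
\end{equation*}
which is the sum of the squares of $N_{1}+N_{2}$ independent standard normal variables and is therefore a $\chi ^{2}$-variate with $N_{1}+N_{2}$ degrees of freedom.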
The $\chi ^{2}$-distribution is frequently used by econometricians to test hypotheses concerning the variance of error terms in a regression model. In future chapters, you will see numerous applications of the $\chi ^{2}$-distribution for hypothesis tests of this sort. A more complete discussion of the use of this distribution for the purpose of testing hypotheses appears in Chapter \ref{biv.hyp.chap} and subsequent chapters. For now, it is most important that you understand the relationship that exists between the $\chi ^{2}$-distribution and the normal distribution.
\subsection{$t$-distribution}
The standard normal CDF discussed above can be used for hypothesis tests only when the variance of the distribution is known (since the standard deviation of the distribution appears in the denominator of the $Z$-transformation). In practice, this variance is not generally known and must be estimated. In this case, a $t$-ratio is formed in the following manner: \begin{equation*}
t=\frac{X-\mu _{X}}{\hat{\sigma}_{X}}
\end{equation*}%
The PDF for this random variable is called a
%TCIMACRO{\TeXButton{t-}{\boldmath$t$\unboldmath}}% %BeginExpansion
\boldmath$t$\unboldmath%
%EndExpansion
\textbf{-distribution}.\footnote{%
More technically, suppose that $Z$ is a standard normal variate and $X$ is distributed as a $\chi ^{2}$-random variable with $N$ degrees of freedom. If $Z$ and $X$ are independent, then the ratio:
\begin{equation*}
t=\frac{Z}{\sqrt{X/N}}
\end{equation*}%
is distributed as a $t$-distribution with $N$ degrees of freedom. A proof showing that $t=\left( X-\mu _{X}\right) /\hat{\sigma}_{X}$ is distributed as a $t$-distribution may be found in a more advanced text. See, for example, the discussion in Ramanathan (1998), pp. 105-106.} As Figure~\ref{t_graph} illustrates, a $t$-distribution looks quite similar to a normal distribution, but has somewhat thicker tails when the sample size is relatively small. As the sample size increases, the $t$-distribution converges to a standard normal distribution.
\begin{center}
\FRAME{ftbpFU}{4.1122in}{3.013in}{0pt}{\Qcb{$t$-distribution}}{\Qlb{t_graph}}{fig3-8.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.1122in;height 3.013in;depth 0pt;original-width 4.0629in;original-height 2.9689in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-8.gif’;file-properties “XNPEU”;}}
\end{center}
The $t$-distribution is extensively used in econometric analysis. It is used to test hypotheses involving estimated parameters (such as regression coefficients) that have an unknown variance. There is a different $t$-distribution for each possible level of the degrees of freedom. As noted above, the degrees of freedom associated with an estimator is, in general, equal to the number of observations minus the number of parameters that must be estimated to construct the parameter estimate. Since the $t$-distribution converges to the standard normal distribution as the degrees of freedom tend towards infinity, the standard normal CDF may be used for hypothesis tests involving $t$-ratios when the degrees of freedom is relatively large.%
\exbox{Student’s \boldmath{$t$}-distribution and Beer}{What does beer have to do with the $t$-distribution? One of the things that your instructor may have neglected to tell you in your introductory statistics course is that Student’s $t$-distribution was developed in 1908 by William S. Gosset, a chemist employed by the Guinness brewing company. Since his contract with the Guinness brewing company prohibited Gosset from publishing under his own name, his work on this distribution was published under the pseudonym “Student.”}
The use of the $t$-test for hypothesis testing is discussed more extensively in Chapters \ref{biv.hyp.chap}, \ref{hyp.mult.chap}, and subsequent chapters.
\subsection{$F$-distribution}
If a random variable equals the ratio of two independent $\chi ^2$-random variables, each divided by its own degrees of freedom, the random variable follows an $F$-distribution. The $F$-distribution is used to perform tests on the joint significance of two or more variables in a regression equation. It is also used to test linear restrictions on the parameters of an equation (\textit{e.g.}, that the sum of two coefficients equals 1). For example, if $X$ and $Y$ are independent $\chi ^2$-random variables with degrees of freedom equal to $m$ and $n$ respectively, then the ratio:
\begin{equation}
F=\frac{\frac Xm}{\frac Yn} \label{F-stat.stat.chap} \end{equation}
follows the $F$-distribution with degrees of freedom equal to $m$ and $n$.
The $F$-distribution corresponding to $m$ and $n$ degrees of freedom is often expressed using the notation $F(m,n)$.
As in the case of the $\chi ^{2}$-distribution, a variable distributed according to the $F$-distribution will take on only nonnegative values (since both the numerator and denominator are nonnegative). Figure~\ref{f_graph} provides a graph illustrating the shape of an $F$-distribution with relatively small values of $m$ and $n$. As this diagram indicates, the $F$-distribution is skewed to the right for finite values of $m$ and $n$.
\begin{center}
\FRAME{ftbpFU}{4.6146in}{3.0338in}{0pt}{\Qcb{$F$-distribution}}{\Qlb{f_graph}}{fig3-9.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.6146in;height 3.0338in;depth 0pt;original-width 4.5627in;original-height 2.9897in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig3-9.gif’;file-properties “XNPEU”;}}
\end{center}
An interesting relationship exists between the $t$- and $F$-distributions.
If a random variable follows a $t$-distribution with $m$ degrees of freedom, the square of the variable will follow an $F$-distribution with $1$ degree of freedom in the numerator and $m$ degrees of freedom in the denominator.
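A short sketch shows why this relationship holds. Suppose that $Z$ is a standard normal variate and that $W$ (a symbol introduced here only for illustration) is an independent $\chi ^{2}$-variate with $m$ degrees of freedom, so that $t=Z/\sqrt{W/m}$ follows a $t$-distribution with $m$ degrees of freedom. Squaring this ratio gives:
\begin{equation*}
t^{2}=\frac{Z^{2}}{W/m}=\frac{Z^{2}/1}{W/m}
\end{equation*}
Since $Z^{2}$ is the square of a single standard normal variable, it is a $\chi ^{2}$-variate with $1$ degree of freedom. The expression above therefore has the same form as equation \ref{F-stat.stat.chap}, with numerator and denominator degrees of freedom of $1$ and $m$, so that $t^{2}\sim F(1,m)$.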
When $F$-statistics are used to compare two variance estimates, it is common practice to define the numerator in the $F$-statistic (equation \ref{F-stat.stat.chap}) as the term that is larger. In other words, such $F$-statistics are constructed so that $X/m$ is at least as large as $Y/n$. In consequence, an $F$-statistic constructed in this manner is never less than $1$.
As the sample size increases, the denominator of the $F$-statistic converges to $1$. Therefore, the product $mF$ converges to a $\chi ^2$-distribution with $m$ degrees of freedom as $n$ (the degrees of freedom for the denominator) approaches infinity.\footnote{%
A proof of this result may be found in a more advanced text. See, for example, the discussion in Goldberger (1991), pp. 199-200.} Thus, when the degrees of freedom for the denominator is relatively large, the approximation:
\begin{equation*}
\chi ^2=mF
\end{equation*}
may be used to approximate the $F$-distribution.\footnote{It should also be noted that the $F$-distribution converges to a normal distribution when both $m$ and $n$ approach infinity.} (This result is useful in constructing hypothesis tests when working with a large sample size.)
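The reasoning behind this approximation can be sketched as follows (an informal argument rather than a formal proof), using the notation of equation \ref{F-stat.stat.chap}. A $\chi ^{2}$-variate with $n$ degrees of freedom has a mean of $n$ and a variance of $2n$, so that:
\begin{equation*}
E\left( \frac{Y}{n}\right) =1\qquad \text{and}\qquad var\left( \frac{Y}{n}\right) =\frac{2n}{n^{2}}=\frac{2}{n}
\end{equation*}
As $n$ increases, the variance of $Y/n$ tends towards zero and $Y/n$ becomes concentrated around $1$. In this case:
\begin{equation*}
mF=\frac{X}{Y/n}\approx X\sim \chi ^{2}(m)
\end{equation*}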
The use of the $F$-statistic for hypothesis tests is discussed in Chapters \ref{biv.hyp.chap}, \ref{hyp.mult.chap} and subsequent chapters.
\section{Summary}
A substantial portion of this chapter was devoted to a discussion of estimators. Desirable properties for estimators include: \begin{itemize}
\item linearity,
\item unbiasedness,
\item consistency, and
\item efficiency.
\end{itemize}
Linear estimators may be constructed as a linear function of observed variables. Unbiased estimators are estimators that “on average” provide correct estimates of population parameters. A consistent estimator converges to the population parameter as the size of the sample approaches infinity.
An efficient estimator attains the lowest variance that can be achieved by an unbiased estimator.
Sample estimators of the population mean, variance, covariance, and correlation coefficients were also presented in this chapter. The chapter concludes with a brief discussion of the distribution functions that are used most extensively in later chapters of this text.
You must have a solid understanding of this material if you are to understand how to estimate the parameters in econometric models and perform hypothesis tests. Be sure that you understand all of the concepts in this chapter before moving on to the next chapter. If you do not have a solid understanding of these concepts, you will probably experience difficulty understanding later portions of the text.
\section{Key Concepts}
\begin{itemize}
\item estimators
\item linearity
\item unbiasedness
\item consistency
\item efficiency
\item mean square error
\item mean square error efficiency
\item sample mean
\item sample variance
\item sample standard deviation
\item degrees of freedom
\item descriptive statistics
\item central limit theorems
\item normal density function
\item standard normal density function
\item $t$-distribution
\item $\chi ^2$-distribution
\item $F$-distribution
\end{itemize}
\newpage\
\section{Exercises and Problems}
\begin{enumerate}
\item Explain, in intuitive terms, what is meant by each of the following properties of estimators:
\begin{enumerate}
\item linearity.
\item unbiasedness.
\item consistency.
\item efficiency.
\end{enumerate}
\item Can an estimator be biased, but consistent? If so, illustrate the PDF of the estimator for two different sample sizes. If not, explain why not.
\item Suppose that a somewhat lazy statistician is asked to estimate the population mean of a variable using survey data (derived from a random sample). The statistician forms an estimate by selecting the first survey response and reporting that value as the estimate of the mean. Is this estimator unbiased? Is this estimator consistent? Explain your answers.
\item An econometrician attempts to analyze the relationship between income and years of schooling. The data used for this study appear in Table \ref{quest} below.
\begin{enumerate}
\item Determine the sample means for income and years of schooling. What does the sample mean for these variables measure?
\item Determine the sample variance for each of these variables. What does the variance for these variables measure?
\item Determine the sample covariance and correlation coefficient. Explain what is meant by each of these measures.
\begin{table}[h]
\begin{center}
\begin{tabular}{|ccc|}
\hline
\textbf{Observation} & \textbf{Income (in thousands of dollars)} & \textbf{Years of schooling} \\ \hline
1 & 20 & 12 \\
2 & 30 & 12 \\
3 & 25 & 14 \\
4 & 15 & 10 \\
5 & 30 & 12 \\
6 & 60 & 16 \\
7 & 40 & 12 \\
8 & 30 & 12 \\
9 & 40 & 14 \\
10 & 50 & 16 \\ \hline
\end{tabular}%
\end{center}
\caption{Data on income and educational attainment} \label{quest}
\end{table}
\pagebreak
\end{enumerate}
\item An analyst collects the data on $X$ and $Y$ that appear in Table \ref{xry}.
\begin{enumerate}
\item Determine the sample means for $X$ and $Y$.
\item Determine the sample variances and standard deviations for $X$ and $Y$.
\item Determine the sample covariance between $X$ and $Y$.
\item Determine the sample correlation coefficient for these two variables.
\begin{table}[h]
\begin{center}
\begin{tabular}{|cc|}
\hline
\textbf{$X$} & \textbf{$Y$} \\ \hline
10 & 20 \\
20 & 30 \\
30 & 15 \\
40 & 20 \\
50 & 25 \\
60 & 30 \\ \hline
\end{tabular}%
\end{center}
\caption{Observations on $X$ and $Y$}
\label{xry}
\end{table}
\end{enumerate}
\item A random variable, $X$, is distributed normally with a mean of $\mu _{X}$ and a variance of $\sigma _{X}^{2}$. Use the properties of expectations to show that $Z=\frac{X-\mu _{X}}{\sigma _{X}}\sim N(0,1)$.
\item Suppose that the lifespan of a particular type of lightbulb is normally distributed with a mean of 1000 hours and a variance of 8100 (measured in hours squared).
Determine the probability that a randomly selected lightbulb will: \begin{enumerate}
\item last for 800 hours or less.
\item last for more than 800 hours.
\item last for more than 1100 hours.
\item last for between 900 and 1100 hours.
\item last for between 800 and 1200 hours.
\end{enumerate}
\item A meteorologist discovers that the amount of rainfall in a given location during the month of July is a normally distributed random variable with a mean of 3.0 inches and a variance of 0.25 (measured in inches squared). Determine the probability that July’s rainfall will be:
\begin{enumerate}
\item less than 2.0 inches.
\item greater than 4.0 inches.
\item between 2.0 and 4.0 inches.
\end{enumerate}
\item Suppose that a random variable $Z$ is distributed as a standard normal variate. Determine the values of $a$ and $b$ so that: \begin{equation*}
\text{Prob}(Z\leq a)=0.025
\end{equation*}
and
\begin{equation*}
\text{Prob}(Z>b)=0.025
\end{equation*}
\item A normally distributed random variable $X$ has a mean of 80 and a variance of 64. Determine the values of $x_{0}$ and $x_{1}$ so that: \begin{equation*}
\text{Prob}(X\leq x_{0})=2.5\%
\end{equation*}
and
\begin{equation*}
\text{Prob}(X>x_{1})=2.5\%
\end{equation*}
\item Suppose that a random variable $Z$ is distributed as a standard normal variate. Determine the values of $a$ and $b$ so that: \begin{equation*}
\text{Prob}(Z\leq a)=0.005
\end{equation*}
and
\begin{equation*}
\text{Prob}(Z>b)=0.005
\end{equation*}
\item A normally distributed random variable $X$ has a mean of 80 and a variance of 64. Determine the values of $x_{0}$ and $x_{1}$ so that: \begin{equation*}
\text{Prob}(X\leq x_{0})=0.005 \end{equation*}
and
\begin{equation*}
\text{Prob}(X>x_{1})=0.005 \end{equation*}
\item A random variable $Z$ is distributed as a standard normal variate.
Determine the value of $c$ so that: \begin{equation*}
\text{Prob}(-c<Z<c)=95\% \end{equation*}
\item A random variable $Z$ is distributed as a standard normal variate.
Determine the value of $c$ so that: \begin{equation*}
\text{Prob}(-c<Z<c)=99\% \end{equation*}
\end{enumerate}
\newpage\
\section{Mathematical Appendix}
\subsection{Unbiasedness and consistency of $\overline{X}$}
\subsubsection{Unbiasedness}
\textbf{Proof:}
The sample mean is defined as: \begin{equation*}
\overline{X}=\frac 1N\sum_{i=1}^NX_i \end{equation*}
Thus,
\begin{equation*}
E(\overline{X})=E\left( \frac 1N\sum_{i=1}^NX_i\right) \end{equation*}
By Property 1 of expectations (discussed in Chapter \ref{stat.chap}), this becomes:
\begin{equation*}
=\frac 1N\sum_{i=1}^NE(X_i) \end{equation*}
Since $\mu _X$ is defined to equal $E(X_i)$: \begin{equation*}
=\frac 1N\sum_{i=1}^N\mu _X \end{equation*}
By Property 4 of summations, $\sum_{i=1}^N\mu _X=N\mu _X$. Thus, \begin{equation*}
E(\overline{X})=\frac 1N(N\mu _X) \end{equation*}
\begin{equation*}
=\mu _X
\end{equation*}
\subsubsection{Consistency} \textbf{Proof}:
The variance of the sample mean equals: \begin{equation*}
var(\overline{X})=\frac{\sigma _X^2}N \end{equation*}
An examination of this expression indicates that the variance of $\overline{X}$ tends towards zero as $N$ approaches infinity. By definition, however, the variance of $\overline{X}$ equals: \begin{equation*}
var(\overline{X})=E(\overline{X}-E(\overline{X}))^2 \end{equation*}
Since $E(\overline{X})=\mu _X$, the variance of $\overline{X}$ can be stated as:
\begin{equation*}
var(\overline{X})=E(\overline{X}-\mu _X)^2 \end{equation*}
But, since $var(\overline{X})$ tends towards zero as $N$ tends toward infinity, $\overline{X}$ must converge to $\mu _X$.
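One way to make this final step explicit is through Chebyshev's inequality (a result that is not otherwise used in this chapter and is introduced here only as a sketch). For any $\varepsilon >0$:
\begin{equation*}
\text{Prob}\left( \left\vert \overline{X}-\mu _X\right\vert \geq \varepsilon \right) \leq \frac{E(\overline{X}-\mu _X)^2}{\varepsilon ^2}=\frac{\sigma _X^2}{N\varepsilon ^2}
\end{equation*}
The right-hand side of this expression tends towards zero as $N$ approaches infinity. The probability that $\overline{X}$ differs from $\mu _X$ by $\varepsilon $ or more therefore vanishes in large samples, no matter how small $\varepsilon $ is chosen to be.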
\subsection{Unbiasedness of $\hat \protect\sigma ^2$} \textbf{Proof:}
\begin{equation*}
E(\hat \sigma ^2)=E\left[ \frac 1{N-1}\sum_{i=1}^N(X_i-\overline{X})^2\right] \end{equation*}
\begin{equation*}
=\frac 1{N-1}E\sum_{i=1}^N(X_i-\overline{X})^2 \end{equation*}
To complete the proof, we can rewrite the summation in a slightly different form by both adding and subtracting the term $\mu _X$ to each term in the summation (leaving the value of the summation unchanged): \begin{equation*}
\sum_{i=1}^N(X_i-\overline{X})^2=\sum_{i=1}^N\left[ (X_i-\mu _X)-(\overline{X}-\mu _X)\right] ^2 \end{equation*}
Squaring the terms enclosed in square brackets results in: \begin{equation*}
=\sum_{i=1}^N\left[ (X_i-\mu _X)^2-2(\overline{X}-\mu _X)(X_i-\mu _X)+( \overline{X}-\mu _X)^2\right] \end{equation*}
Since $\overline{X}$ and $\mu _X$ do not vary with the summation index $i$ (and applying Properties 3 and 4 of summations): \begin{equation*}
=\left( \sum_{i=1}^N(X_i-\mu _X)^2\right) -2(\overline{X}-\mu _X)\sum_{i=1}^N(X_i-\mu _X)+N(\overline{X}-\mu _X)^2 \end{equation*}
Since $\sum_{i=1}^NX_i=N\overline{X}$, it follows that $\sum_{i=1}^N(X_i-\mu _X)=N(\overline{X}-\mu _X)$, and therefore: \begin{equation*}
=\left( \sum_{i=1}^N(X_i-\mu _X)^2\right) -2(\overline{X}-\mu _X)\left[ N( \overline{X}-\mu _X)\right] +N(\overline{X}-\mu _X)^2 \end{equation*}
\begin{equation*}
=\left( \sum_{i=1}^N(X_i-\mu _X)^2\right) -2N(\overline{X}-\mu _X)^2+N( \overline{X}-\mu _X)^2 \end{equation*}
\begin{equation*}
=\left( \sum_{i=1}^N(X_i-\mu _X)^2\right) -N(\overline{X}-\mu _X)^2 \end{equation*}
Thus, the expected value of the sample variance can be expressed as: \begin{equation*}
E(\hat \sigma ^2)=\frac 1{N-1}E\left[ \left( \sum_{i=1}^N(X_i-\mu _X)^2\right) -N(\overline{X}-\mu _X)^2\right] \end{equation*}
\begin{equation*}
=\frac 1{N-1}\left[ \sum_{i=1}^NE(X_i-\mu _X)^2-NE(\overline{X}-\mu _X)^2\right]
\end{equation*}
By using the definitions for the variance of $X$ ($=E(X_i-\mu _X)^2$) and the variance of $\overline{X}$ ($=E(\overline{X}-\mu _X)^2=\frac{\sigma _X^2}N$), this can be restated as: \begin{equation*}
=\frac 1{N-1}\left[ N\sigma _X^2-N\left( \frac{\sigma _X^2}N\right) \right] \end{equation*}
\begin{equation*}
=\frac{(N-1)\sigma _X^2}{N-1} \end{equation*}
\begin{equation*}
=\sigma _X^2
\end{equation*}