Chapter 2 – Statistical Foundations

johnkane


\chapter{Statistical Foundations\label{stat.chap}} Econometric analysis involves the application of statistical tools to analyze the relationships that exist among economic variables. Thus, the study of econometrics requires knowledge of basic statistical concepts. It is anticipated that most readers of this text are already familiar with the basic principles of statistics. This chapter reviews several fundamental statistical concepts that are used extensively in later chapters of this text.
\section{Random Variables}
A \textbf{random variable} is a variable that can take on different values with a given probability associated with each outcome. You are probably most familiar with the application of this concept in games of chance. When you flip a coin, the outcome is a random variable that can take on two possible values (heads or tails); the probability of each outcome is 0.5 (assuming that the coin is a “fair” coin). If a card is drawn from a full deck of cards, the probability of picking an ace is 1/13. The probability of selecting the ace of spades is 1/52. As these examples suggest, the probability of an event is a measure of the long-run relative frequency of the outcome in repeated experiments. In a large number of trials, approximately 50\% of all coin tosses will result in an outcome of “heads.” If, in repeated trials, you select a single card from a full deck (randomly shuffled each time), you will select an ace in approximately one out of every thirteen trials.
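The relative-frequency interpretation of probability can be illustrated with a short simulation. The Python sketch below is only an illustration under stated assumptions: it uses the standard \texttt{random} module with an arbitrary fixed seed (chosen only so that the run is reproducible) to simulate 100,000 coin tosses and 100,000 single-card draws from a freshly shuffled full deck, and it reports the fraction of trials that produce heads and aces. The helper function names are hypothetical.

```python
import random


def relative_frequency(trials, event, rng):
    """Fraction of `trials` independent experiments in which event(rng) is True."""
    return sum(1 for _ in range(trials) if event(rng)) / trials


def coin_shows_heads(rng):
    """Toss a fair coin: heads with probability 1/2."""
    return rng.random() < 0.5


DECK = ["ace"] * 4 + ["other"] * 48  # a full 52-card deck contains 4 aces


def draw_is_ace(rng):
    """Draw one card from a freshly shuffled full deck (a uniform choice)."""
    return rng.choice(DECK) == "ace"


rng = random.Random(2024)  # arbitrary fixed seed, only for reproducibility
n = 100_000
p_heads = relative_frequency(n, coin_shows_heads, rng)
p_ace = relative_frequency(n, draw_is_ace, rng)
print(f"heads: {p_heads:.4f}  (probability 0.5)")
print(f"ace:   {p_ace:.4f}  (probability 1/13 = {1/13:.4f})")
```

With this many trials both relative frequencies should lie very close to 0.5 and 1/13 (about 0.0769); rerunning with a much smaller \texttt{n} shows how far a short run can drift from these probabilities.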
Most econometricians, however, are not very interested in the outcome of coin tosses or drawings from a deck of cards (except, perhaps, when they are in Las Vegas or Atlantic City). Econometricians are more likely to focus their attention on the distribution of random variables such as household income, educational attainment, firm profitability, prices, and other economic variables. The analysis of simple probability distributions (such as those resulting from the tossing of a fair coin or a fair die), however, serves to illustrate some important statistical concepts.
A random variable may be either discrete or continuous. A \textbf{discrete random variable} takes on either a finite number of values or a countably infinite number of values.\footnote{A countably infinite number of values occurs if a one-to-one mapping can be constructed between the set of alternatives and the set of positive integers.} In the U.S. population, the number of an individual’s siblings is a discrete random variable that can take on only nonnegative integer values (0, 1, 2, 3, 4, 5, \ldots). A \textbf{continuous random variable} may take on any value within an interval. In a sample of college students, an individual’s weight is a continuous random variable that can take on an infinite number of possible values (\textit{e.g.}, 105.239 lbs, 148.0 lbs, or 181.7691 lbs).
\section{Probability Density Functions}
\subsection{Discrete random variables}
For every random process, a \textbf{probability density function (PDF)}\footnote{The PDF is also referred to as a probability distribution function in many texts.} exists that provides the probability of alternative outcomes. In the case of a coin toss, let’s define a variable $X$ that equals 1 if the toss results in heads, and 0 if tails occurs. The PDF for a single coin toss is:
\begin{equation*}
\text{Prob}(X=0)=\frac{1}{2}
\end{equation*}
\begin{equation*}
\text{Prob}(X=1)=\frac{1}{2}
\end{equation*}
In general, if there are $N$ mutually exclusive and collectively exhaustive outcomes ($x_{1},x_{2},\ldots,x_{N}$),\footnote{A set of outcomes is mutually exclusive if no two of the outcomes can occur at the same time. In other words, if $x_{i}$ occurs, $x_{j}$ cannot occur (for $i\neq j$). A set of outcomes is collectively exhaustive if the set contains all of the possible outcomes that may result from a random experiment. When a set is made up of mutually exclusive and collectively exhaustive outcomes, the outcome of the experiment will be exactly one element of this set.
\par
It should be noted that the set of alternative outcomes for a discrete random variable may contain a (countably) infinite number of potential outcomes.
\par
(For notational convenience, upper case letters, such as $X$, are used throughout this chapter to denote the random variable, while lower case letters, such as $x_{i}$, are used to represent alternative outcomes for the random variable.)} then the PDF provides the probability of each of these $N$ outcomes. This PDF can be represented by a table such as Table \ref{pdf.tab.stat.chap}. In mathematical terms, the PDF for a discrete random variable is a function, $f(x_{i})$, where $f(x_{i})$ is the probability of observing outcome $x_{i}$. In the case of a single roll of a die, $f(x_{i})=\frac{1}{6}$ ($i=1,\ldots ,6$). Table \ref{pdf1} contains a listing of this PDF.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{cc}
\hline
Outcome ($x_{i}$) & Probability ($f(x_{i})$) \\ \hline $x_{1}$ & Prob$(X=x_{1})$ \\
$x_{2}$ & Prob$(X=x_{2})$ \\
$x_{3}$ & Prob$(X=x_{3})$ \\
$\vdots $ & $\vdots $ \\
$x_{N}$ & Prob$(X=x_{N})$ \\ \hline
\end{tabular}%
\end{center}
\caption{Tabular representation of a probability density function} \label{pdf.tab.stat.chap}
\end{table}
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|cc|}
\hline
Outcome ($x_{i}$) & Probability ($f(x_{i})$) \\ \hline 1 & $\frac{1}{6}$ \\
2 & $\frac{1}{6}$ \\
3 & $\frac{1}{6}$ \\
4 & $\frac{1}{6}$ \\
5 & $\frac{1}{6}$ \\
6 & $\frac{1}{6}$ \\ \hline
\end{tabular}%
\end{center}
\caption{Probability density function for a single toss of a fair die.} \label{pdf1}
\end{table}
A PDF possesses two properties:\footnote{%
The summation operator, $\sum $, is used to simplify notation throughout the remainder of the text. The expression:
\begin{equation*}
\sum_{i=1}^{N}x_{i}
\end{equation*}
can be expressed in words as the sum of $x_{i}$ for $i$ running from $1$ to $N$. In mathematical terms, this equals:
\begin{equation*}
\sum_{i=1}^{N}x_{i}=x_{1}+x_{2}+x_{3}+\cdots +x_{N}
\end{equation*}
The mathematical appendix at the end of this chapter contains a list of several important properties of the summation operator. Readers who are not comfortable with this notation are urged to read this portion of the mathematical appendix before continuing.}
\begin{equation*}
\begin{array}{ll}
\text{Property I:} & 0\leq f(x_{i})\leq 1 \\
\text{Property II:} & \sum\limits_{i=1}^{N}f(x_{i})=1
\end{array}
\end{equation*}
The first property states that the probability of an event can never be less than zero or greater than one. Since the probability of an event is a measure of its relative frequency in the population, negative probabilities would be nonsensical. The probability of an event must be less than or equal to 1 since an event can occur at most 100\% of the time. (Of course, if the probability equals one, there is no longer any randomness associated with the outcome.) The second property states that the sum of the probabilities for all possible outcomes equals one. If $A$ and $B$ are mutually exclusive outcomes, the probability of either $A$ or $B$ occurring equals $f(A)+f(B)$. Thus, $\sum\limits_{i=1}^{N}f(x_{i})$ is simply the probability that one of these outcomes occurs. Since the events ($x_{i}$) are mutually exclusive and represent all possible outcomes of the underlying experiment, this summation must equal 1.
Figure~\ref{die_pdf} contains a graph of the PDF associated with a single toss of a fair die. Note that:
\begin{equation*}
\sum_{i=1}^{6}f(x_{i})=\ f(1)+f(2)+f(3)+f(4)+f(5)+f(6) \end{equation*}%
\begin{equation*}
=\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6} \end{equation*}%
\begin{equation*}
=1
\end{equation*}
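The two properties of a discrete PDF can also be checked mechanically. In the sketch below (an illustration, not a standard library routine), a PDF is stored as a Python dictionary mapping each outcome $x_{i}$ to $f(x_{i})$, and exact fractions from the standard \texttt{fractions} module are used so that the sum for the die equals exactly 1 rather than a rounded floating-point value. The helper name \texttt{is\_valid\_discrete\_pdf} is hypothetical.

```python
from fractions import Fraction

# PDF of a single toss of a fair die: f(x_i) = 1/6 for x_i = 1, ..., 6.
die_pdf = {x: Fraction(1, 6) for x in range(1, 7)}


def is_valid_discrete_pdf(pdf):
    """Check Property I (0 <= f(x_i) <= 1) and Property II (the f(x_i) sum to 1)."""
    property_1 = all(0 <= p <= 1 for p in pdf.values())
    property_2 = sum(pdf.values()) == 1
    return property_1 and property_2


print(sum(die_pdf.values()))           # 1
print(is_valid_discrete_pdf(die_pdf))  # True
```

A table whose probabilities sum to less than one, such as $\{0:\frac{1}{2},\,1:\frac{1}{3}\}$, fails Property II and is rejected by the same check.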
\begin{center}
\FRAME{ftbpFU}{4.4936in}{3.1116in}{0pt}{\Qcb{PDF for a single roll of a die}}{\Qlb{die_pdf}}{fig2-1.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 4.4936in;height 3.1116in;depth 0pt;original-width 6.6668in;original-height 4.6043in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-1.gif';file-properties "XNPEU";}} \end{center}
\subsection{Continuous random variables}
A PDF can also be defined for continuous random variables. In this case, probabilities are defined for a range of values, e.g., Prob$(a<X\leq b)$. As noted above, in a sample of college students, each student’s weight is a continuous random variable. A possible PDF for the distribution of weights appears in Figure~\ref{weight_pdf}. This PDF is an example of a \textbf{normal distribution}, one of the most commonly used density functions in econometric analysis. When a random variable is normally distributed, the PDF reaches its peak at the mean, so outcomes close to the mean are more likely than outcomes far from it. The density becomes progressively smaller for values further from the mean. In the distribution of student weights shown here, the average weight is 145 pounds. As this diagram suggests, the probability of observing weights below 100 pounds or above 190 pounds is relatively small.
\begin{center}
\FRAME{ftbpFU}{5.0695in}{2.7648in}{0pt}{\Qcb{Probability density function for student weight}}{\Qlb{weight_pdf}}{fig2-2.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.0695in;height 2.7648in;depth 0pt;original-width 6.9064in;original-height 3.7498in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-2.gif';file-properties "XNPEU";}} \end{center}
The probability that a continuous random variable lies within a particular interval is the area under the PDF over this interval. If an individual is randomly selected from this population, the probability of selecting an individual who weighs between 175 and 190 pounds equals the shaded area under the normal curve in Figure~\ref{weight_pdf}.\footnote{For students who are comfortable with integral calculus, this relationship can be summarized as $\text{Prob}(a<X\leq b)=\int_{a}^{b}f(x)dx$, where $f(x)$ is the PDF for the random variable $X$.}
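To make the area interpretation concrete, the sketch below approximates the area under a normal PDF between 175 and 190 pounds with the trapezoid rule and compares it with the value obtained from the normal CDF in the standard Python class \texttt{statistics.NormalDist}. The text gives the mean weight (145 pounds) but not the spread of the distribution in Figure~\ref{weight_pdf}; the standard deviation of 15 pounds used here is a purely hypothetical assumption, so the resulting number should not be read as the shaded area in the figure.

```python
from statistics import NormalDist

MEAN = 145.0  # average student weight given in the text (pounds)
SD = 15.0     # HYPOTHETICAL standard deviation; not stated in the text

weights = NormalDist(mu=MEAN, sigma=SD)


def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        total += pdf(a + k * h)
    return total * h


area = area_under_pdf(weights.pdf, 175, 190)   # area under the density
exact = weights.cdf(190) - weights.cdf(175)    # same probability from the CDF
print(f"trapezoid area: {area:.6f}")
print(f"CDF difference: {exact:.6f}")
```

Both numbers agree to many decimal places, which is the point of the illustration: a probability for a continuous random variable is an area, and the CDF is a convenient way to obtain that area without integrating by hand.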
As in the case of a discrete random variable, the probability density function, $f(x)$, for a continuous random variable ($X$) has two basic properties:
\begin{enumerate}
\item The value of the PDF is greater than or equal to zero for all values of $X$ (\textit{i.e.}, $f(x)\geq 0$).
\item The total area under the PDF equals 1.\footnote{If $f(x)$ is the PDF for a continuous random variable $X$, then these two properties can be stated as:
\begin{equation*}
\text{Property I: }f(x)\geq 0\text{, and}
\end{equation*}
\begin{equation*}
\text{Property II: }\int_{-\infty }^{\infty }f(x)dx=1.
\end{equation*}
}
\end{enumerate}
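These properties are the continuous counterparts of the two properties listed above for the case of a discrete random variable. Note, however, that the value of a continuous PDF is a density rather than a probability, so $f(x)$ may exceed 1 at some values of $x$; it is the area under the PDF over any interval that must lie between 0 and 1.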
\section{Cumulative Density Functions}
You will often be interested in determining the probability that an outcome lies within a certain range of values. The \textbf{cumulative density function (CDF)}\footnote{%
The CDF is also called a cumulative distribution function in many texts.} provides a convenient tool for this purpose. For a random variable $X$, the cumulative density function $F(x)$ is defined as:\footnote{By convention, the probability density function is generally represented by a lower-case letter and the cumulative density function is represented by the corresponding upper-case letter. Thus, if the PDF is $f(x)$, the CDF is represented as $F(x)$.}
\begin{equation*}
F(x)=\text{Prob}(X\leq x).
\end{equation*}%
As in the discussion of discrete random variables, upper case letters (such as $X$) are used here to represent the random variable, while lower case letters (such as $x$) are used to represent specific values that the random variable may take on. Thus, the equation above states that $F(x)$ is equal to the probability that the random variable $X$ will take on a value that is less than or equal to $x$. In the case of a discrete random variable, the CDF is computed by adding up the probabilities for all values of $X$ that are less than or equal to $x$. Table \ref{cdf1} contains a listing of the CDF for the case of a single toss of a fair die.
Notice, for example, that the value of $F(2)$ in this table is computed as: \begin{equation*}
F(2)=\text{ Prob}(X\leq 2)
\end{equation*}%
\begin{equation*}
=\text{Prob}(X=1)+\text{Prob}(X=2)
\end{equation*}%
\begin{equation*}
=\frac{1}{6}+\frac{1}{6}=\frac{1}{3}
\end{equation*}
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|cc|}
\hline
\textbf{Outcome (\boldmath$X$)} & \textbf{CDF} \\ \hline 1 & 1/6 \\
2 & 1/3 \\
3 & 1/2 \\
4 & 2/3 \\
5 & 5/6 \\
6 & 1 \\ \hline
\end{tabular}%
\end{center}
\caption{CDF for a single toss of a fair die.} \label{cdf1}
\end{table}
Note that the CDF never decreases as $x$ increases. As $x$ approaches positive infinity, the value of the CDF always approaches one. When a discrete random variable has a largest possible outcome, the CDF equals one at that value (six in the example above) and at every larger value of $x$.
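The CDF of a discrete random variable is a running total of its PDF. The sketch below, which assumes the same fair-die PDF used in Table \ref{pdf1}, rebuilds the values in Table \ref{cdf1} with \texttt{itertools.accumulate} and also defines a hypothetical helper \texttt{F} that evaluates the step-shaped CDF at any real number, including values between the possible outcomes.

```python
from fractions import Fraction
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5, 6]
pdf = [Fraction(1, 6)] * 6  # f(x_i) for a single toss of a fair die

# F(x_i) = Prob(X <= x_i): the running total of the PDF up to and including x_i.
cdf = dict(zip(outcomes, accumulate(pdf)))


def F(x):
    """CDF of a fair die at any real x: sum of f(x_i) over outcomes x_i <= x."""
    return sum((p for value, p in zip(outcomes, pdf) if value <= x), Fraction(0))


# Prints one pair per line: 1 1/6, 2 1/3, 3 1/2, 4 2/3, 5 5/6, 6 1
for x in outcomes:
    print(x, cdf[x])
```

For example, \texttt{F(2.5)} returns 1/3, \texttt{F(0)} returns 0 and \texttt{F(100)} returns 1: the CDF stays flat between outcomes, never decreases, and equals one from the largest outcome onward.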
In the case of a continuous distribution, the CDF is equal to the area under the portion of the probability density function that lies to the left of $x$. This concept is illustrated in the graph that appears in Figure~\ref{x_le_x}. A graph of the CDF associated with this distribution appears in Figure~\ref{cdf_graph}. Note that the value of the CDF approaches 1 as the value of $x$ increases.
\begin{center}
\FRAME{ftbpFU}{5.0687in}{2.7657in}{0pt}{\Qcb{Prob $(X\leq x)$}}{\Qlb{x_le_x}}{fig2-3.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.0687in;height 2.7657in;depth 0pt;original-width 6.9064in;original-height 3.7498in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-3.gif';file-properties "XNPEU";}}\FRAME{ftbpFU}{5.047in}{3.4082in}{0pt}{\Qcb{Cumulative density function}}{\Qlb{cdf_graph}}{fig2-4.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 3.4082in;depth 0pt;original-width 5.1353in;original-height 3.4584in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-4.gif';file-properties "XNPEU";}} \end{center}
If the CDF is known, then it is very easy to determine the probability that a random variable will fall within any specific range of values. In particular:
\begin{equation}
\text{Prob(}X>x)\text{ }=\ 1\ -\ F(x) \label{cdf.prop.sc} \end{equation}%
and, for $b>a$:
\begin{equation}
\text{Prob}(a<X<b)=F(b)\ -\ F(a) \label{cdf.prop1.sc} \end{equation}%
Since the total area under the probability density curve equals 1, the area to the right of any threshold value ($x$) must equal 1 minus the area that lies to the left of this point. This relationship is illustrated in Figure~\ref{x_gt_x}. As Figure~\ref{a_lt_x_lt_b} indicates, the probability of observing a value in any interval is equal to the area to the left of the larger value of $X$ (\emph{b} in this case) minus the area that lies to the left of the smaller value of $X$ (\emph{a}). (Strictly speaking, $F(b)-F(a)$ equals $\text{Prob}(a<X\leq b)$; for a continuous random variable the probability of any single value is zero, so it makes no difference whether the endpoints are included.)
\begin{center}
\FRAME{ftbpFU}{5.047in}{2.7544in}{0pt}{\Qcb{Prob $(X>x)$}}{\Qlb{x_gt_x}}{fig2-5a.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 2.7544in;depth 0pt;original-width 5.1768in;original-height 2.8124in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-5a.gif';file-properties "XNPEU";}}\FRAME{ftbpFU}{5.0194in}{2.4146in}{0pt}{\Qcb{Prob $(a<X<b)$}}{\Qlb{a_lt_x_lt_b}}{fig2-5b.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.0194in;height 2.4146in;depth 0pt;original-width 5.3437in;original-height 2.5417in;cropleft "0";croptop "1";cropright "0.9944";cropbottom "0";filename 'graphs/Fig2-5b.gif';file-properties "XNPEU";}} \end{center}
The relationships described in equations \ref{cdf.prop.sc} and \ref{cdf.prop1.sc} and depicted in Figures \ref{x_gt_x} and \ref{a_lt_x_lt_b} can also be illustrated using a cumulative density function such as the one appearing in Figures \ref{xd_gt_x_redux} and \ref{a_lt_x_lt_b_redux}. To determine the probability of an outcome that is greater than a specified value $x$, the CDF is first used to determine the probability of observing an outcome that is less than or equal to $x$. In Figure~\ref{xd_gt_x_redux}, this is simply the value $F(x)$. The probability of an outcome greater than $x$ is then equal to $1-F(x)$ (as noted in equation \ref{cdf.prop.sc}). An econometrician interested in finding the probability of observing a value of $X$ that lies between $a$ and $b$ can use a CDF such as the one appearing in Figure~\ref{a_lt_x_lt_b_redux} to find $F(a)$ and $F(b)$. As indicated by equation \ref{cdf.prop1.sc}, the probability of an outcome between points $a$ and $b$ equals $F(b)-F(a)$ (assuming, of course, that $b$ is greater than $a$).
\begin{center}
\FRAME{ftbpFU}{5.047in}{3.4169in}{0pt}{\Qcb{Prob $(X>x)$}}{\Qlb{xd_gt_x_redux}}{fig2-5c.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 3.4169in;depth 0pt;original-width 5.1975in;original-height 3.5103in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-5c.gif';file-properties "XNPEU";}}\FRAME{ftbpFU}{5.047in}{3.4203in}{0pt}{\Qcb{Prob $(a<X<b)$}}{\Qlb{a_lt_x_lt_b_redux}}{fig2-5d.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 3.4203in;depth 0pt;original-width 5.2088in;original-height 3.5206in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-5d.gif';file-properties "XNPEU";}} \end{center}
In practice, statisticians and econometricians use tables (or computer programs) that provide values of the CDF associated with commonly used probability density functions. Appendix A at the end of this text contains a collection of tables that provide a partial listing of the CDF for several of these probability density functions.
Let’s consider a couple of examples that illustrate how an econometrician might use the properties appearing in equations \ref{cdf.prop.sc} and \ref{cdf.prop1.sc}. Suppose that he or she knows the CDF for the distribution of annual household income in a given economy. If the econometrician wishes to determine the probability that a randomly selected household has an income that exceeds a given value, this can easily be computed using equation \ref{cdf.prop.sc} and the CDF appearing in Table \ref{inc.cdf.tab.qa}. For example, the probability that a randomly selected household will have an annual income that exceeds \$50,000 is given by the formula: \begin{table}[tbp]
\begin{center}
\begin{tabular}{|cc|}
\hline
\textbf{Income ($X$)} & \textbf{CDF} \\ \hline \$10,000 & .05 \\
\$30,000 & .35 \\
\$50,000 & .65 \\
\$70,000 & .85 \\
\$90,000 & .92 \\ \hline
\end{tabular}%
\end{center}
\caption{CDF for selected values of income} \label{inc.cdf.tab.qa}
\end{table}
\begin{equation*}
\text{Prob(}X>\$50,000)\text{ }=\ 1\ -\ F(\$50,000) \end{equation*}
If the value of the CDF evaluated at $X=\$50,000$ equals 0.65, this means that 65\% of households will have an income that lies at or below \$50,000.
Using equation \ref{cdf.prop.sc}, the probability of an income above \$50,000 is given by:
\begin{equation*}
\text{Prob(}X>\$50,000)\text{ }=\ 1\ -\ 0.65 \end{equation*}
\begin{equation*}
=0.35
\end{equation*}
Thus, the probability of an income greater than \$50,000 is 35\%.
Alternatively, suppose this econometrician wishes to compute the probability that a randomly selected household in this economy will have an annual income between \$50,000 and \$70,000. If the CDF equals 0.65 at an income of \$50,000 and 0.85 at an income of \$70,000, then the probability of an income between \$50,000 and \$70,000 can be determined using equation \ref{cdf.prop1.sc}:
\begin{equation*}
\text{Prob}(\$50,000<X<\$70,000)=F(\$70,000)\ -\ F(\$50,000) \end{equation*}
\begin{equation*}
=0.85-0.65
\end{equation*}
\begin{equation*}
=0.2
\end{equation*}
Thus, 20\% of this population will have an income that lies between \$50,000 and \$70,000.
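The two income calculations can be reproduced directly from the CDF values in Table \ref{inc.cdf.tab.qa}. In this minimal Python sketch, the table is stored as a dictionary, and two hypothetical helper functions implement equations \ref{cdf.prop.sc} and \ref{cdf.prop1.sc}. The lookup works only for incomes that appear in the table; no interpolation between the tabulated values is attempted.

```python
# CDF values F(x) from the table of selected income levels (dollars -> probability).
income_cdf = {10_000: 0.05, 30_000: 0.35, 50_000: 0.65, 70_000: 0.85, 90_000: 0.92}


def prob_greater_than(cdf, x):
    """Prob(X > x) = 1 - F(x)."""
    return 1 - cdf[x]


def prob_between(cdf, a, b):
    """Prob(a < X <= b) = F(b) - F(a), which requires b > a."""
    if b <= a:
        raise ValueError("b must be greater than a")
    return cdf[b] - cdf[a]


print(f"{prob_greater_than(income_cdf, 50_000):.2f}")     # 0.35
print(f"{prob_between(income_cdf, 50_000, 70_000):.2f}")  # 0.20
```

The results are formatted to two decimals because binary floating-point arithmetic gives values such as 0.34999999999999997 rather than exactly 0.35.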
\section{Sample versus Population}
In most situations, however, econometricians do not know the exact PDF and CDF for the random variables in which they are interested. Instead, statistical procedures are used to construct estimates of the PDF and CDF using observed data. Suppose, for example, that an econometrician wishes to construct an estimate of the PDF for the distribution of household income in an economy. While, in principle, it is possible to conduct a survey that asks each household in the economy to report its income, a survey of this sort would generally be prohibitively expensive.\footnote{The U.S. census is an example of a survey that attempts to include all households in the U.S. population. Even in this case, however, some households are unavoidably excluded. Furthermore, due to the large volume of data, most econometric studies that use census data rely on 5\% or 1\% subsets of the data.} Most econometric studies, instead, are based on a \textbf{sample} that is a subset of the entire \textbf{population} that is being analyzed. In this example, the population consists of all of the households in the economy at a given point in time. The sample is the specific subset of these households that is used for the econometric study.
When econometricians analyze data on consumer spending, investment decisions, individual income, and similar variables, they are virtually always dealing with a sample of observations. Statistics based on this sample are used to infer the characteristics of the population. The PDF and CDF for the population can often be usefully described by a few important parameters, known as \textbf{population parameters}. The population mean and variance (described below) are two of the most important parameters that describe the distribution of a given random variable. In the previous chapter, regression models were introduced in which a linear relationship is assumed to exist between two or more variables in a population. The intercept and slope coefficients in regression models are also examples of population parameters. In general, the values of population parameters such as these are not known \textit{a priori} and must be estimated from sample data. One of the main goals of econometric analysis is to use sample information as efficiently as possible to construct reliable estimates of population parameters.
To estimate population parameters using sample data, it is desirable to work with a sample that is representative of the population as a whole. To achieve this goal, statisticians often attempt to construct a sample that is a randomly chosen subsample of the population. A sample constructed in this manner is called a \textbf{random sample}. As the boxed text example on political polls and nonrandom sampling indicates, the consequences of nonrandom sampling can be fairly severe.%
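The danger of nonrandom sampling can be illustrated with simulated data. The Python sketch below is entirely hypothetical: it generates a population of 100,000 household incomes drawn uniformly between \$10,000 and \$100,000 (an assumption made only for illustration), then compares the mean of a random sample of 1,000 households with the mean of a sample of 1,000 households drawn only from the upper half of the income distribution.

```python
import random
from statistics import fmean

rng = random.Random(1936)  # arbitrary fixed seed, only for reproducibility

# HYPOTHETICAL population: 100,000 household incomes, uniform on [$10,000, $100,000].
population = [rng.uniform(10_000, 100_000) for _ in range(100_000)]
pop_mean = fmean(population)

# Random sample: every household has the same chance of being selected.
random_sample = rng.sample(population, 1_000)

# Nonrandom sample: households are drawn only from the upper half of the
# income distribution, so low-income households can never be selected.
median = sorted(population)[len(population) // 2]
upper_half = [income for income in population if income > median]
biased_sample = rng.sample(upper_half, 1_000)

print(f"population mean:    ${pop_mean:,.0f}")
print(f"random sample mean: ${fmean(random_sample):,.0f}")
print(f"biased sample mean: ${fmean(biased_sample):,.0f}")
```

The random-sample mean should land within a few hundred dollars of the population mean, while the restricted sample overstates it by roughly \$20,000; collecting a larger sample in the same restricted way would not remove this bias.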
\exbox{Political Polls and Nonrandom Sampling}{In the 1936 U.S. Presidential election, a poll conducted by the {\it Literary Digest} indicated that the Republican Party would sweep the election. Specifically, this poll indicated that Landon would win 32 of the 48 U.S.
states. When Roosevelt instead won 46 of these states, the poll was called into question.
It was noted that the {\it Literary Digest’s} prediction was based upon a sample primarily drawn from telephone listings and automotive registrations. In 1936, however, a significant portion of the population had neither phones nor cars. High-income individuals were more likely to vote Republican, while low-income individuals were more likely to vote Democratic. Forecasts based upon a sample of phone or auto owners overstated the portion of the electorate supporting the Republican candidate. (For a good discussion of this case, see Manchester (1974), p. 144.)} \section{Expectations}
\subsection{Population Mean}
The \textbf{expected value} of a random variable $X$ is a measure of the average value of $X$ that occurs in the population. This expected value is expressed as $E(X)$, and may be computed (in the case of a discrete random variable) as:\footnote{In the case of a continuous random variable, $E(X)=\int_{-\infty }^{\infty }xf(x)dx$, where $f(x)$ is the PDF for the random variable $X$.}
\begin{equation*}
E(X)=\sum_{i=1}^{N}x_{i}f(x_{i})
\end{equation*}
where $f(x_{i})$ is the PDF for this random variable. The expected value of a variable is a weighted average of all possible outcomes in which the weight assigned to each outcome is the probability of observing it. The expected value of a random variable is generally denoted by the symbol $\mu$. Thus, we can define $\mu _{X}$ as:
\begin{equation*}
\mu _{X}=E(X).
\end{equation*}
The expected value of a random variable is also known as the \textbf{population mean}.
\subsubsection{Example I: A 50-50 lottery}
Suppose that an economics club sponsors a lottery in which the prize equals one-half of the total receipts collected from the sale of tickets. Two hundred lottery tickets are sold at a price of \$1.00 each, so the prize equals \$100. If you purchase a ticket, the probability of winning equals $\frac 1{200}$; the probability of losing is $\frac{199}{200}$. Let’s define a variable, $X$, that equals the payoff received by an individual who purchases a single ticket in this lottery. If you win the lottery, your payoff equals \$99 (the \$100 prize minus \$1 for the ticket); your payoff equals $-\$1$ (the cost of the ticket) if you lose. The probability density function for this random variable is stated in Table \ref{pdf50}.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|cc|}
\hline
\textbf{Payoff $(X)$} & \textbf{Probability} \\ \hline $-\$1$ & $\frac {199}{200}$ \\
\$99 & $\frac {1}{200}$ \\ \hline
\end{tabular}%
\end{center}
\caption{PDF for 50-50 lottery }
\label{pdf50}
\end{table}
The expected value of this lottery is given by: \begin{equation*}
(-\$1)\left(\frac{199}{200}\right)+(\$99)\left(\frac{1}{200}\right)= \end{equation*}
\begin{equation*}
\frac{-\$199+\$99}{200}=
\end{equation*}
\begin{equation*}
\frac{-\$100}{200}=-\$0.50
\end{equation*}
Thus, the expected value of this lottery is $-\$0.50$. Of course, you would never lose exactly \$0.50 with any single lottery ticket. Instead, you would either win \$99 or lose \$1. The expected value of this lottery is a measure of how much, on average, you would receive per ticket if you were to continue to buy these lottery tickets over a long period of time. In this case, you would expect to lose approximately one-half of the total value of your ticket purchases.
In any finite number of plays, however, your losses may be substantially larger or smaller than this amount.
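The expected-value formula translates directly into code. The sketch below (with a hypothetical helper \texttt{expected\_value}) represents a discrete PDF as a dictionary from outcomes to probabilities and uses exact fractions from the standard \texttt{fractions} module, so the result for the 50-50 lottery of Table \ref{pdf50} is exactly $-\frac{1}{2}$ dollar. The same function is applied to a fair die as a second check.

```python
from fractions import Fraction


def expected_value(pdf):
    """E(X) = sum of x_i * f(x_i) for a discrete PDF given as {x_i: f(x_i)}."""
    return sum(x * p for x, p in pdf.items())


# Payoffs (in dollars) for one ticket in the 50-50 lottery and their probabilities.
lottery_pdf = {-1: Fraction(199, 200), 99: Fraction(1, 200)}
print(expected_value(lottery_pdf))  # -1/2, i.e. an expected loss of $0.50 per ticket

# Number of dots on a single toss of a fair die.
die_pdf = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die_pdf))  # 7/2, i.e. 3.5
```

Neither expected value is an outcome that can actually occur on a single play, which is exactly the point made in the text.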
\subsubsection{Example II: Tossing a single die} Let’s define a random variable, $X$, that equals the number of dots appearing on a single toss of a fair die. Since the probability of each outcome is $\frac 16$, the expected value of $X$ equals: \begin{equation*}
E(X)=1(\frac 16)+2(\frac 16)+3(\frac 16)+4(\frac 16)+5(\frac 16)+6(\frac 16)= \frac{21}6=3.5
\end{equation*}
Once again, the expected value of this random variable cannot be attained in any single toss of the die.
In a small sample, the observed frequency of each outcome might differ substantially from the population probabilities. It is possible, for example, that the value “2” might appear on each of the first six throws of a die (even if a fair die is used). Thus, the average outcome in a small sample may differ substantially from 3.5. The \textbf{law of large numbers}, however, states that the observed relative frequency of any outcome converges to the corresponding population probability as the size of the sample approaches infinity. Thus, in this example, the observed relative frequency of each possible outcome will converge to $\frac{1}{6}$ as the size of the sample increases. Consequently, the average value of all of the outcomes will converge to 3.5 as the size of the sample tends towards infinity.
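The law of large numbers can be seen in a simulation. The sketch below uses the standard Python \texttt{random} module with an arbitrary fixed seed (an assumption made only for reproducibility) to compute the average number of dots in samples of increasing size; the helper name \texttt{average\_of\_tosses} is hypothetical. The exact numbers printed depend on the seed, but averages from small samples may be far from 3.5 while those from very large samples should be close to it.

```python
import random


def average_of_tosses(n, rng):
    """Average number of dots over n simulated tosses of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


rng = random.Random(7)  # arbitrary fixed seed, only for reproducibility
for n in (6, 100, 10_000, 200_000):
    print(f"n = {n:>7,}: sample average = {average_of_tosses(n, rng):.4f}")
```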
\exbox{St. Petersburg Paradox}{Consider a game in which you flip a coin until the result is heads. You will be paid $\$2^{n}$, where $n$ is the number of the flip on which heads first appears. The probability of heads occurring on the first toss equals $\frac 12$. Since the outcomes on separate flips are independent (this concept will be discussed in more detail below), the probability of heads appearing for the first time on the second flip equals: $$\text{Prob(tails on first flip)}\times \text{Prob(heads on second flip)}=$$ $$\frac 12\times \frac 12=\frac 14 $$
By analogous reasoning, the probability of the first heads appearing on the $n$th flip equals $\frac 1{2^n}$. Thus, the expected value of this game equals: $$\frac 12(\$2)+\frac 14(\$4)+\frac 18(\$8)+\cdots+\frac{1}{2^{n}}(\$2^{n})+\cdots= $$ $$\$1+\$1+\$1+\cdots+\$1+\cdots=\infty $$ Thus, the expected value of this game is infinite. This is called the {\bf St. Petersburg paradox} because, even though the expected value of this game is infinite, most individuals would not pay a very large amount to participate in this game.} \subsection{Properties of Expectations\label{expect.sc.marker}} There are a few important properties of expectations with which you should be familiar:\footnote{Proofs of these properties appear in the mathematical appendix at the end of this chapter.}
\begin{description}
\item[Property 1:] If $c$ is a constant, $E(c)=c$.
\end{description}
This condition states that the expected value of a constant is the constant itself. For example, if $c$ is always equal to 5, then $E(5)=5.$ \begin{description}
\item[Property 2:] If $X$ is a random variable and $a$ is a constant, $E(aX)=aE(X)$.
\end{description}
This condition states that the expected value of a constant times a random variable equals the constant times the expected value of the random variable. As an example of this property, suppose that all of the workers in a particular firm are paid a piece rate of \$5 per unit of output produced.
If $X$ is a random variable that represents the amount of output produced per hour by a worker, the worker’s hourly pay will equal $\$5\cdot X$.
Suppose that the average quantity of output produced per hour, $E(X),$ equals 2.1 units of output. In this case, the average hourly payment to workers will equal:
\begin{equation*}
E(\$5\cdot X)=\$5\cdot E(X)
\end{equation*}
\begin{equation*}
=\$5(2.1)=\$10.50
\end{equation*}
\begin{description}
\item[Property 3:] If $X$ is a random variable and $a$ and $b$ are constants, $E(aX+b)=aE(X)+b$.
\end{description}
Extending the example considered for Property 2, suppose that the workers receive a new contract in which they are paid a flat rate of \$1.50 per hour in addition to the piece rate of \$5 per unit. As before, it is assumed that the average hourly output equals 2.1 units of the good. In this case, their average hourly pay becomes:
\begin{equation*}
E[\$5(X)+\$1.50]=\$5[E(X)]+\$1.50
\end{equation*}
\begin{equation*}
=\$5(2.1)+\$1.50
\end{equation*}
\begin{equation*}
=\$12.00
\end{equation*}
\begin{description}
\item[Property 4:] If $X$ and $Y$ are random variables, $E(X+Y)=E(X)+E(Y)$.
\end{description}
This condition states that the expected value of the sum of two random variables equals the sum of the expected values of these variables. (A more general form of this property states that the expected value of the sum of $N$ random variables equals the sum of the expected values of these variables.) To see how this property works, suppose that $X$ represents an individual’s annual salary at a firm while $Y$ represents the individual’s annual bonus.
The total compensation received by an employee selected at random in the firm is equal to $X+Y$. If the average annual salary, $E(X)$, is \$50,000 and the average annual bonus, $E(Y)$, is \$12,000, then the average annual compensation will equal:
\begin{equation*}
E(X+Y)=E(X)+E(Y)
\end{equation*}%
\begin{equation*}
=\$50{,}000+\$12{,}000
\end{equation*}%
\begin{equation*}
=\$62{,}000
\end{equation*}
\begin{description}
\item[Property 5:] If $X$ and $Y$ are random variables and $a$, $b$, and $c$ are constants, $E(aX+bY+c)=aE(X)+bE(Y)+c$.
\end{description}
This property combines Properties 2, 3, and 4 above. As an example of this property, note that:
\begin{equation*}
E(5X+20Y+100)=5E(X)+20E(Y)+100
\end{equation*}
\begin{description}
\item[Property 6:] If $X$ and $Y$ are two independent random variables, $E(XY)=E(X)E(Y)$.
\end{description}
Two random variables are said to be \textbf{statistically independent} if the distribution of each variable is unaffected by the value taken by the other. This means, for example, that the probability of observing any value $x_{i}$ of $X$ is unaffected by the value $y_{j}$ taken by $Y$. Property 6 states that the expected value of the product of two independent random variables equals the product of their expected values. (This property does not generally hold if $X$ and $Y$ are not independent.) Estimates of population parameters are constructed using functions of observed random variables. (This process is discussed in more detail in Section~\ref{estimators.sec} below.) The properties of expectations listed above will be used in later chapters to help determine the expected values of these functions of random variables.
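Property 6 can be checked by enumerating all 36 outcomes of two independent rolls of a fair die (an illustration chosen here, not one from the text). The second calculation sets $Y$ equal to $X$, an extreme form of dependence, to show that the property generally fails without independence. This is a minimal Python sketch using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Two independent rolls of a fair die: every (x, y) pair has probability 1/6 * 1/6.
faces = range(1, 7)
p_face = Fraction(1, 6)

e_x = sum(x * p_face for x in faces)   # E(X) = E(Y) = 7/2
e_xy_indep = sum(x * y * p_face * p_face for x, y in product(faces, faces))

# A dependent case for contrast: Y is the same roll as X, so XY = X^2.
e_xy_dep = sum(x * x * p_face for x in faces)

print(e_xy_indep, e_x * e_x)   # 49/4 49/4  (E(XY) = E(X)E(Y))
print(e_xy_dep)                # 91/6       (differs from E(X)E(Y) = 49/4)
```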
\subsection{Caution: $E[g(X)]\neq g[E(X)]\label{g(X).marker}$} In the previous section it was observed that the expected value of a weighted sum of random variables is equal to the weighted sum of their expected values. Does a similar result hold for other, nonlinear, functions of random variables? Is it true, for example, that $E(X^{2})=[E(X)]^{2}$? In general, the answer to this question is no. To determine the expected value of a function of a discrete random variable, $g(X)$, it is necessary to compute: \begin{equation*}
E\left[ g(X)\right] =\sum_{i=1}^{N}g(x_{i})\cdot f(x_{i}) \end{equation*}
\begin{equation*}
\text{where: }f(x_{i})\text{ = PDF for }X
\end{equation*}
This result will, in general, not equal $g[E(X)]$.
As an example of this principle, let $X$ denote the outcome of a single toss of a fair coin. As in the earlier example, suppose that $X=1$ when an outcome of “heads” occurs and $X=0$ when “tails” occurs. The expected value of $X$ equals: \begin{equation*}
E(X)=0.5
\end{equation*}
Consider the function:
\begin{equation*}
g(X)=X^2
\end{equation*}
The expected value of $g(X)$ equals:
\begin{equation*}
E\left[ g(X)\right] =E\left( X^2\right)
\end{equation*}
By the definition of the expected value:
\begin{equation*}
=0.5\left( 0\right) ^2+0.5\left( 1\right) ^2=0.5 \end{equation*}
But,
\begin{equation*}
g\left[ E(X)\right] =\left[ E\left( X\right) \right] ^2 \end{equation*}
\begin{equation*}
=\left( 0.5\right) ^2
\end{equation*}
\begin{equation*}
=0.25
\end{equation*}
Thus, $E[g(X)]\neq g[E(X)]$ in this case.
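The coin-toss comparison can be reproduced by applying the formula for $E[g(X)]$ directly. This is a minimal Python sketch; the function names \texttt{expect} and \texttt{g} are illustrative only.

```python
from fractions import Fraction

# Single toss of a fair coin: X = 1 for heads, X = 0 for tails.
pdf = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def expect(h, pdf):
    """E[h(X)] for a discrete PDF stored as {outcome: probability}."""
    return sum(h(x) * p for x, p in pdf.items())

def g(x):
    return x ** 2

e_g_x = expect(g, pdf)                 # E[g(X)] = E(X^2), weights each g(x_i) by f(x_i)
g_e_x = g(expect(lambda x: x, pdf))    # g[E(X)] = [E(X)]^2, applies g to the mean

print(e_g_x, g_e_x)                    # 1/2 1/4
```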
\subsection{Population variance and standard deviation} The population mean, $\mu _{X}$, provides a measure of the average value of a random variable. We are often interested, however, in the amount of variation in a random variable. The \textbf{population variance, $\sigma ^{2} $}, serves as a measure of the dispersion of a random variable around the mean. The population variance for a random variable ($X$) is defined as: \begin{equation*}
\sigma _{X}^{2}=E(X-\mu _{X})^{2}
\end{equation*}
In the case of a discrete random variable, the variance is computed as:\footnote{In the case of a continuous random variable, $\sigma _{X}^{2}=\int_{-\infty }^{\infty }(X-\mu _{X})^{2}f(X)dX$, where $f(X)$ is the PDF for the random variable $X$.}
\begin{equation*}
\sigma _{X}^{2}=\sum_{i=1}^{N}\left( x_{i}-\mu _{X}\right) ^{2}f(x_{i}) \end{equation*}
Since the variance is the expected value of the \textbf{squared} deviations of a random variable from its mean, the variance is always nonnegative. Figures \ref{norm_pdf_graph} and \ref{norm_pdf1_graph} illustrate the PDFs for two normal distributions that share a mean of zero but have different variances. When two such distributions are compared, the one with the smaller variance places a larger proportion of its probability within any given interval around the mean. In Figures \ref{norm_pdf_graph} and \ref{norm_pdf1_graph}, you can observe that the probability that the outcome falls between $-10$ and 10 is larger when the variance of the random variable is smaller.
\begin{center}
\FRAME{ftbpFU}{5.047in}{2.1681in}{0pt}{\Qcb{Normal PDF (mean = 0, standard deviation = 5)}}{\Qlb{norm_pdf_graph}}{fig2-6a.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 2.1681in;depth 0pt;original-width 5.8851in;original-height 2.5105in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename
'graphs/Fig2-6a.gif';file-properties "XNPEU";}}\FRAME{ftbpFU}{5.047in}{2.1248in}{0pt}{\Qcb{Normal PDF (mean = 0, standard deviation = 15)}}{\Qlb{norm_pdf1_graph}}{fig2-6b.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 2.1248in;depth 0pt;original-width 5.8851in;original-height 2.4587in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-6b.gif';file-properties "XNPEU";}} \end{center}
Table \ref{var1} illustrates how the variance can be computed in the case of a discrete probability density function. In the case of the PDF for a single toss of a fair die, the variance equals 2.9167.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|cccccc|}
\hline
\boldmath $X $ & \boldmath $\mu _X $ & \boldmath $X - \mu _X $ & \boldmath $(X- \mu _X)^2 $ & \boldmath $f(X) $ & \boldmath $(X-\mu _X)^2 f(X) $ \\ \hline
1 & 3.5 & -2.5 & 6.25 & $\frac 1 6 $ & $6.25 \times \frac 1 6 $ \\ 2 & 3.5 & -1.5 & 2.25 & $\frac 1 6 $ & $2.25 \times \frac 1 6 $ \\ 3 & 3.5 & -0.5 & 0.25 & $\frac 1 6 $ & $0.25 \times \frac 1 6 $ \\ 4 & 3.5 & 0.5 & 0.25 & $\frac 1 6 $ & $0.25 \times \frac 1 6 $ \\ 5 & 3.5 & 1.5 & 2.25 & $\frac 1 6 $ & $2.25 \times \frac 1 6 $ \\ 6 & 3.5 & 2.5 & 6.25 & $\frac 1 6 $ & $6.25 \times \frac 1 6 $ \\ \hline & & & & Total: & $17.50 \times \frac 1 6 = 2.9167 $ \\ \hline \end{tabular}%
\end{center}
\caption{Computation of variance in the case of a single toss of a fair die.} \label{var1}
\end{table}
An alternative measure of the dispersion of a random variable is provided by the \textbf{standard deviation}. The population standard deviation of a random variable, $\sigma _{X}$, is defined as the square root of the population variance. In mathematical terms, the standard deviation can be expressed as:
\begin{equation*}
\sigma _{X}=\sqrt{\sigma _{X}^{2}}
\end{equation*}
As the variance of a distribution increases, so does its standard deviation.
One advantage of the standard deviation over the variance is that the standard deviation is measured in the same units as the original variable while the variance is measured in squared units of the variable. When variables are measured in monetary terms, the variance is measured in units of dollars$^{2}$ while the standard deviation is measured in dollars. It often seems more natural to deal with a measure of dispersion that is measured in the same units as the underlying variable.
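The entries of Table \ref{var1} and the corresponding standard deviation can be reproduced with a few lines of Python. This sketch assumes, as in the table, a fair six-sided die; exact fractions are used for the variance and a floating-point square root for $\sigma_X$.

```python
import math
from fractions import Fraction

# PDF for a single toss of a fair die.
pdf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pdf.items())                    # population mean, 7/2
variance = sum((x - mu) ** 2 * p for x, p in pdf.items())  # sum of (x_i - mu)^2 f(x_i)
std_dev = math.sqrt(float(variance))                       # sigma = sqrt(sigma^2)

print(variance, round(float(variance), 4), round(std_dev, 4))  # 35/12 2.9167 1.7078
```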
\subsection{Properties of variance}
There are a few properties of the variance with which you should be familiar. These properties will be used often in later chapters of this text.\footnote{%
Proofs of these properties may be found in the mathematical appendix appearing at the end of this chapter.}
\begin{description}
\item[Property 1:] The variance of a random variable, $X$, (defined above as $\sigma _{X}^{2}=E(X-\mu _{X})^{2}$) may also be expressed as $E(X^{2})-\mu _{X}^{2}$.
\end{description}
This property provides an alternative method of computing the variance of a random variable with a known distribution function.
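As a check on Property 1, the Python sketch below computes the variance of a single toss of a fair die both from the definition and from $E(X^{2})-\mu_{X}^{2}$; both should give the value 35/12 (about 2.9167) reported in Table \ref{var1}.

```python
from fractions import Fraction

pdf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

mu = sum(x * p for x, p in pdf.items())
var_definition = sum((x - mu) ** 2 * p for x, p in pdf.items())    # E(X - mu)^2
var_shortcut = sum(x ** 2 * p for x, p in pdf.items()) - mu ** 2   # E(X^2) - mu^2

print(var_definition, var_shortcut)   # 35/12 35/12
```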
\begin{description}
\item[Property 2:] If $X$ is a random variable and $a$ is a constant, $var(aX)=a^2var(X)$.
\end{description}
This property states that the variance of a constant times a random variable is equal to the constant squared times the variance of the original variable. Suppose, for example, that a new variable, $Y$, is formed using the relationship:
\begin{equation*}
Y=5X
\end{equation*}
The variance of $Y$ can be computed as:
\begin{equation*}
var(Y)=var(5X)
\end{equation*}
\begin{equation*}
=5^2\cdot var(X)
\end{equation*}
\begin{equation*}
=25\cdot var(X)
\end{equation*}
\begin{description}
\item[Property 3:] If $X$ is a random variable and $c$ is a constant, $var(X+c)=var(X)$.
\end{description}
This property states that the variance of a random variable plus a constant equals the variance of the original random variable. Intuitively, the reason for this is that the variance is a measure of the dispersion of the random variable around the mean; adding a constant to a random variable shifts all of the values up or down by a given amount, but does not affect the dispersion around the mean (since the mean also shifts by the same amount).
As an example of this property, note that: \begin{equation*}
var(X+8)=var(X)
\end{equation*}
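Properties 2 and 3 can be illustrated with the same fair die. Because $5X$ and $X+8$ are one-to-one transformations, each transformed outcome simply inherits the probability of the original outcome. The helper \texttt{variance} below is a hypothetical name used only for this Python sketch.

```python
from fractions import Fraction

def variance(pdf):
    """Population variance of a discrete PDF stored as {outcome: probability}."""
    mu = sum(x * p for x, p in pdf.items())
    return sum((x - mu) ** 2 * p for x, p in pdf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
scaled = {5 * x: p for x, p in die.items()}    # PDF of Y = 5X
shifted = {x + 8: p for x, p in die.items()}   # PDF of X + 8

print(variance(die), variance(scaled), variance(shifted))   # 35/12 875/12 35/12
```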
\section{Other moments}
The mean and the variance are called the first and second “moments” of a PDF (strictly, the variance is the second moment about the mean). Higher-order moments also exist. The third moment about the mean, $\mu _{3}=E(X-\mu _{X})^{3}$, serves as a measure of the asymmetry of a distribution. If a distribution is symmetric, the portion of the PDF that lies below the mean is a mirror image of the portion above the mean. An asymmetric distribution is said to be skewed. The mean occurs at the center of a symmetric distribution, and in this case the third moment about the mean equals zero. The normal distribution discussed later in this chapter is an example of a symmetric distribution. The \textbf{skewness} of a distribution is measured as:
\begin{equation*}
S=\mu _{3}/\left( \sigma ^{2}\right) ^{3/2}\text{.} \end{equation*}
The fourth moment about the mean, $\mu _{4}=E(X-\mu _{X})^{4}$, is a measure of \textbf{kurtosis}. The kurtosis of a distribution is larger when the distribution has thicker tails. Kurtosis is usually measured as: \begin{equation*}
K=\mu _{4}/(\sigma ^{2})^{2}\text{.}
\end{equation*}
For a normal distribution, $S=0$ and $K=3$.
As will be seen in Chapter \ref{biv.hyp.chap}, estimated values of the skewness and kurtosis can be used to test whether a random variable follows a normal distribution. While the skewness and kurtosis coefficients may be used for tests of normality, they are not used very often for other purposes. Statisticians and econometricians rarely report these measures when listing descriptive statistics for a variable.
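The skewness and kurtosis formulas can be evaluated for simple discrete distributions. The Python sketch below uses the fair die (a symmetric distribution) and a hypothetical two-point distribution with a long right tail; neither example appears in the text, and the function names are illustrative.

```python
from fractions import Fraction

def central_moment(pdf, k):
    """k-th moment about the mean, E(X - mu)^k, for a discrete PDF."""
    mu = sum(x * p for x, p in pdf.items())
    return sum((x - mu) ** k * p for x, p in pdf.items())

def skewness(pdf):
    """S = mu_3 / (sigma^2)^(3/2), returned as a float."""
    return float(central_moment(pdf, 3)) / float(central_moment(pdf, 2)) ** 1.5

def kurtosis(pdf):
    """K = mu_4 / (sigma^2)^2, returned as an exact fraction."""
    return central_moment(pdf, 4) / central_moment(pdf, 2) ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}      # symmetric
lopsided = {0: Fraction(4, 5), 1: Fraction(1, 5)}   # hypothetical, long right tail

print(skewness(die), skewness(lopsided) > 0)   # 0.0 True
print(kurtosis(die))                           # 303/175 (about 1.73, below the normal value of 3)
```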
\section{Joint probability density function} The last few sections of this chapter have focused on the distribution of a single random variable. In econometrics, however, the focus is more often on the relationship among random variables. A \textbf{joint probability density function} (also known as a \textbf{multivariate probability density function}) provides a convenient tool for this purpose. If there are $n$ random variables $X_i$ $(i=1,\ldots,n)$, the joint density function $f(X_1,X_2,\ldots,X_n)$ provides a measure of the probability of observing any given combination of $X_1,X_2,\ldots,X_n$. In other words:
\begin{equation*}
f(x_1,x_2,\ldots,x_n)=\text{Prob}(X_1=x_1,X_2=x_2,\ldots,X_n=x_n) \end{equation*}
where the lower-case symbols $x_1,x_2,\ldots ,x_n$ are used to represent a specific combination of the random variables $X_1,X_2,\ldots ,X_n$.
The joint density function takes into account any relationships that may exist among the random variables $X_i$. For example, suppose that we examine the joint density function for the income and educational attainment of individuals. It is quite likely that there is some relationship between individual income and educational attainment. One of the major goals of econometric analysis is to characterize such relationships among economic variables. The joint density function provides an important tool for this analysis.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|l|ccc|}
\hline
& \multicolumn{3}{|c|}{\textbf{Income}} \\ \cline{2-4} \textbf{Education} & \textbf{low} & \textbf{medium} & \textbf{high} \\ \hline $< $ H.S. degree & 0.07 & 0.02 & 0.01 \\
H.S. degree & 0.10 & 0.30 & 0.05 \\
1-3 years college & 0.05 & 0.13 & 0.07 \\
4+ years college & 0.02 & 0.09 & 0.09 \\ \hline \end{tabular}%
\end{center}
\caption{Joint PDF for income and educational attainment } \label{jpdf}
\end{table}
Table \ref{jpdf} contains a listing of a hypothetical joint PDF relating educational attainment and income. Since there are only two random variables in this distribution, this is called a \textbf{bivariate probability density function}. Since both of these variables are discrete random variables, the joint distribution is also discrete. In this example, there are three levels of income (low, medium, and high) and four levels of education (less than high school, high school degree, 1-3 years of college, four or more years of college). Thus, there are 12 possible combinations of income and educational attainment. Each value in this table is the probability of observing a particular combination of the random variables $X$ (level of education) and $Y$ (level of income). For example, this table indicates that if an individual is selected at random from this population, there is a 7\% probability of selecting an individual who has a low income and has not completed high school.
As in the case of a univariate PDF, a multivariate PDF must satisfy two properties:
\begin{enumerate}
\item The probability of observing any combination of the random variables is nonnegative (\textit{i.e.}, $f(x_1,x_2,\ldots,x_n)\geq 0$ for any $(x_1,x_2,\ldots,x_n)$). This condition rules out the possibility of negative probabilities. Since the probability of an outcome is a measure of its relative frequency in the population, negative probabilities are nonsensical.
\item The sum of the probabilities for all possible outcomes must equal 1.
As noted above, this condition guarantees that one of the possible outcomes must occur. In the case of the bivariate distribution above, this property reduces to: \footnote{%
In the case of a continuous bivariate PDF, this condition can be stated as:
\begin{equation*}
\int_{-\infty }^{\infty }\int_{-\infty }^{\infty }f(x,y)dxdy=1 \end{equation*}
All of the results discussed for discrete multivariate random variables have an analogous interpretation in the case of continuous multivariate distributions.}
\begin{equation*}
\sum_{i=1}^{4}\sum_{j=1}^{3}f(x_{i},y_{j})=1, \end{equation*}
\begin{equation*}
\text{where: }X\text{ = level of education, }Y\text{ = level of income.} \end{equation*}
\end{enumerate}
In this example, the first property requires that the probability be greater than or equal to zero for each of the outcomes listed in Table \ref{jpdf}. The second property requires that the sum of all of the elements in this table equal one. You can easily verify that these two properties are satisfied for the distribution appearing in Table \ref{jpdf}.
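The two properties can also be verified mechanically. This Python sketch stores Table \ref{jpdf} as a dictionary keyed by (education, income) pairs; the short category labels are shorthand introduced here, not the text's notation. Probabilities are entered as decimal strings and converted to exact fractions so that the sum is not affected by floating-point rounding.

```python
from fractions import Fraction as F

# Table jpdf: one row per education level, columns are low, medium, high income.
# The short labels ("<HS", "HS", ...) are shorthand introduced for this sketch.
incomes = ("low", "medium", "high")
rows = {
    "<HS":        ("0.07", "0.02", "0.01"),
    "HS":         ("0.10", "0.30", "0.05"),
    "college1-3": ("0.05", "0.13", "0.07"),
    "college4+":  ("0.02", "0.09", "0.09"),
}
joint = {(edu, inc): F(p) for edu, probs in rows.items() for inc, p in zip(incomes, probs)}

nonnegative = all(p >= 0 for p in joint.values())   # property 1: no negative probabilities
total = sum(joint.values())                          # property 2: probabilities sum to one

print(len(joint), nonnegative, total)   # 12 True 1
```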
\subsection{Marginal distributions} If a joint probability density function is known, it is possible to recover the univariate probability density functions for each of the variables.
These univariate distributions are referred to as \textbf{marginal probability density functions (marginal PDFs).} The marginal probability density functions describe the probability distribution of a particular random variable $X_{i}$ without regard to the values taken on by the other random variables, $X_{j}$ ($j\neq i$). Let’s examine how the marginal PDFs can be determined using the joint PDF contained in Table~\ref{jpdf}.
Suppose that we wished to determine the marginal distribution for educational attainment. In this example, there are three possible levels of income that can be received by an individual who has completed a given level of education. In a sample of 100 individuals, we would expect to observe approximately:\footnote{%
Of course, you would not expect to observe this in each sample of 100 individuals. Instead, this mix reflects the proportion of the population in each category. If $n$ samples of 100 individuals are drawn from this population, the average proportion of individuals in each category would approach these values as $n$ tends towards infinity.} \begin{itemize}
\item 7 individuals who have not completed high school and have low incomes; \item 2 individuals who have not completed high school and have incomes in the middle portion of the income distribution; and
\item 1 individual who has not completed high school and has a high income.
\end{itemize}
Thus, the total proportion of the population that has not completed high school is 10\%. This suggests that the probability of observing any given level of educational attainment can be formed by simply adding together the probabilities in the appropriate row of Table~\ref{jpdf}. More generally, in the case of a discrete bivariate density function for the random variables $X$ and $Y$, the marginal PDF for variable $X$ can be computed as: \begin{equation*}
f(x_{i})=\sum_{j=1}^{m}f(x_{i},y_{j})\text{, where }m\text{ = number of possible values of }y_{j}\text{. } \end{equation*}
Using a similar argument, the marginal density function for income can be determined by adding together the probabilities in each column of Table~\ref{jpdf}. In the case of a bivariate density function for $X$ and $Y$, the marginal PDF for $Y$ can be expressed as: \begin{equation*}
f(y_{j})=\sum_{i=1}^{n}f(x_{i},y_{j})\text{, where }n\text{ = number of possible values of }x_{i}\text{.} \end{equation*}
Tables~\ref{mpdf1} and \ref{mpdf2} contain listings of the marginal PDFs for educational attainment and income, respectively. Note that each of the marginal PDFs satisfies the two properties of probability density functions (\textit{i.e.}, the probability of each outcome is nonnegative and the sum of the probabilities equals one for each density function).
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|lc|}
\hline
\textbf{Education} & \textbf{Probability} \\ \hline $< $ H.S. degree & 0.10 \\
H.S. degree & 0.45 \\
1-3 years college & 0.25 \\
4+ years college & 0.20 \\ \hline Total & 1.00 \\ \hline
\end{tabular}%
\end{center}
\caption{Marginal PDF for educational attainment } \label{mpdf1}
\end{table}
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|lc|}
\hline
\textbf{Income} & \textbf{Probability} \\ \hline low & 0.24 \\
medium & 0.54 \\
high & 0.22 \\ \hline
Total & 1.00 \\ \hline
\end{tabular}%
\end{center}
\caption{Marginal PDF for income } \label{mpdf2}
\end{table}
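The row and column sums that produce Tables \ref{mpdf1} and \ref{mpdf2} can be sketched in Python as follows. The short category labels are hypothetical shorthand for the rows and columns of Table \ref{jpdf}.

```python
from collections import defaultdict
from fractions import Fraction as F

# Table jpdf, one row per education level; columns are low, medium, high income.
incomes = ("low", "medium", "high")
rows = {
    "<HS":        ("0.07", "0.02", "0.01"),
    "HS":         ("0.10", "0.30", "0.05"),
    "college1-3": ("0.05", "0.13", "0.07"),
    "college4+":  ("0.02", "0.09", "0.09"),
}
joint = {(edu, inc): F(p) for edu, probs in rows.items() for inc, p in zip(incomes, probs)}

marginal_edu = defaultdict(F)   # f(x_i): sum across each row (F() is Fraction(0))
marginal_inc = defaultdict(F)   # f(y_j): sum down each column
for (edu, inc), p in joint.items():
    marginal_edu[edu] += p
    marginal_inc[inc] += p

# Compare with the marginal PDF tables for education and income.
print({k: float(v) for k, v in marginal_edu.items()})
print({k: float(v) for k, v in marginal_inc.items()})
```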
If we are trying to analyze the relationship between income and educational attainment that occurs in a population, we would rely on the joint density function. If, however, we are focusing our analysis solely on the distribution of one of these variables, we would use the marginal density function. For example, if we are interested in the proportion of the population that has a low income, the marginal PDF allows us to determine that 24\% of the population falls into this category in this hypothetical population.
\subsection{Conditional probability distribution functions} In economics, we often rely on the \textit{ceteris paribus }assumption to simplify our analysis. When this assumption is invoked, we focus on the relationship between two (or more) variables, holding constant the effect of other variables. In econometrics we are often interested in the distribution of a particular random variable, holding other variables constant. The \textbf{conditional probability distribution function} provides a tool that is analogous to the \textit{ceteris paribus} assumption. The conditional probability distribution function is the PDF that occurs for a variable when other random variables are held constant at a particular level (or within a particular interval). In the bivariate case, there are two possible conditional PDFs:
\begin{enumerate}
\item The conditional distribution of $Y$ for a given value (or interval) of $X$. This is denoted by the function, $f(Y|X)$. In the example provided in Table \ref{jpdf}, a possible example is the conditional distribution of income given that the individual has not completed high school. This can be stated as: $f(Y|X=$ “\TEXTsymbol{<} H.S. degree”) (where $Y$ = level of income and $X$ = level of educational attainment).
\item The conditional distribution of $X$ for a given value (or interval) of $Y$, expressed as: $f(X|Y)$.
\end{enumerate}
Let’s examine how conditional probabilities can be computed. In particular, let’s investigate the conditional PDF of income given that an individual does not complete high school. In a sample of 100 individuals, we would expect to see approximately 10 individuals who have not completed high school. As noted above, in this group of 10 individuals, it is expected that approximately:\footnote{%
Once again, it should be noted that this only holds for an “average”
sample consisting of 10 individuals drawn from this population.} \begin{itemize}
\item 7 individuals will have low incomes; \item 2 individuals will have medium incomes; and \item 1 individual will have a high income.
\end{itemize}
Thus, 70\% of the subpopulation consisting of individuals who have not completed high school would be expected to have low incomes, 20\% medium incomes, and 10\% high incomes. In other words, the conditional probabilities of observing low, medium, or high incomes, given that an individual has not completed high school, are 0.7, 0.2, and 0.1, respectively. More generally, the conditional PDFs may be computed as:
\begin{equation*}
f(X|Y)=\frac{f(X,Y)}{f(Y)}
\end{equation*}
\begin{equation*}
=\frac{\text{probability of observing }x_{i}\text{ and }y_{j}}{\text{probability of observing }y_{j}}
\end{equation*}
\begin{equation*}
=\frac{\text{joint probability of }x_{i}\text{ and }y_{j}}{\text{marginal probability of }y_{j}}
\end{equation*}
and
\begin{equation*}
f(Y|X)=\frac{f(X,Y)}{f(X)}
\end{equation*}
\begin{equation*}
=\frac{\text{probability of observing }x_{i}\text{ and }y_{j}}{\text{probability of observing }x_{i}}
\end{equation*}
\begin{equation*}
=\frac{\text{joint probability of }x_{i}\text{ and }y_{j}}{\text{marginal probability of }x_{i}}
\end{equation*}
Table \ref{cpdf} contains a listing of the conditional probability distributions for income at alternative levels of educational attainment. In this table, the educational outcomes are represented using numerical values rather than labels due to space considerations ($X=1$ if individual has not completed high school, $X=2$ if the highest level of education is a high school degree, $X=3$ if the highest level of education is 1-3 years of college, and $X=4$ if the individual has 4 or more years of college).
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|l|cccc|}
\hline
& \multicolumn{4}{|c|}{\textbf{Conditional PDF for income} \boldmath $(Y) $} \\
& \multicolumn{4}{|c|}{\textbf{given educational attainment} \boldmath $(X) $.} \\ \cline{2-5}
\textbf{Income} & \textbf{\ $f(Y|X=1) $ } & \textbf{\ $f(Y|X=2) $ } & \textbf{\ $f(Y|X=3) $ } & \textbf{\ $f(Y|X=4) $ } \\ \hline low & 0.70 & 0.22 & 0.20 & 0.10 \\ medium & 0.20 & 0.67 & 0.52 & 0.45 \\ high & 0.10 & 0.11 & 0.28 & 0.45 \\ \hline \textbf{Total} & 1.00 & 1.00 & 1.00 & 1.00 \\ \hline \end{tabular}%
\end{center}
\caption{Conditional PDF for income given educational attainment } \label{cpdf}
\end{table}
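The entries in Table \ref{cpdf} follow from dividing each joint probability by the corresponding marginal probability of $X$. A minimal Python sketch, using the numerical education codes $X=1,\ldots,4$ introduced above:

```python
from fractions import Fraction as F

incomes = ("low", "medium", "high")
# Rows of Table jpdf, with education coded X = 1..4 as in Table cpdf.
joint_rows = {
    1: ("0.07", "0.02", "0.01"),
    2: ("0.10", "0.30", "0.05"),
    3: ("0.05", "0.13", "0.07"),
    4: ("0.02", "0.09", "0.09"),
}

# f(y | X = x) = f(x, y) / f(x), where f(x) is the row total (the marginal PDF of X).
conditional = {}
for x, probs in joint_rows.items():
    joint = [F(p) for p in probs]
    f_x = sum(joint)
    conditional[x] = {inc: p / f_x for inc, p in zip(incomes, joint)}

for x, pdf in conditional.items():
    print(x, {inc: round(float(p), 2) for inc, p in pdf.items()})
```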
Since a conditional probability distribution is a PDF, each possible outcome has a nonnegative probability and the sum of the probabilities for all possible outcomes equals one (as can be seen in Table~\ref{cpdf}). The expected value of a conditional distribution can be computed as: \begin{equation*}
E(Y|X=x_{o})=\sum_{j=1}^{m}y_{j}f(y_{j}|X=x_{o}), \end{equation*}
or:
\begin{equation*}
E(X|Y=y_{o})=\sum_{i=1}^{n}x_{i}f(x_{i}|Y=y_{o}) \end{equation*}
\begin{equation*}
\text{(where }x_{o}\text{ and }y_{o}\text{ are specific values of the random variables }X\text{ and }Y\text{).} \end{equation*}
As in the case of simple univariate PDFs, the expected value of a variable is simply a weighted average of all possible outcomes. The weight assigned to each outcome is the probability of observing it.
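Computing a conditional expected value requires numerical outcomes, but Table \ref{jpdf} reports income only as categories. The Python sketch below therefore assigns hypothetical codes (low $=1$, medium $=2$, high $=3$); with a different coding the numerical results would change. The helper \texttt{conditional\_mean} is an illustrative name.

```python
from fractions import Fraction as F

# Hypothetical numeric codes for income; Table jpdf reports only categories.
income_code = {"low": 1, "medium": 2, "high": 3}

# Rows of Table jpdf (low, medium, high income), education coded X = 1..4 as in the text.
joint_rows = {
    1: ("0.07", "0.02", "0.01"),
    2: ("0.10", "0.30", "0.05"),
    3: ("0.05", "0.13", "0.07"),
    4: ("0.02", "0.09", "0.09"),
}

def conditional_mean(probs):
    """E(Y | X = x): weight each income code by f(y | X = x) = f(x, y) / f(x)."""
    joint = {inc: F(p) for inc, p in zip(income_code, probs)}
    f_x = sum(joint.values())   # marginal probability f(x)
    return sum(income_code[inc] * p / f_x for inc, p in joint.items())

cond_means = {x: conditional_mean(probs) for x, probs in joint_rows.items()}
print({x: round(float(m), 3) for x, m in cond_means.items()})   # rises with education
```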
\subsection{Statistical Independence} As noted above, two random variables are said to be \textbf{statistically independent} if the distribution of each variable is unaffected by the level of the other variable. In this case, the conditional probability density function for each variable is the same as the corresponding marginal density function. Alternatively, two variables are statistically independent if and only if the joint density function can be written as the product of the marginal PDFs. To see this, note that, by the definition of the conditional probability of $X$ given $Y$, \begin{equation*}
\frac{f(X,Y)}{f(Y)}=f(X|Y)
\end{equation*}
Multiplying both sides of this expression by $f(Y)$ results in: \begin{equation}
f(X,Y)=f(X|Y)f(Y) \label{joint.as.prod.sc} \end{equation}
If $X$ and $Y$ are statistically independent, $f(X|Y)=f(X)$. Thus, equation~\ref{joint.as.prod.sc} can be restated as: \begin{equation*}
f(X,Y)=f(X)f(Y)
\end{equation*}
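The factorization $f(X,Y)=f(X)f(Y)$ suggests a direct test of independence for a discrete joint PDF: compare every joint probability with the product of the two marginal probabilities. The Python sketch below (the function name \texttt{is\_independent} is hypothetical) applies this check to two rolls of a fair die and to the education and income distribution in Table \ref{jpdf}, coded numerically for convenience.

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint):
    """True if f(x, y) == f(x) f(y) for every cell of a discrete joint PDF on a full grid."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    f_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal PDF of X
    f_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal PDF of Y
    return all(joint[(x, y)] == f_x[x] * f_y[y] for x, y in product(xs, ys))

# Two rolls of a fair die: each of the 36 pairs has probability 1/36.
dice = {(x, y): F(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Table jpdf with education coded 1..4 (rows) and income coded 1..3 (columns).
table = [("0.07", "0.02", "0.01"), ("0.10", "0.30", "0.05"),
         ("0.05", "0.13", "0.07"), ("0.02", "0.09", "0.09")]
edu_income = {(i + 1, j + 1): F(p) for i, row in enumerate(table) for j, p in enumerate(row)}

print(is_independent(dice), is_independent(edu_income))   # True False
```

For example, the joint probability of a low income and no high school degree is 0.07, while the product of the marginal probabilities is $0.10\times 0.24=0.024$.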
Suppose that a single die is rolled twice and $X$ and $Y$ are the outcomes of the first and second rolls, respectively. Since the outcomes of the first and second rolls are statistically independent, the probability of observing any pair of outcomes ($x_{i},y_{j}$) is simply the product of the probabilities of observing the separate outcomes $x_{i}$ and $y_{j}$ ($=\frac{1}{6}\times \frac{1}{6}=\frac{1}{36}$).
\exbox{The “Law of Averages”}{In games of chance, non-statisticians often base their decisions on the “law of averages.” Suppose an individual playing roulette has observed that a particular number has not come up for a long time. He or she may choose to bet on this number on the grounds that each number must come up a fixed proportion of the time in a large number of repeated trials. Thus, if that number has not been selected recently, the “law of averages” suggests that it is more likely to win than numbers that have been selected in the recent past.
This argument, however, is invalid. A statistician would recognize that these consecutive outcomes are statistically independent. The probability of each outcome occurring is exactly the same in each trial. Past outcomes have no effect on the probability of current outcomes.} \subsection{Covariance and correlation} The \textbf{covariance} between any two random variables is a measure of the relationship that exists between the two variables. The covariance between two random variables, $X$ and $Y$, is measured as: \begin{equation*}
cov(X,Y)=E\left[ \left( X-\mu _{X}\right) \left( Y-\mu _{Y}\right) \right] \end{equation*}
This covariance is also commonly denoted by the symbol $\sigma _{XY}$. (Note that the variance is a special case of the covariance in which $X$ and $Y$ are the same variables.) In the case of discrete joint PDFs, the covariance is computed as:\footnote{%
In the case of continuous distributions, the covariance equals: \begin{equation*}
\int_{-\infty }^{\infty }\int_{-\infty }^{\infty }\left( X-\mu _{X}\right) \left( Y-\mu _{Y}\right) f(X,Y)dXdY \end{equation*}%
}
\begin{equation*}
cov(X,Y)=\sum_{i=1}^{n}\sum_{j=1}^{m}\left( x_{i}-\mu _{X}\right) \left( y_{j}-\mu _{Y}\right) f(x_{i},y_{j}) \end{equation*}
The sign of the covariance provides an indication of the nature of the relationship between $X$ and $Y$. In the example in Table~\ref{jpdf}, it can be seen that individuals with higher levels of educational attainment have, on average, higher incomes (\textit{i.e.}, $E(Y|X)$ increases as $X$ increases). In this case, when one random variable takes on relatively high values, so does the other. If an increase in one random variable tends to be associated with an increase in another, then when one variable is above its population mean, the other will also tend to be above its mean. Similarly, when one variable is below its population mean, the other is also more likely to be below its mean. In this situation, the most likely outcomes are those in which the deviations $(x_{i}-\mu _{X})$ and $(y_{j}-\mu _{Y})$ are both positive or both negative, so that their product is positive. Therefore, the covariance will tend to be positive when there is a generally direct relationship between two variables. A similar argument indicates that the covariance will be negative when there is an inverse relationship between two variables.
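Applying the discrete covariance formula to Table \ref{jpdf} confirms the positive covariance described above. Because the formula requires numerical values, this Python sketch codes education as $X=1,\ldots,4$ (as in Table \ref{cpdf}) and assigns hypothetical income codes $Y=1,2,3$ for low, medium, and high; the resulting numbers depend on that coding.

```python
from fractions import Fraction as F

# Table jpdf with education X = 1..4 (as in the text) and hypothetical
# income codes Y = 1 (low), 2 (medium), 3 (high).
table = [("0.07", "0.02", "0.01"), ("0.10", "0.30", "0.05"),
         ("0.05", "0.13", "0.07"), ("0.02", "0.09", "0.09")]
joint = {(i + 1, j + 1): F(p) for i, row in enumerate(table) for j, p in enumerate(row)}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
# cov(X, Y) = sum over all cells of (x_i - mu_X)(y_j - mu_Y) f(x_i, y_j)
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

print(float(mu_x), float(mu_y), float(cov_xy))   # 2.55 1.98 0.231
```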
Figure~\ref{cov_x_y} contains scatterplots of samples drawn from two different joint PDFs. There is a positive covariance between the random variables $X$ and $Y$ in the distribution illustrated on the left-hand side of Figure~\ref{cov_x_y}. Notice that the most frequent outcomes are those in which the terms $(X-\mu _{X})$ and $(Y-\mu _{Y})$ have the same sign. As the right-side diagram in Figure~\ref{cov_x_y} suggests, the terms $(X-\mu _{X})$ and $(Y-\mu _{Y})$ are more likely to have opposite signs when there is a negative covariance between $X$ and $Y$.
\FRAME{ftbpFU}{5.047in}{2.8669in}{0pt}{\Qcb{Covariance between X and Y}}{\Qlb{cov_x_y}}{fig2-7.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.047in;height 2.8669in;depth 0pt;original-width 5.3956in;original-height 3.0519in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-7.gif';file-properties "XNPEU";}} If two variables are independent, their covariance will equal zero. A covariance of zero, however, does not imply that two variables are independent.\footnote{%
In the special case of a multivariate normal density function, however, a covariance of zero does imply that the variables are independent.} Figure~\ref{zero_cov} contains a graph of a relationship in which $E(Y|X)$ changes as the level of $X$ changes. The covariance between $X$ and $Y$ in this example is zero because the positive and negative terms cancel out. To see this, note that for every outcome in which the product of $X-\mu _{X}$ and $Y-\mu _{Y}$ is positive, there is a corresponding outcome on the other side of the diagram in which this product is negative (and of equal magnitude). A covariance of zero does, however, indicate that there is no linear relationship between the two variables.
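A minimal numerical counterpart to Figure~\ref{zero_cov}: in the hypothetical distribution below, $X$ takes the values $-1$, 0, and 1 with equal probability and $Y=X^{2}$. The value of $Y$ is completely determined by $X$, yet the positive and negative products cancel and the covariance is exactly zero.

```python
from fractions import Fraction as F

# Hypothetical example: X takes -1, 0, 1 with equal probability and Y = X**2.
# Y is completely determined by X, but the relationship is not linear.
joint = {(x, x ** 2): F(1, 3) for x in (-1, 0, 1)}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Dependence: P(X=0, Y=0) = 1/3, but P(X=0) * P(Y=0) = 1/3 * 1/3 = 1/9.
print(cov_xy, joint[(0, 0)], F(1, 3) * F(1, 3))   # 0 1/3 1/9
```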
\begin{center}
\FRAME{ftbpFU}{3.9029in}{2.8029in}{0pt}{\Qcb{Zero covariance between X and Y}}{\Qlb{zero_cov}}{fig2-8.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 3.9029in;height 2.8029in;depth 0pt;original-width 3.8545in;original-height 2.7605in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'graphs/Fig2-8.gif';file-properties "XNPEU";}} \end{center}
The covariance between two variables is affected by the units in which the variables are measured. Suppose that we are trying to measure the covariance between consumption expenditures and disposable income. If both variables are expressed in millions of dollars, the covariance would be a relatively large number. If the variables are instead expressed in billions of dollars, the covariance will be much smaller: each deviation from the mean will be 1/1000 as large as in the first case, so the covariance, which multiplies a deviation in one variable by a deviation in the other, will be 1/1,000,000 as large. It would be useful to have a measure of the degree of linear association between two variables that is not affected by the units in which the variables are measured.\footnote{%
A related advantage of the correlation coefficient is that it is unit free. The covariance is often measured in rather strange units, such as dollars$^{2}$, while the correlation coefficient is a standardized measure with no units.} The \textbf{correlation coefficient} ($\rho $) provides such a measure. It is defined as: \begin{equation*}
\rho _{XY}=\frac{cov(X,Y)}{\sigma _{X}\sigma _{Y}} \end{equation*}%
The correlation coefficient will always have the same sign as the covariance, since the standard deviations of $X$ and $Y$ ($\sigma _{X}$ and $\sigma _{Y}$) are both positive. The value of the correlation coefficient will always fall between $-1$ and $1$. The correlation coefficient equals $1$ only if there is an exact positively sloped linear relationship between the variables; a correlation coefficient of $-1$ occurs only if there is an exact negatively sloped linear relationship between the variables. These possibilities are illustrated in Figure~\ref{rho_graph}. The absolute value of the correlation coefficient provides a measure of the degree of linear association between two variables. If the correlation coefficient is zero, there is no linear relationship between the variables; if its absolute value equals one, there is an exact linear relationship between the variables.
When two variables are independent, the correlation coefficient will equal zero (since the covariance is zero). A correlation coefficient of zero between two variables, however, does not necessarily imply that the variables are independent. A correlation coefficient equal to zero simply indicates that there is no linear relationship between these variables.
\begin{center}
\FRAME{ftbpFU}{5.047in}{2.6394in}{0pt}{\Qcb{$\protect\rho =1$ and $\protect\rho =-1$}}{\Qlb{rho_graph}}{fig2-9.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.047in;height 2.6394in;depth 0pt;original-width 5.3644in;original-height 2.7916in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘graphs/Fig2-9.gif’;file-properties “XNPEU”;}} \end{center}
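A short Python sketch can illustrate both the unit dependence of the covariance and the unit-free nature of the correlation coefficient. The joint PDF used here is hypothetical, and the helper names \texttt{moments} and \texttt{rescale} are introduced only for this illustration. Dividing both variables by 1000 (as when moving from millions to billions of dollars) multiplies the covariance by $10^{-6}$ but leaves $\rho $ unchanged; a distribution lying on an exact negatively sloped line yields $\rho =-1$.

```python
import math

# Hypothetical joint PDF of (X, Y): keys are (x, y) outcomes, values are
# probabilities that sum to one.
joint = {(1, 2): 0.2, (2, 3): 0.3, (3, 3): 0.1, (3, 5): 0.4}


def moments(pdf):
    """Return (cov, rho) for a discrete joint PDF given as {(x, y): p}."""
    mu_x = sum(x * p for (x, _), p in pdf.items())
    mu_y = sum(y * p for (_, y), p in pdf.items())
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pdf.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, _), p in pdf.items())
    var_y = sum((y - mu_y) ** 2 * p for (_, y), p in pdf.items())
    return cov, cov / math.sqrt(var_x * var_y)


def rescale(pdf, a, b):
    """Joint PDF of (aX, bY), e.g. after a change of units."""
    return {(a * x, b * y): p for (x, y), p in pdf.items()}


cov, rho = moments(joint)
cov_s, rho_s = moments(rescale(joint, 1 / 1000, 1 / 1000))
print(cov, cov_s)   # covariance shrinks by a factor of 1,000,000
print(rho, rho_s)   # correlation is unchanged (up to rounding)

# An exact negatively sloped linear relationship, Y = 7 - 2X, gives rho = -1.
_, rho_line = moments({(x, 7 - 2 * x): 0.25 for x in (1, 2, 3, 4)})
print(rho_line)
```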
\subsection{Variance of sums or differences of random variables} In econometrics, it is often necessary to work with the sums or differences of random variables. The expected value of sums or differences of random variables was introduced in Section \ref{expect.sc.marker}. It will be helpful to examine the variance of these linear combinations of random variables as well. The most important properties involving the variance of sums or differences of random variables are:\footnote{A proof of these properties appears in the mathematical appendix at the end of this chapter.}
\begin{description}
\item[Property 1:] If $X$ and $Y$ are two random variables, $var(X+Y)=var(X)+var(Y)+2cov(X,Y)$. \end{description}
This property states that the variance of the sum of two random variables equals the sum of the variances plus twice the covariance. Suppose, for example, that the variance of a variable $X$ is equal to 30, the variance of $Y$ equals 40, and the covariance equals 3. The variance of a new variable created by adding $X$ and $Y$ can be computed as: \begin{equation*}
var(X+Y)=30+40+2(3)
\end{equation*}
\begin{equation*}
=76
\end{equation*}
\begin{description}
\item[Property 2:] If $X$ and $Y$ are two random variables, $var(X-Y)=var(X)+var(Y)-2cov(X,Y)$ \end{description}
This property states that the variance of the difference between two random variables equals the sum of the variances minus twice the covariance between the two variables. Suppose, once again, that the variance of $X$ equals 30, the variance of $Y$ equals 40, and the covariance equals 3. In this case, the variance of $X-Y$ is given by: \begin{equation*}
var(X-Y)=30+40-2(3)
\end{equation*}
\begin{equation*}
=64
\end{equation*}
\begin{description}
\item[Property 3:] If $X$ and $Y$ are independent random variables, $var(X\pm Y)=var(X)+var(Y)$
\end{description}
This property states that the variance of the sum or difference of two independent random variables is equal to the sum of the variances of these variables. Note that this is a special case of Properties 1 and 2 above (since $cov(X,Y)=0$ when $X$ and $Y$ are independent).
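Properties 1--3 can be verified exactly for any discrete joint PDF. The Python sketch below assumes two small hypothetical joint distributions, one in which $X$ and $Y$ are dependent and one in which they are independent, computes each variance directly from its definition, and checks the results against the formulas using exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint PDF {(x, y): probability}; X and Y are dependent here.
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(1, 2)}


def E(g, pdf):
    """Expected value of g(x, y) under a discrete joint PDF."""
    return sum(g(x, y) * p for (x, y), p in pdf.items())


def var(g, pdf):
    """Variance of g(X, Y), computed from its definition E[g - E(g)]^2."""
    mu = E(g, pdf)
    return E(lambda x, y: (g(x, y) - mu) ** 2, pdf)


def check(pdf):
    """Verify Properties 1 and 2 exactly; return cov(X, Y)."""
    vx, vy = var(lambda x, y: x, pdf), var(lambda x, y: y, pdf)
    mx, my = E(lambda x, y: x, pdf), E(lambda x, y: y, pdf)
    cov = E(lambda x, y: (x - mx) * (y - my), pdf)
    assert var(lambda x, y: x + y, pdf) == vx + vy + 2 * cov  # Property 1
    assert var(lambda x, y: x - y, pdf) == vx + vy - 2 * cov  # Property 2
    return cov


print(check(joint))  # 7/50: a nonzero covariance

# Independent case: the joint PDF is the product of the marginals, so
# cov(X, Y) = 0 and var(X + Y) = var(X - Y) = var(X) + var(Y) (Property 3).
fx = {0: F(1, 4), 1: F(3, 4)}
fy = {0: F(2, 5), 1: F(3, 5)}
indep = {(x, y): fx[x] * fy[y] for x in fx for y in fy}
print(check(indep))  # 0
```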
\begin{description}
\item[Property 4:] If $X_{1},X_{2},\ldots ,X_{n}$ are independent random variables and $a_{1},a_{2},\ldots ,a_{n}$ are constants, then \begin{equation*}
var(a_{1}X_{1}+a_{2}X_{2}+\cdots +a_{n}X_{n})=\sum_{i=1}^{n}a_{i}^{2}\cdot var(X_{i})
\end{equation*}
This property indicates that the variance of a weighted sum of independent random variables\footnote{$X_{1},X_{2},\ldots ,X_{n}$ are independent random variables if their joint PDF equals the product of their marginal PDFs; equivalently, the distribution of each $X_{i}$ is unaffected by the realizations of the other variables.} is a weighted sum of the variances, where the weights in this second sum are simply the squared values of the original weights. For example, suppose that a random variable $Z$ is defined as a weighted sum of the independent random variables $X$ and $Y$ according to the formula: \begin{equation*}
Z=3X+5Y
\end{equation*}
In this case, the variance of $Z$ can be computed as: \begin{equation*}
var(3X+5Y)=\left( 3^{2}\right) var(X)+\left( 5^{2}\right) var(Y) \end{equation*}
\begin{equation*}
=9\cdot var(X)+25\cdot var(Y)
\end{equation*}
\end{description}
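Property 4 can also be checked by brute force. The Python sketch below uses hypothetical PDFs rather than any distribution from the text; it builds the full PDF of $Z=3X+5Y$ for two independent discrete random variables and compares its variance with $9\cdot var(X)+25\cdot var(Y)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent random variables (illustrative PDFs only).
pdf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # var(X) = 1/2
pdf_y = {1: Fraction(1, 2), 3: Fraction(1, 2)}                      # var(Y) = 1


def variance(pdf):
    """Variance of a discrete random variable given as {value: probability}."""
    mu = sum(v * p for v, p in pdf.items())
    return sum((v - mu) ** 2 * p for v, p in pdf.items())


# Build the PDF of Z = 3X + 5Y by brute force. Independence means
# P(X = x, Y = y) = P(X = x) * P(Y = y).
pdf_z = {}
for (x, px), (y, py) in product(pdf_x.items(), pdf_y.items()):
    z = 3 * x + 5 * y
    pdf_z[z] = pdf_z.get(z, 0) + px * py

direct = variance(pdf_z)
formula = 3 ** 2 * variance(pdf_x) + 5 ** 2 * variance(pdf_y)
print(direct, formula)  # 59/2 59/2
```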
\section{Summary}
This chapter has provided a review of fundamental statistical concepts that are extensively used in later chapters of this text. The chapter began with a discussion of the probability density function (PDF) and the cumulative density function (CDF) for discrete and continuous random variables. For a discrete random variable, the PDF provides the probability of observing each specific outcome; for a continuous random variable, probabilities correspond to areas under the PDF. In either case, the CDF may be used to determine the probability that an outcome falls within any interval.
Each probability density function is characterized by one or more population parameters. These parameters include the population mean and variance.
The expected value provides a measure of the “average” value of a random variable. Several important properties of expectations were discussed in this chapter; the most important of these appear in Section \ref{expect.sc.marker}. These properties will be used in several derivations appearing in later chapters of this text.
Most econometric studies focus on the relationships among several random variables. The relationship between these variables can be described using a joint probability density function. This joint PDF provides the probability of observing alternative combinations of the random variables. The marginal PDFs (the univariate PDFs corresponding to a joint PDF) can be easily recovered from the joint PDF by summing (or, for continuous variables, integrating) over the other variables. The conditional PDF describes the distribution of a random variable when one or more other random variables are held fixed at given values.
If two variables are statistically independent, the distribution of one is not affected by the level of the other. When statistical independence does not hold, the covariance and correlation between the variables provide useful information about the nature of the relationship between the variables. A positive covariance (and correlation) indicates that an increase in one variable tends to be associated with an increase in the other. A negative covariance (and correlation), on the other hand, indicates that an inverse relationship tends to exist between these variables.
\section{Key Concepts}
\begin{itemize}
\item random variable
\item discrete random variable
\item continuous random variable
\item probability density function (PDF)
\item cumulative density function (CDF)
\item population
\item sample
\item population parameters
\item random sample
\item expected value
\item population mean
\item population variance
\item population standard deviation
\item law of large numbers
\item skewness
\item kurtosis
\item joint probability density function
\item bivariate probability density function
\item marginal probability density function
\item conditional probability density function
\item independence
\item covariance
\item correlation coefficient
\end{itemize}
\newpage\
\section{Exercises and Problems}
\begin{enumerate}
\item In each of the following cases, determine whether the random variable is continuous or discrete.
\begin{enumerate}
\item An individual swimmer’s time in a race.
\item An individual’s letter grade on an econometrics paper.
\item The number of cars owned by an individual.
\item An individual’s SAT\ score.
\item A household’s average weekly income.
\end{enumerate}
\item Define a random variable, $X$, equal to the number of children per adult married female of age 45 years or older. A (hypothetical) partial listing of the PDF for this random variable is provided in Table~\ref{childpdf}.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|cc|}
\hline
\textbf{Number of children} & \textbf{Probability} \\ \hline 0 & 0.10 \\
1 & 0.25 \\
2 & 0.45 \\ \hline
\end{tabular}%
\end{center}
\caption{PDF for number of children per married female aged 45+ } \label{childpdf}
\end{table}
Can you determine the probability of having 3 or more children? If so, compute this probability. If not, explain why not.
\item Table \ref{xcdf.stat.chap} lists selected values of the CDF for a continuous random variable $X$:%
\begin{table}[h] \centering%
\begin{tabular}{|l|l|}
\hline
$\mathbf{X}$ & $\mathbf{F(X)}$ \\ \hline 5 & 0.06 \\
10 & 0.25 \\
15 & 0.60 \\
20 & 0.85 \\ \hline
\end{tabular}
\caption{CDF for $X$\label{xcdf.stat.chap}}%
\end{table}%
Determine the probability of observing a value of $X$ that is: \begin{enumerate}
\item less than or equal to 5.
\item greater than 5.
\item greater than 20.
\item between 5 and 10.
\item between 10 and 20.
\end{enumerate}
\item Table \ref{CDF.prob.stat} lists selected values of the cumulative density function for the level of annual income in a given population:%
\begin{table}[h] \centering%
\begin{tabular}{|c|c|}
\hline
\textbf{Annual Income} & \textbf{Cumulative probability} \\ \hline \$20,000 & 0.12 \\
40,000 & 0.35 \\
60,000 & 0.60 \\
80,000 & 0.75 \\
100,000 & 0.85 \\
120,000 & 0.90 \\
140,000 & 0.93 \\ \hline
\end{tabular}
\caption{CDF for the level of annual income\label{CDF.prob.stat}}%
\end{table}
Determine the probability of observing a household in which the level of income is:
\begin{enumerate}
\item less than or equal to \$20,000.
\item greater than \$140,000.
\item greater than \$80,000.
\item between \$40,000 and \$60,000.
\item between \$60,000 and \$120,000.
\end{enumerate}
\item A fair coin is tossed three times. Let $X$ denote the number of heads.
\begin{enumerate}
\item Construct a table containing the PDF for this random variable.
\item Construct a table containing the CDF for this random variable.
\item Compute $E(X)$.
\item Compute $var(X)$.
\end{enumerate}
\item Suppose a random variable is defined as the square of the outcome on a single toss of a fair die. If $X$ equals the outcome on a single toss of a die, this new variable equals $X^{2}$. Determine $E(X^{2})$. Is this equal to $[E(X)]^{2}$?
\item Suppose that a game promises to pay you \$10 $\times $ the value shown on a single toss of a fair die. For example, if you roll a 5, you will receive \$50.
\begin{enumerate}
\item What is the expected value of the outcome of this game?
\item What is the variance of this outcome?
\item Use your results from (a) and (b) and the properties of expectation and variance to determine the expected value and variance of a game in which the payment equals \$20 $\times $ the outcome on a single roll of a fair die.
\end{enumerate}
\item Suppose that $X$ denotes the outcome on a single toss of a fair coin ($X=1$ if “heads” and $X=0$ if “tails”). A new random variable is formed as: \begin{equation*}
Y=X+10
\end{equation*}
\begin{enumerate}
\item Determine the expected value of $X$ and $Y.$ \item Determine the variance of $X$ and $Y$.
\end{enumerate}
\item Table \ref{jpdf2} lists the joint PDF for the number of cars owned by a household and the level of household income in a hypothetical population.
\begin{table}[tbp]
\begin{center}
\begin{tabular}{|c|ccc|}
\hline
& \multicolumn{3}{|c|}{\textbf{Income}} \\ \cline{2-4} \textbf{Number of cars} & \textbf{low} & \textbf{medium} & \textbf{high} \\ \hline
0 & 0.15 & 0.04 & 0.01 \\
1 & 0.10 & 0.15 & 0.10 \\
2 & 0.01 & 0.16 & 0.20 \\
3 & 0.00 & 0.02 & 0.06 \\ \hline
\end{tabular}%
\end{center}
\caption{Joint PDF for the number of cars owned and the level of household income }
\label{jpdf2}
\end{table}
\begin{enumerate}
\item Determine the marginal PDF for the number of cars owned by the household.
\item Determine the marginal PDF for household income.
\item Determine the conditional distribution for the number of cars owned given that the household has a medium level of income.
\item Determine the conditional expectation of the number of cars owned by households given that the household has a medium level of income.
\item Determine the conditional expectation of the number of cars owned by households given that the household has a low level of income.
\end{enumerate}
\item Use Table \ref{jpdf} to determine the conditional PDF for educational attainment given that the level of household income is “low.”
\item Suppose that $X$ and $Y$ are normally distributed random variables with $\mu _{X}=20$, $\mu _{Y}=12$, $\sigma _{X}^{2}=2$, $\sigma _{Y}^{2}=3$, and $\sigma _{XY}=1$. A new variable is created that is equal to the sum of $X$ and $Y$. Determine the population mean and variance for this new variable.
\item The variables $X$ and $Y$ are distributed as independent normal random variables. The population means of $X$ and $Y$ equal 3 and 4, respectively.
The variance of $X$ is 2 and the variance of $Y$ is 5. A new variable $Z$ is created according to the formula: \begin{equation*}
Z=3X+10Y+40
\end{equation*}
Determine the population mean and variance for $Z$.
\item \label{o.j.prob.stat}The variables $X$ and $Y$ are distributed as independent normal random variables. The population means of $X$ and $Y$ equal 1.0 and 0.5, respectively. The variance of $X$ is 0.2 and the variance of $Y$ is 0.1. A new variable $Z$ is created according to the formula: \begin{equation*}
Z=10X+20Y
\end{equation*}
\begin{enumerate}
\item Determine the population mean for $Z$.
\item Determine the variance of $Z$.
\end{enumerate}
\item \textit{(More difficult problem)} The owner of a citrus grove decides to ship fruit baskets. Each package consists of 10 grapefruit and 20 oranges. The weight of a grapefruit is a random variable that is normally distributed with a mean of 1 pound and a variance of 0.2 pounds$^{2}$; the weight of an orange is a random variable that is normally distributed with a mean of 0.5 pounds and a variance of 0.1 pounds$^{2}$. The covariance between the weights of any two pieces of fruit packaged in a container is zero. (\textit{Hint:} Note that the individual oranges (and grapefruit) may differ in weight within each basket. Be sure to take this into account when determining your answers.)
\begin{enumerate}
\item Determine the mean weight of the contents of each package.
\item Determine the variance of the weight of the contents of each package. (\textit{Hint:} The variance of a sum of independent random variables equals the sum of the variances of these variables.) \item Why is the variance in this problem different from that computed in Problem \ref{o.j.prob.stat}? (\textit{Note:} If your answers are the same in both cases, one of these answers is incorrect!) \end{enumerate}
\item A genetic counselor informs a couple that there is a twenty percent probability of having a child who possesses blue eyes, a seventy-five percent probability of having a brown-eyed child and a five percent probability of having a child with green eyes. There is a fifty percent probability that the child will be a boy and a fifty percent probability that the child will be a girl. Determine the joint density function for the child’s gender and eye color under the assumption that gender and eye color are independent. Does this density function satisfy the two conditions required of joint PDFs?
\item Consider a random variable, $X$, that has a mean of 100 and a variance of 1600. Let $Y=\frac X2+20$. Use the properties of expectations and variances to determine the mean and variance for $Y$.
\end{enumerate}
\newpage\
\section{Mathematical Appendix}
\subsection{Properties of Summations} The summation operator is used to express the addition of a set of terms that can be expressed using a common formula. Examples of the use of the summation operator appear below:
\begin{equation*}
\sum_{i=1}^3X_i=X_1+X_2+X_3
\end{equation*}
\begin{equation*}
\sum_{t=0}^TX_t=X_0+X_1+X_2+\cdots +X_T \end{equation*}
\begin{equation*}
\sum_{i=1}^4a_iX^i=a_1X+a_2X^2+a_3X^3+a_4X^4 \end{equation*}
Sometimes, the summation operator is doubled. For example: \begin{equation*}
\sum_{i=1}^N\sum_{j=1}^MX_{ij}
\end{equation*}
This means that the summation takes place over all possible combinations of $i$ and $j$ for which $i$ lies between $1$ and $N$ and $j$ lies between $1$ and $M$. Consider, for example,
\begin{equation*}
\sum_{i=1}^2\sum_{j=1}^3X_{ij}
\end{equation*}
This can be expressed as:
\begin{equation*}
\sum_{i=1}^2\sum_{j=1}^3X_{ij}=\sum_{i=1}^2\left( \sum_{j=1}^3X_{ij}\right) \end{equation*}
\begin{equation*}
=\sum_{i=1}^2\left( X_{i1}+X_{i2}+X_{i3}\right) \end{equation*}
\begin{equation*}
=X_{11}+X_{12}+X_{13}+X_{21}+X_{22}+X_{23} \end{equation*}
There are a few properties of summations that are needed for later derivations:
\textbf{Property 1:} If $c$ is a constant, $\sum_{i=1}^ncX_i=c\sum_{i=1}^nX_i$
\textbf{Property 2:} $\sum_{i=1}^n(X_i+Y_i)=\sum_{i=1}^nX_i+\sum_{i=1}^nY_i$
\textbf{Property 3:} If $a$ and $b$ are constants, $\sum_{i=1}^n(aX_i+bY_i)=a\sum_{i=1}^nX_i+b\sum_{i=1}^nY_i$
\textbf{Property 4:} If $k$ is a constant, $\sum_{i=1}^nk=nk$
\textbf{Property 5:} $\sum_{i=1}^n\sum_{j=1}^mX_iY_j=(\sum_{i=1}^nX_i)(\sum_{j=1}^mY_j)$
\textbf{Property 6:} $\sum_{i=1}^n(X_i-\overline{X})=0$, where $\overline{X}=\sum_{i=1}^nX_i/n$
\textbf{Property 7:} $\sum_{i=1}^n(X_i-\overline{X})^2=\sum_{i=1}^nX_i(X_i-\overline{X})$
\textbf{Property 8:} $\sum_{i=1}^n(X_i-\overline{X})(Y_i-\overline{Y})=\sum_{i=1}^nX_i(Y_i-\overline{Y})=\sum_{i=1}^nY_i(X_i-\overline{X})$
\subsubsection{Derivation of properties} \textbf{Property 1:} If $c$ is a constant, $\sum_{i=1}^ncX_i=c\sum_{i=1}^nX_i$
\textbf{Proof:}
\begin{equation*}
\sum_{i=1}^ncX_i=cX_1+cX_2+\cdots +cX_n \end{equation*}
Using the distributive law of multiplication over addition: \begin{equation*}
=c(X_1+X_2+\cdots +X_n)
\end{equation*}
\begin{equation*}
=c\sum_{i=1}^nX_i
\end{equation*}
\textbf{Property 2:} $\sum_{i=1}^n(X_i+Y_i)=\sum_{i=1}^nX_i+\sum_{i=1}^nY_i$ \textbf{Proof:}
\begin{equation*}
\sum_{i=1}^n(X_i+Y_i)=(X_1+Y_1)+(X_2+Y_2)+\cdots +(X_n+Y_n) \end{equation*}
Applying the commutative and associative laws of addition: \begin{equation*}
=(X_1+X_2+\cdots +X_n)+(Y_1+Y_2+\cdots +Y_n) \end{equation*}
\begin{equation*}
=\sum_{i=1}^nX_i+\sum_{i=1}^nY_i
\end{equation*}
\textbf{Property 3:} If $a$ and $b$ are constants, $\sum_{i=1}^n(aX_i+bY_i)=a\sum_{i=1}^nX_i+b\sum_{i=1}^nY_i$ \textbf{Proof:}
The proof follows directly from an application of Properties (1) and (2).
\textbf{Property 4:} If $k$ is a constant: $\sum_{i=1}^nk=nk$ \textbf{Proof:}
\begin{equation*}
\sum_{i=1}^{n}k=(k+k+\ldots +k)
\end{equation*}
Since there are $n$ terms in this summation, this can be restated as: \begin{equation*}
\sum_{i=1}^{n}k=nk
\end{equation*}
\textbf{Property 5:} $\sum_{i=1}^{n}\sum_{j=1}^{m}X_{i}Y_{j}=(\sum_{i=1}^{n}X_{i})(\sum_{j=1}^{m}Y_{j})$ \textbf{Proof:}
Using the definition of a double summation, \begin{equation*}
\sum_{i=1}^n\sum_{j=1}^mX_iY_j=\sum_{i=1}^n(X_iY_1+X_iY_2+\cdots +X_iY_m) \end{equation*}
\begin{equation*}
=\sum_{i=1}^nX_i(Y_1+Y_2+\cdots +Y_m) \end{equation*}
By the definition of the summation operator (and applying the distributive law):
\begin{equation*}
=(X_1+X_2+\cdots +X_n)(Y_1+Y_2+\cdots +Y_m) \end{equation*}
\begin{equation*}
=(\sum_{i=1}^nX_i)(\sum_{j=1}^mY_j) \end{equation*}
\textbf{Property 6:} $\sum_{i=1}^n(X_i-\overline{X})=0$, where $\overline{X}=\sum_{i=1}^nX_i/n$.
\textbf{Proof:}
By Property 2,
\begin{equation*}
\sum_{i=1}^n(X_i-\overline{X})=\sum_{i=1}^nX_i-\sum_{i=1}^n\overline{X} \end{equation*}
By Property 4,
\begin{equation} \label{wtr.z3}
\sum_{i=1}^n(X_i-\overline{X})=\sum_{i=1}^nX_i-n\overline{X} \end{equation}
Since the sample mean, $\overline{X}$, is defined as $\overline{X}=\frac 1n\sum_{i=1}^nX_i$, it follows that
\begin{equation*}
\sum_{i=1}^nX_i=n\overline{X}.
\end{equation*}
Thus, equation \ref{wtr.z3} can be stated as: \begin{equation*}
\sum_{i=1}^n(X_i-\overline{X})=n\overline{X}-n\overline{X}=0 \end{equation*}
\textbf{Property 7:} $\sum_{i=1}^n(X_i-\overline{X})^2=\sum_{i=1}^nX_i(X_i-\overline{X})$
\textbf{Proof:}
\begin{equation*}
\sum_{i=1}^n(X_i-\overline{X})^2=\sum_{i=1}^n(X_i-\overline{X})(X_i-\overline{X})
\end{equation*}
By Properties 1 and 2, \begin{equation*}
=\sum_{i=1}^nX_i(X_i-\overline{X})-\overline{X}\sum_{i=1}^n(X_i-\overline{X}) \end{equation*}
By Property 6,
\begin{equation*}
=\sum_{i=1}^nX_i(X_i-\overline{X})-\overline{X}(0) \end{equation*}
\begin{equation*}
=\sum_{i=1}^nX_i(X_i-\overline{X}) \end{equation*}
\textbf{Property 8:} $\sum_{i=1}^n(X_i-\overline{X})(Y_i-\overline{Y})=\sum_{i=1}^nX_i(Y_i-\overline{Y})=\sum_{i=1}^nY_i(X_i-\overline{X})$ \textbf{Proof:}
Using the distributive law:
\begin{equation*}
\sum_{i=1}^n(X_i-\overline{X})(Y_i-\overline{Y})=\sum_{i=1}^n\left[ X_i(Y_i-\overline{Y})-\overline{X}(Y_i-\overline{Y})\right] \end{equation*}
Applying Property 2, \begin{equation*}
=\sum_{i=1}^nX_i(Y_i-\overline{Y})-\overline{X}\sum_{i=1}^n(Y_i-\overline{Y}) \end{equation*}
By Property 6,
\begin{equation*}
=\sum_{i=1}^nX_i(Y_i-\overline{Y})-\overline{X}(0) \end{equation*}
\begin{equation*}
=\sum_{i=1}^nX_i(Y_i-\overline{Y}) \end{equation*}
In a similar manner, \begin{equation*}
\sum_{i=1}^n(X_i-\overline{X})(Y_i-\overline{Y})=\sum_{i=1}^nY_i(X_i-\overline{X})-\overline{Y}\sum_{i=1}^n(X_i-\overline{X}) \end{equation*}
\begin{equation*}
=\sum_{i=1}^nY_i(X_i-\overline{X}) \end{equation*}
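Properties 5--8 are identities, so they must hold exactly for any set of numbers. The Python sketch below checks them on a small, arbitrary data set (the values are illustrative only) using exact rational arithmetic.

```python
from fractions import Fraction

# Arbitrary illustrative data; any numbers would satisfy the identities.
X = [Fraction(v) for v in (2, 5, 7, 10)]
Y = [Fraction(v) for v in (1, 4, 4, 11)]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

# Property 5: the double sum of X_i * Y_j equals the product of the sums.
assert sum(x * y for x in X for y in Y) == sum(X) * sum(Y)
# Property 6: deviations from the mean sum to zero.
assert sum(x - xbar for x in X) == 0
# Property 7: sum of squared deviations = sum of X_i * (X_i - xbar).
assert sum((x - xbar) ** 2 for x in X) == sum(x * (x - xbar) for x in X)
# Property 8: the sum of cross-products of deviations, written three ways.
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
assert s_xy == sum(x * (y - ybar) for x, y in zip(X, Y))
assert s_xy == sum(y * (x - xbar) for x, y in zip(X, Y))
print(s_xy)  # 40
```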
\subsection{Properties of Expectations} \textbf{Property 1:} If $c$ is a constant, $E(c)=c$.
\textbf{Proof:}
\footnote{The proofs in this section are based on discrete probability density functions.
These results also hold for continuous density functions. Each proof can be duplicated for continuous density functions by replacing every summation operator with an appropriate integral.}% %EndExpansion
By the definition of the expected value of a function: \begin{equation*}
E(c)=\sum_{i=1}^{N}cf(x_{i}) \end{equation*}
By Property 1 of summations: \begin{equation*}
=c\sum_{i=1}^{N}f(x_{i}) \end{equation*}
Since the sum of the probabilities for all possible outcomes equals one: \begin{equation*}
=c(1)
\end{equation*}
\begin{equation*}
=c
\end{equation*}
\textbf{Property 2:} If $X$ is a random variable and $a$ is a constant, $E(aX)=aE(X)$.
\textbf{Proof:}
\begin{equation*}
E(aX)=\sum_{i=1}^{N}ax_{i}f(x_{i}) \end{equation*}
By Property 1 of summations, \begin{equation*}
=a\sum_{i=1}^{N}x_{i}f(x_{i}) \end{equation*}
Since the summation is, by definition, $E(X)$: \begin{equation*}
=aE(X)
\end{equation*}
\textbf{Property 3:} If $X$ is a random variable and $a$ and $b$ are constants, $E(aX+b)=aE(X)+b.$ \textbf{Proof:}
\begin{equation*}
E(aX+b)=\sum_{i=1}^{N}(ax_{i}+b)f(x_{i}) \end{equation*}
By Property 3 of summations: \begin{equation*}
=a\sum_{i=1}^{N}x_{i}f(x_{i})+b\sum_{i=1}^{N}f(x_{i}) \end{equation*}
Since the sum of the probabilities for all possible outcomes, $\sum f(x_{i})$, equals one (and $\sum x_{i}f(x_{i})=E(X)$):
\begin{equation*}
=aE(X)+b
\end{equation*}
\textbf{Property 4:} If $X$\ and $Y$ are random variables, $E(X+Y)=E(X)+E(Y)$ \textbf{Proof:}
\begin{equation*}
E(X+Y)=\sum_{i=1}^{n}\sum_{j=1}^{m}(x_{i}+y_{j})f(x_{i},y_{j}) \end{equation*}
Using the distributive law, and Property 2 of summations: \begin{equation*}
=\left( \sum_{i=1}^{n}\sum_{j=1}^{m}x_{i}f(x_{i},y_{j})\right) +\left( \sum_{i=1}^{n}\sum_{j=1}^{m}y_{j}f(x_{i},y_{j})\right) \end{equation*}
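Since $x_{i}$ does not vary with $j$ (and $y_{j}$ does not vary with $i$), Property 1 of summations can be applied after reversing the order of summation in the second term: \begin{equation}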
E(X+Y)=\left( \sum_{i=1}^{n}x_{i}\sum_{j=1}^{m}f(x_{i},y_{j})\right) +\left( \sum_{j=1}^{m}y_{j}\sum_{i=1}^{n}f(x_{i},y_{j})\right) \label{sum.prop.4.az} \end{equation}
Since the marginal PDFs of $X$ and $Y$ are: \begin{equation*}
f(x_{i})=\sum_{j=1}^{m}f(x_{i},y_{j}) \end{equation*}
\begin{equation*}
f(y_{j})=\sum_{i=1}^{n}f(x_{i},y_{j}), \end{equation*}
substituting these values into equation \ref{sum.prop.4.az} allows the expected value of the sum of $X$ and $Y$ to be expressed as: \begin{equation*}
E(X+Y)=\sum_{i=1}^{n}x_{i}f(x_{i})+\sum_{j=1}^{m}y_{j}f(y_{j}) \end{equation*}
\begin{equation*}
=E(X)+E(Y)
\end{equation*}
\textbf{Property 5:} If $X$ and $Y$ are random variables and $a,b,$ and $c$ are constants, $E(aX+bY+c)=aE(X)+bE(Y)+c.$ \textbf{Proof:}
This follows directly from Properties 3 and 4 above.
\textbf{Property 6:} If $X$ and $Y$ are two independent random variables, $E(XY)=E(X)E(Y)$
\textbf{Proof:}
\begin{equation*}
E(XY)=\sum_{i=1}^{n}\sum_{j=1}^{m}x_{i}y_{j}f(x_{i},y_{j}) \end{equation*}
If $X$ and $Y$ are independent, $f(x_{i},y_{j})=f(x_{i})f(y_{j})$. Thus, \begin{equation*}
E(XY)=\sum_{i=1}^{n}\sum_{j=1}^{m}x_{i}y_{j}f(x_{i})f(y_{j}) \end{equation*}
Using the commutative and associative laws of multiplication, this becomes: \begin{equation*}
=\sum_{i=1}^{n}\sum_{j=1}^{m}\left[ x_{i}f(x_{i})\right] \left[ y_{j}f(y_{j})\right]
\end{equation*}
Applying Property 5 of summations: \begin{equation*}
=\sum_{i=1}^{n}\left[ x_{i}f(x_{i})\right] \sum_{j=1}^{m}\left[ y_{j}f(y_{j})\right]
\end{equation*}
\begin{equation*}
=E(X)E(Y)
\end{equation*}
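The properties of expectations can be verified on a small example. The Python sketch below assumes a hypothetical pair of independent discrete random variables, builds their joint PDF as the product of the marginals, and checks Properties 3, 4, and 6 exactly. It then shows why Property 6 requires independence: when $Y=X$, $E(XY)=E(X^{2})$, which generally differs from $E(X)E(Y)$.

```python
from fractions import Fraction as F


def E(g, joint):
    """E[g(X, Y)] for a discrete joint PDF {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


# Hypothetical independent marginals; the joint PDF is their product.
fx = {1: F(1, 3), 2: F(2, 3)}
fy = {0: F(1, 2), 4: F(1, 2)}
indep = {(x, y): fx[x] * fy[y] for x in fx for y in fy}

EX, EY = E(lambda x, y: x, indep), E(lambda x, y: y, indep)
a, b = 3, -2
assert E(lambda x, y: a * x + b, indep) == a * EX + b   # Property 3
assert E(lambda x, y: x + y, indep) == EX + EY           # Property 4
assert E(lambda x, y: x * y, indep) == EX * EY           # Property 6

# Perfect dependence: Y = X, with the same marginal PDF for X.
same = {(x, x): p for x, p in fx.items()}
assert E(lambda x, y: x + y, same) == 2 * EX  # Property 4 still holds
print(E(lambda x, y: x * y, same), EX ** 2)    # 3 25/9: Property 6 fails
```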
\subsection{Properties of variance} \textbf{Property 1:} The variance of a random variable, $X$ (defined as $E(X-\mu _X)^2$), may also be expressed as $E(X^2)-\mu _X^2$.
\textbf{Proof:}
\begin{equation*}
\sigma _X^2=E(X-\mu _X)^2 \end{equation*}
Squaring the term in parentheses, \begin{equation*}
=E(X^2-2\mu _XX+\mu _X^2) \end{equation*}
Using Property 5 of expectations and noting that $\mu _X$ is a constant: \begin{equation*}
=E(X^2)-2\mu _XE(X)+\mu _X^2 \end{equation*}
Since $\mu _X=E(X)$, this becomes: \begin{equation*}
=E(X^2)-2\mu _X^2+\mu _X^2 \end{equation*}
\begin{equation*}
=E(X^2)-\mu _X^2
\end{equation*}
\textbf{Property 2: } If $X$ is a random variable and $a$ is a constant, $var(aX)=a^2var(X)$ \textbf{Proof:}
\begin{equation*}
var(aX)=E\left[ aX-E(aX)\right] ^2 \end{equation*}
Using Property 2 of expectations, this can be restated as: \begin{equation*}
=E\left[ aX-aE(X)\right] ^2 \end{equation*}
\begin{equation*}
=E\left[ a\left( X-E(X)\right) \right] ^2 \end{equation*}
Squaring the term in the square brackets results in: \begin{equation*}
=E\left[ a^2\left( X-E(X)\right) ^2\right] \end{equation*}
Since $a$ is a constant, Property 2 of expectations results in: \begin{equation*}
=a^2E\left( X-E(X)\right) ^2 \end{equation*}
\begin{equation*}
=a^2var(X)
\end{equation*}
\textbf{Property 3:} If $X$ is a random variable and $c$ is a constant, $var(X+c)=var(X)$
\textbf{Proof:}
By the definition of the variance, \begin{equation*}
var(X+c)=E\left[ (X+c)-E(X+c)\right] ^2 \end{equation*}
Since $E(X+c)=\mu _X+c$, \begin{equation*}
=E\left[ X+c-(\mu _X+c)\right] ^2 \end{equation*}
\begin{equation*}
=E(X-\mu _X)^2
\end{equation*}
\begin{equation*}
=var(X)
\end{equation*}
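As a check on Properties 1--3 of the variance, the Python sketch below uses a hypothetical PDF for $X$, computes the variance both from its definition and from the shortcut formula $E(X^{2})-\mu _{X}^{2}$, and then confirms the effects of multiplying $X$ by a constant and adding a constant to it. The constants $a=4$ and $c=7$ are arbitrary choices.

```python
from fractions import Fraction

# Hypothetical PDF for X (probabilities sum to one).
pdf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}


def E(g):
    """E[g(X)] for the discrete PDF above."""
    return sum(g(x) * p for x, p in pdf.items())


mu = E(lambda x: x)
var_def = E(lambda x: (x - mu) ** 2)       # definition: E(X - mu)^2
var_alt = E(lambda x: x ** 2) - mu ** 2    # Property 1: E(X^2) - mu^2
assert var_def == var_alt

a, c = 4, 7
var_aX = E(lambda x: (a * x - a * mu) ** 2)          # uses E(aX) = a * mu
var_Xc = E(lambda x: ((x + c) - (mu + c)) ** 2)      # uses E(X + c) = mu + c
assert var_aX == a ** 2 * var_def                    # Property 2
assert var_Xc == var_def                             # Property 3
print(mu, var_def)  # 2 1
```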
\subsection{Variance of sums and differences of random variables} \textbf{Property 1:} If $X$ and $Y$ are two random variables, $var(X+Y)=var(X)+var(Y)+2cov(X,Y)$ \textbf{Proof:}
\begin{equation*}
var(X+Y)=E\left[ (X+Y)-E(X+Y)\right] ^2 \end{equation*}
Since $E(X+Y)$ = $\mu _X+\mu _Y$, \begin{equation*}
=E\left( X+Y-\mu _X-\mu _Y\right) ^2 \end{equation*}
Using the associative law of addition, \begin{equation*}
=E\left[ \left( X-\mu _X\right) +\left( Y-\mu _Y\right) \right] ^2 \end{equation*}
Squaring the terms enclosed in square brackets results in: \begin{equation*}
=E\left[ \left( X-\mu _X\right) ^2+\left( Y-\mu _Y\right) ^2+2\text{$\left( X-\mu _X\right) \left( Y-\mu _Y\right) $}\right] \end{equation*}
Applying Properties 2 and 4 of expectations, \begin{equation*}
=E(X-\mu _X)^2+E(Y-\mu _Y)^2+2E\left[ (X-\mu _X)(Y-\mu _Y)\right] \end{equation*}
\begin{equation*}
=var(X)+var(Y)+2cov(X,Y) \end{equation*}
\textbf{Property 2:} If $X$ and $Y$ are two random variables, $var(X-Y)=var(X)+var(Y)-2cov(X,Y)$ \textbf{Proof:}
\begin{equation*}
var(X-Y)=E\left[ (X-Y)-E(X-Y)\right] ^2 \end{equation*}
Since $E(X-Y)=\mu _X-\mu _Y$, \begin{equation*}
=E\left( X-Y-\mu _X+\mu _Y\right) ^2 \end{equation*}
Using the associative law of addition, \begin{equation*}
=E\left[ \left( X-\mu _X\right) -\left( Y-\mu _Y\right) \right] ^2 \end{equation*}
Squaring the terms enclosed in square brackets results in: \begin{equation*}
=E\left[ \left( X-\mu _X\right) ^2+\left( Y-\mu _Y\right) ^2-2\text{$\left( X-\mu _X\right) \left( Y-\mu _Y\right) $}\right] \end{equation*}
Applying Properties 2 and 4 of expectations, \begin{equation*}
=E(X-\mu _X)^2+E(Y-\mu _Y)^2-2E\left[ (X-\mu _X)(Y-\mu _Y)\right] \end{equation*}
\begin{equation*}
=var(X)+var(Y)-2cov(X,Y) \end{equation*}
\textbf{Property 3:} If $X$ and $Y$ are independent random variables, $var(X+Y)=var(X)+var(Y)$ \textbf{Proof:}
If $X$ and $Y$ are independent then $cov(X,Y)=0$. Substituting this result into Property 1 above provides: \begin{equation*}
var(X+Y)=var(X)+var(Y)+0 \end{equation*}
\begin{equation*}
=var(X)+var(Y)
\end{equation*}
\textbf{Property 4:} If $X_1,X_2,\ldots ,X_n$ are independent random variables, and $a_1,a_2,\ldots ,a_n$ are constants, then: \begin{equation*}
var(a_1X_1+a_2X_2+\cdots +a_nX_n)=\sum_{i=1}^na_i^2\cdot var(X_i) \end{equation*}
\textbf{Proof:}
By repeated application of Property 3 above (the variance of a sum of independent random variables equals the sum of their variances), \begin{equation*}
var(a_1X_1+a_2X_2+\cdots +a_nX_n)=\sum_{i=1}^nvar(a_i\cdot X_i) \end{equation*}
By Property 2 of variance, the right-hand side of this expression can be rewritten as:
\begin{equation*}
var(a_1X_1+a_2X_2+\cdots +a_nX_n)=\sum_{i=1}^na_i^2\cdot var(X_i) \end{equation*}

License: Creative Commons Attribution 4.0 International License