Simple Linear Regression

Let’s continue with the cars data, but this time let’s formally recognize that a car’s weight might have some effect on city MPG. Assume

We again focus on the expected value, not $\sigma$, but we will start dropping the $\mathbb{E}(Y)$ notation because it quickly becomes cumbersome and needlessly repetitive. Notice that this model implicitly states that the expected city MPG depends linearly on a car’s weight.

The following code reads in the dataset, plots the $\texttt{mpgCity}$ variable against the $\texttt{weight}$ data, and calculates an estimate of the population mean dependent on the sampled cars’ weights. Here the estimated expected city MPG is formed by a linear combination of $\hat{\beta}_0, \hat{\beta}_1$ and $\texttt{weight}$.

import numpy as np
import pandas as pd
import bplot as bp
from scipy.optimize import minimize
from scipy.stats import norm as normal
import patsy


cars = pd.read_csv("")

bp.scatter(cars['weight'], cars['mpgCity'])

<matplotlib.collections.PathCollection at 0x124a35b70>


def ll(beta, yX):
    y = yX[:, 0]
    X = yX[:, -2:]
    yhat = np.full(y.shape, np.nan)
    for r in range(X.shape[0]):
        yhat[r] = np.sum(beta * X[r,:])
    d = y - yhat
    return np.sum(d * d)

pX = patsy.dmatrix("~ weight", data=cars)
yX = np.c_[cars['mpgCity'].values, np.asarray(pX)]

beta_hat = minimize(ll, normal.rvs(size=2), args=(yX))['x']

We write the estimated linear model as

What’s nice about this model is that we can easily interpret the parameters $\hat{\beta}_0 = 50.14$ and $\hat{\beta}_1 = -0.01$. For instance, $\hat{\beta}_0$, better known as an intercept, is an estimate of (the population) city MPG for a car that weighs $0$ pounds. While this doesn’t make sense conceptually, this is the literal interpretation of the estimated intercept in the context of these data. Notice that there’s not data near $\text{weight} = 0$. When this is the case, you can generally expect the intercept to not make much sense.

The slope, $\hat{\beta}_1$, describes the linear relationship between a car’s weight and city MPG. This estimate too is relatively easy to interpret in the context of these data. For every 1 unit increase in a car’s weight (a 1 pound increase), we estimate a $0.01$ decrease in the expected city MPG.

Be careful to not overly interpret these estimates as describing a causal relationship. Determining causal relationships from non-experimental data is no easy task, and we won’t even try to broach this topic in this course. Peter Norvig, Director of Research at Google, wrote an essay describing many of the difficulties of applied statistics practice surrounding observational data. His essay Warning Signs in Experimental Design and Interpretation enumerates common warning signs for when a practioner of applied statistics might be misinterpreting their data. For a more theoretical approach to determining causal relationships from data, see Judea Pearl’s book Causality.

Just like before, the estimates $\hat{\beta}_0, \hat{\beta}_1$ are simply one set of estimates based on one random sample. The values we produced could be due to pure random chance. To better understand our uncertainty in these estimates, we will calculate confidence intervals. We will let the function $\texttt{boot::boot}$ do the random sampling for us, and for that we need write a function that accepts our data and a vector of indices.

N = cars['mpgCity'].size
R = 999
betas = np.full((R, 2), np.nan)

for r in range(R):
    idx = np.random.choice(N, N)
    betas[r, :] = minimize(ll, normal.rvs(size=2), args=(yX[idx, :]))['x']

beta_p = np.percentile(betas, [10, 90], axis=0)

axs = bp.subplots(1, 2)

bp.density(betas[:, 0])
bp.percentile_h(betas[:, 0], y=0)
bp.rug(beta_p[:, 0])
bp.labels(x='Intercept', y='Density')

bp.density(betas[:, 1])
bp.percentile_h(betas[:, 1], y=0)
bp.rug(beta_p[:, 1])
bp.labels(x='Slope', y='Density')


<matplotlib.axes._subplots.AxesSubplot at 0x124fb9710>


The $90\%$ confidence intervals for $\beta_0$ and $\beta_1$ are

np.round(beta_p, 3)

array([[ 4.6007e+01, -1.0000e-02],
       [ 5.3851e+01, -8.0000e-03]])

Some classes will make a big deal about the slope estimate not including $0$. There’s no doubt such conclusions have some appeal. However, this too often encourages binary thinking such as, “is the true population slope equal to zero or isn’t it?” When a statistic, known as p-value, is smaller than $0.05$ or a confidence interval excludes zero, the common phrase is, statistically significantly different from zero, as if zero or not are the only options.

Increasingly, statisticians are warning against such binary decision making; e.g. “It’s time to talk about ditching statistical significance”, “Moving to a World Beyond ‘p < 0.05’”, or “Scientists rise up against statistical significance”.

In this class, we’ll focus on predictions, understanding uncertainty in our predictions, and making decisions in the face of this uncertainty.