Lecture 10: Generalized Linear Model and Latent Variable Model

Chun-Hao Yang

Generalized Linear Model

Exponential family

  • Recall that an exponential family is a family of distributions \[ f(x \mid \theta) = h(x)\exp(\theta^T T(x) - \psi(\theta)) \] where \(\theta \in \R^k\) and \(T(x) = [T_1(x), \ldots, T_k(x)]^T\).
  • Two useful properties (from Bartlett’s identities):
    • \(\E(T(X)) = \nabla\psi(\theta)\)
    • \(\var(T(X)) = \text{Hess}(\psi(\theta)) = \nabla(\nabla \psi(\theta))\).
  • That is, the relationship between the parameter \(\theta\) and the expectation \(\E(T(X))\) is determined by \(\nabla \psi\).

Examples

  • Normal: \(f(x \mid \mu, \sigma^2) \propto \exp\left(-\frac{1}{2\sigma^2} x^2 + \frac{\mu}{\sigma^2}x - \frac{\mu^2}{2\sigma^2}\right)\), \(x \in \R\)
    • \(\theta = \left(-\frac{1}{2\sigma^2}, \frac{\mu}{\sigma^2}\right)\), \(T(x) = (x^2, x)\), \(\psi(\theta) = -\frac{\theta_2^2}{4\theta_1} = \frac{\mu^2}{2\sigma^2}\)
  • Bernoulli: \(f(x \mid p) = p^x(1-p)^{1-x} = \exp\left(x\log\frac{p}{1-p} + \log(1-p)\right)\), \(x \in \{0, 1\}\)
    • \(\theta = \log\frac{p}{1-p}\), \(T(x) = x\), \(\psi(\theta) = -\log(1-p) = \log(1 + e^{\theta})\)
  • Poisson: \(f(x \mid \lambda) = \frac{1}{x!}\exp(x\log\lambda - \lambda)\), \(x = 0, 1, 2, \ldots\)
    • \(\theta = \log\lambda\), \(T(x) = x\), \(\psi(\theta) = \exp(\theta) = \lambda\)
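Bartlett's identities can be checked numerically. A minimal sketch in R for the Bernoulli family, using central differences (the value of \(p\) and the step size h are arbitrary choices for illustration):

```r
# Numerically verify E(T(X)) = psi'(theta) and Var(T(X)) = psi''(theta)
# for the Bernoulli family; p and h are illustrative choices
p <- 0.3
theta <- log(p / (1 - p))                # natural parameter
psi <- function(th) log(1 + exp(th))     # log-partition function
h <- 1e-4
dpsi  <- (psi(theta + h) - psi(theta - h)) / (2 * h)               # ~ psi'(theta)
d2psi <- (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / h^2  # ~ psi''(theta)
c(dpsi, p)                # E(X) = p
c(d2psi, p * (1 - p))     # Var(X) = p(1 - p)
```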

Generalized Linear Model (GLM)

  • Let \(Y\) be univariate, \(X \in \R^p\), and \(\beta \in \R^p\).
  • A GLM is assuming \(Y \mid X, \beta \sim F_{\theta}\), where \(\theta = X^T\beta\) and \(F_\theta\) has the density function \[ f(y \mid \theta) = h(y)\exp(\theta\cdot y - \psi(\theta)). \]
  • Therefore, \[\begin{align*} \E(Y \mid X, \beta) & = \frac{d}{d\theta}\psi(\theta) = \psi^{\prime}(X^T\beta) \end{align*}\]
  • Equivalently, \[ g(\E(Y \mid X, \beta)) = X^T\beta \] where \(g\) is the inverse of \(\psi^{\prime}\).
  • The function \(g\) is called the link function.

Bernoulli linear model

  • Let \(Y \mid X, \beta \sim \text{Ber}(p)\).
  • We have \(\theta = \log\frac{p}{1-p}\), \(T(x) = x\), \(\psi(\theta) = -\log(1-p) = \log(1 + e^{\theta})\)
  • Thus, \(\psi^{\prime}(\theta) = \frac{e^{\theta}}{1 + e^{\theta}}\) and \(g(u) = (\psi^{\prime})^{-1}(u) = \log\frac{u}{1-u}\).
  • \(\psi^{\prime}\) is called the logistic function; \(g\) is called the logit function.
  • Putting it all together, we have \[ g(\E(Y\mid X, \beta)) = \log\frac{\E(Y \mid X, \beta)}{1 - \E(Y \mid X, \beta)} = X^T\beta \] or equivalently \[ \E(Y \mid X, \beta) = \psi^{\prime}(X^T\beta) = \frac{\exp(X^T\beta)}{1 + \exp(X^T\beta)}, \] aka logistic regression.
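In R, this model is fit with glm and family = binomial, which uses the canonical logit link by default. A simulation sketch (the seed, sample size, and coefficients below are made up):

```r
# Simulated logistic regression: E(Y|x) = plogis(1 + 2x),
# where plogis is psi', the logistic function
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1 + 2 * x))
fit <- glm(y ~ x, family = binomial)   # canonical logit link by default
coef(fit)                              # close to the true (1, 2)
```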

Poisson linear model

  • Let \(Y \mid X, \beta \sim \text{Pois}(\lambda)\).
  • We have \(\theta = \log\lambda\), \(T(x) = x\), \(\psi(\theta) = \exp(\theta) = \lambda\).
  • Thus \(\psi^{\prime}(\theta) = \exp(\theta)\) and \(g(u) = (\psi^{\prime})^{-1}(u) = \log u\).
  • Putting it all together, we have \[ g(\E(Y\mid X, \beta)) = \log\E(Y \mid X, \beta) = X^T\beta \] or equivalently \[ \E(Y \mid X, \beta) = \exp(X^T\beta), \] aka Poisson log-linear regression.
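The corresponding R fit uses family = poisson, whose default link is the canonical log link. A simulation sketch (seed, sample size, and coefficients are made up):

```r
# Simulated Poisson log-linear regression: E(Y|x) = exp(0.5 + 1.5x)
set.seed(2)
n <- 500
x <- runif(n)
y <- rpois(n, exp(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = poisson)    # canonical log link by default
coef(fit)                              # close to the true (0.5, 1.5)
```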

Remarks

  • The link function \(g = (\psi^{\prime})^{-1}\) is sometimes called the canonical link function, since it is derived from the canonical representation of an exponential family.
  • All we need for a link function is that it maps the range of \(\E(Y \mid X, \beta)\) bijectively onto the range of \(X^T\beta\), i.e., \(\R\).
  • For example, in the Bernoulli linear model, we could have used the probit link function \[ g(u) = \Phi^{-1}(u): (0, 1) \to \R \] where \(\Phi\) is the CDF of the standard normal distribution.
  • This is called probit regression.
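In R, the probit link is selected through the family's link argument. A sketch with simulated data (seed, sample size, and coefficients are made up):

```r
# Probit regression: P(Y = 1 | x) = pnorm(0.5 + x),
# where pnorm is Phi, the standard normal CDF
set.seed(5)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(0.5 + x))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))
coef(fit_probit)                       # close to the true (0.5, 1)
```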

Over- and underdispersion

  • In the normal linear model, the conditional mean and variance of \(Y \mid X\) are modeled by two different parameters \(\beta\) and \(\sigma^2\).
  • However, in the Poisson linear model, \[ \E(Y \mid X, \beta) = \exp(X^T\beta) = \var(Y \mid X, \beta). \]
  • Also, in the Bernoulli linear model, \[ \var(Y \mid X, \beta) = \E(Y\mid X, \beta)(1-\E(Y \mid X, \beta)) = \frac{\exp(X^T\beta)}{(1+\exp(X^T\beta))^2}. \]
  • That is, the variance is determined by the mean, which might not be a reasonable assumption in practice.
  • When the observed variance is smaller or larger than the variance implied by the model, it is called underdispersion or overdispersion, respectively.
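Overdispersion is easy to produce: negative binomial counts have variance strictly larger than their mean. A sketch (the seed and parameter values are illustrative):

```r
# Negative binomial counts: Var(Y) = mu + mu^2/size = 5 + 25/2 = 17.5 > mu = 5,
# so the Poisson assumption Var(Y) = E(Y) fails for such data
set.seed(3)
y <- rnbinom(1000, size = 2, mu = 5)
c(mean = mean(y), var = var(y))   # sample variance well above the sample mean
```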

Exponential Dispersion Family

  • Let \(\theta \in \R\) and \(\phi > 0\).
  • The exponential dispersion family is of the form \[ f(x \mid \theta, \phi) = \exp\left(\frac{x\theta - b(\theta)}{\phi} + c(x, \phi)\right). \]
  • Similarly, from Bartlett’s identities, \[ \E(X) = b^{\prime}(\theta) \quad \text{and} \quad \var(X) = \phi b^{\prime\prime}(\theta). \]
  • The parameter \(\phi\) is called the dispersion parameter.
  • Letting \(\mu = \E(X) = b^{\prime}(\theta)\), we have \(\var(X) = \phi b^{\prime\prime}\left((b^{\prime})^{-1}(\mu)\right) = \phi \mc{V}(\mu)\), where \(\mc{V}(\mu) = b^{\prime\prime}\left((b^{\prime})^{-1}(\mu)\right)\) is called the variance function.
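  • As a sanity check, the Bernoulli family recovers the familiar variance: with \(b(\theta) = \log(1 + e^{\theta})\) and \(\phi = 1\), \[ \mu = b^{\prime}(\theta) = \frac{e^{\theta}}{1 + e^{\theta}}, \qquad b^{\prime\prime}(\theta) = \frac{e^{\theta}}{(1 + e^{\theta})^2} = \mu(1 - \mu), \] so \(\mc{V}(\mu) = \mu(1-\mu)\) and \(\var(X) = p(1-p)\).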

Example

  • Normal: \(\theta = \mu\), \(\phi = \sigma^2\), \(b(\theta) = \frac{\theta^2}{2}\), \(c(x, \phi) = -\frac{x^2}{2\phi} - \frac{1}{2}\log(2\pi\phi)\).

  • Bernoulli: \(\theta = \log\frac{p}{1-p}\), \(\phi = 1\), \(b(\theta) = \log(1 + e^{\theta})\), \(c(x, \phi) = 0\).

  • Poisson: \(\theta = \log\lambda\), \(\phi = 1\), \(b(\theta) = e^{\theta}\), \(c(x, \phi) = -\log(x!)\).

  • Can we generalize the Poisson/Bernoulli family to include a dispersion parameter?

  • What can we do if we observe over-/underdispersion in a Poisson/Bernoulli linear model?

    • In R, use family=quasi* which allows you to specify the variance function, e.g., quasipoisson or quasibinomial.
    • Or you can use a negative binomial model for count data (you can easily show that the scaled negative binomial distribution is an exponential dispersion family).
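A quasi-Poisson sketch on simulated overdispersed counts (the seed, sample size, and coefficients are made up): the point estimates match the Poisson fit, but the standard errors are inflated by the estimated dispersion.

```r
# Quasi-Poisson fit on overdispersed counts: for NB data with size = 1,
# Var(Y) = mu + mu^2 > mu, so the estimated dispersion phi exceeds 1
set.seed(4)
x <- runif(300)
y <- rnbinom(300, size = 1, mu = exp(1 + x))
fit_qp <- glm(y ~ x, family = quasipoisson)
summary(fit_qp)$dispersion     # estimated phi, well above 1 here
```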

Example: Recreation Demand


library(AER)        # RecreationDemand data and dispersiontest()
library(MASS)       # glm.nb() for negative binomial regression
library(gtsummary)  # tbl_regression() and tbl_merge() for the summary table
data("RecreationDemand")

# Poisson log-linear model for the number of trips
fit_pois <- glm(trips ~ ., data = RecreationDemand, family = poisson)

# Test for overdispersion (H0: equidispersion)
dt <- dispersiontest(fit_pois)

print(paste("Dispersion Test: p-value =", round(dt$p.value, 4)))
[1] "Dispersion Test: p-value = 0.0079"

# Overdispersion detected: refit with a negative binomial model
fit_nb <- glm.nb(trips ~ ., data = RecreationDemand)

t1 <- tbl_regression(fit_pois, exponentiate = TRUE)
t2 <- tbl_regression(fit_nb, exponentiate = TRUE)

t_all <- tbl_merge(tbls = list(t1, t2),
                   tab_spanner = c("**Poisson**", "**Neg-Bin**"))

Example

Characteristic   Poisson                      Neg-Bin
                 IRR1   95% CI1      p-value   IRR1   95% CI1      p-value
quality          1.60   1.55, 1.66   <0.001    2.06   1.89, 2.25   <0.001
ski
    no
    yes          1.52   1.36, 1.70   <0.001    1.84   1.38, 2.48   <0.001
income           0.89   0.86, 0.93   <0.001    0.97   0.89, 1.06   0.5
userfee
    no
    yes          2.46   2.10, 2.86   <0.001    1.95   1.01, 4.22   0.058
costC            1.00   0.99, 1.00   0.3       1.05   1.02, 1.08   <0.001
costS            0.96   0.96, 0.96   <0.001    0.91   0.90, 0.93   <0.001
costH            1.04   1.03, 1.04   <0.001    1.04   1.02, 1.06   <0.001

1 IRR = Incidence Rate Ratio, CI = Confidence Interval