Lecture 10: Generalized Linear Model and Latent Variable Model

Chun-Hao Yang

Generalized Linear Model

Exponential family

Recall that an exponential family is a family of distributions \[ f(x \mid \theta) = h(x)\exp(\theta^T T(x) - \psi(\theta)) \] where \(\theta \in \R^k\) and \(T(x) = [T_1(x), \ldots, T_k(x)]^T\).
Two useful properties (from Bartlett’s identities):
- \(\E(T(X)) = \nabla\psi(\theta)\)
- \(\var(T(X)) = \text{Hess}(\psi(\theta)) = \nabla(\nabla \psi(\theta))\).
That is, the relationship between the parameter \(\theta\) and the expectation \(\E(T(X))\) determined by \(\nabla \psi\).

Examples

Normal: \(f(x \mid \mu, \sigma^2) = \exp\left(-\frac{1}{2\sigma^2} x^2 + \frac{\mu}{\sigma^2}x - \frac{\mu^2}{2\sigma^2}\right)\), \(x \in \R\)
- \(\theta = \left(-\frac{1}{2\sigma^2}, \frac{\mu}{\sigma^2}\right)\), \(T(x) = (-x^2, x)\), \(\psi(\theta) = -\frac{\theta_2^2}{4\theta_1} = \frac{\mu^2}{2\sigma^2}\)
Bernoulli: \(f(x \mid p) = p^x(1-p)^{1-x} = \exp\left(x\log\frac{p}{1-p} + \log(1-p)\right)\), \(x \in \{0, 1\}\)
- \(\theta = \log\frac{p}{1-p}\), \(T(x) = x\), \(\psi(\theta) = -\log(1-p) = \log(1 + e^{\theta})\)
Poisson: \(f(x \mid \lambda) = \frac{1}{x!}\exp(x\log\lambda - \lambda)\), \(x = 0, 1, 2, \ldots\)
- \(\theta = \log\lambda\), \(T(x) = x\), \(\psi(\theta) = \exp(\theta) = \lambda\)

Generalized Linear Model (GLM)

Let \(Y\) be univariate, \(X \in \R^p\), and \(\beta \in \R^p\).
A GLM is assuming \(Y \mid X, \beta \sim F_{\theta}\), where \(\theta = X^T\beta\) and \(F_\theta\) has the density function \[ f(y \mid \theta) = h(y)\exp(\theta\cdot y - \psi(\theta)). \]
Therefore, \[\begin{align*} \E(Y \mid X, \beta) & = \frac{d}{d\theta}\psi(\theta) = \psi^{\prime}(X^T\beta) \end{align*}\]
Equivalently, \[ g(\E(Y \mid X, \beta)) = X^T\beta \] where \(g\) is the inverse of \(\psi^{\prime}\).
The function \(g\) is called the link function.

Bernoulli linear model

Let \(Y \mid X, \beta \sim \text{Ber}(p)\).
We have \(\theta = \log\frac{p}{1-p}\), \(T(x) = x\), \(\psi(\theta) = -\log(1-p) = \log(1 + e^{\theta})\)
Thus, \(\psi^{\prime}(\theta) = \frac{e^{\theta}}{1 + e^{\theta}}\) and \(g(u) = (\psi^{\prime})^{-1}(u) = \log\frac{u}{1-u}\).
\(\phi^{\prime}\) is called the logistic function; \(g\) is called the logit function.
Putting altogether, we have \[ g(\E(Y\mid X, \beta)) = \log\frac{\E(Y \mid X, \beta)}{1 - \E(Y \mid X, \beta)} = X^T\beta \] or equivalently \[ \E(Y \mid X, \beta) = \psi^{\prime}(X^T\beta) = \frac{\exp(X^T\beta)}{1 + \exp(X^T\beta)}, \] aka logistic regression.

Poisson linear model

Let \(Y \mid X, \beta \sim \text{Pois}(\lambda)\).
We have \(\theta = \log\lambda\), \(T(x) = x\), \(\psi(\theta) = \exp(\theta) = \lambda\).
Thus \(\psi^{\prime}(\theta) = \exp(\theta)\) and \(g(u) = (\psi^{\prime})^{-1}(u) = \log u\).
Putting altogether, we have \[ g(\E(Y\mid X, \beta)) = \log\E(Y \mid X, \beta) = X^T\beta \] or equivalently \[ \E(Y \mid X, \beta) = \exp(X^T\beta), \] aka Poisson log-linear regression.

Remarks

The link function \(g = (\psi^{\prime})^{-1}\) is sometimes called the canonical link function, since it is derived from the canonical representation of an exponential family.
All we need for a link function is that it matches the domain of \(\E(Y \mid X, \beta)\) and \(X^T\beta\).
For example, in the Bernoulli linear model, we could have used the probit link function \[ g(u) = \Phi^{-1}(u): [0, 1] \to \R \] where \(\Phi\) is the CDF of the standard normal distribution.
This is called the probit regression.

Over- and underdispersion

In the normal linear model, the conditional mean and variance of \(Y \mid X\) are modeled by two different parameters \(\beta\) and \(\sigma^2\).
However, in Poisson linear model, \[ \E(Y \mid X, \beta) = \exp(X^T\beta) = \var(Y \mid X, \beta). \]
Also, in Bernoulli linear model, \[ \var(Y \mid X, \beta) = \E(Y\mid X, \beta)(1-\E(Y \mid X, \beta)) = \frac{\exp(X^T\beta)}{(1+\exp(X^T\beta))^2}. \]
That is, the variance is determined by the mean, which might not be a reasonable assumption in practice.
When the observed variance is smaller or larger than the assumed variance, it is called an underdispersion or overdispersion, respectively.

Exponential Dispersion Family

Let \(\theta \in \R\) and \(\phi \in \R\).
The exponential dispersion family is of the form \[ f(x \mid \theta, \phi) = \exp\left(\frac{x\theta - b(\theta)}{\phi} + c(x, \phi)\right). \]
Similarly from Bartlett’s identities¹, \[ \E(X) = b^{\prime}(\theta) \quad \text{and} \quad \var(X) = \phi b^{\prime\prime}(\theta). \]
The parameter \(\phi\) is called the dispersion parameter.
Letting \(\mu = \E(X) = b^{\prime}(\theta)\), we have \(\var(X) = \phi b^{\prime\prime}\left((b^{\prime})^{-1}(\mu)\right) = \phi \mc{V}(\mu)\), where \(\mc{V}(\mu) = b^{\prime\prime}\left((b^{\prime})^{-1}(\mu)\right)\) is called the variance function.

Example

Normal: \(\theta = \mu\), \(\phi = \sigma^2\), \(b(\theta) = \frac{\theta^2}{2}\), \(c(x, \phi) = -\frac{x^2}{2\phi} - \frac{1}{2}\log(2\pi\phi)\).
Bernoulli: \(\theta = \log\frac{p}{1-p}\), \(\phi = 1\), \(b(\theta) = \log(1 + e^{\theta})\), \(c(x, \phi) = 0\).
Poisson: \(\theta = \log\lambda\), \(\phi = 1\), \(b(\theta) = e^{\theta}\), \(c(x, \phi) = -\log(x!)\).
Can we generalize the Poisson/Bernoulli family to include a dispersion parameter?¹
What can we do if we observe over-/underdispersion in a Poisson/Bernoulli linear model?
- In R, use family=quasi* which allows you to specify the variance function, e.g., quasipoisson or quasibinomial.
- Or you can use a negative binomial model for count data (you can easily show that the scaled negative binomial distribution is an exponential dispersion family).

Example: Recreation Demand

Example

library(AER)
library(MASS)
library(gtsummary)
data("RecreationDemand")
fit_pois <- glm(trips ~ ., data = RecreationDemand, family = poisson)

dt <- dispersiontest(fit_pois)

print(paste("Dispersion Test: p-value =", round(dt$p.value,4)))

[1] "Dispersion Test: p-value = 0.0079"

fit_nb <- glm.nb(trips ~ ., data = RecreationDemand)

t1 <- tbl_regression(fit_pois, exponentiate = TRUE)
t2 <- tbl_regression(fit_nb, exponentiate = TRUE)

t_all <- tbl_merge(tbls = list(t1, t2), 
                   tab_spanner = c("**Poisson**", "**Neg-Bin**"))

Example

Characteristic	Poisson			Neg-Bin
Characteristic	IRR¹	95% CI¹	p-value	IRR¹	95% CI¹	p-value
quality	1.60	1.55, 1.66	<0.001	2.06	1.89, 2.25	<0.001
ski
no	—	—		—	—
yes	1.52	1.36, 1.70	<0.001	1.84	1.38, 2.48	<0.001
income	0.89	0.86, 0.93	<0.001	0.97	0.89, 1.06	0.5
userfee
no	—	—		—	—
yes	2.46	2.10, 2.86	<0.001	1.95	1.01, 4.22	0.058
costC	1.00	0.99, 1.00	0.3	1.05	1.02, 1.08	<0.001
costS	0.96	0.96, 0.96	<0.001	0.91	0.90, 0.93	<0.001
costH	1.04	1.03, 1.04	<0.001	1.04	1.02, 1.06	<0.001
¹ IRR = Incidence Rate Ratio, CI = Confidence Interval