Recall that an exponential family is a family of distributions \[
f(x \mid \theta) = h(x)\exp(\theta^T T(x) - \psi(\theta))
\] where \(\theta \in \R^k\) and \(T(x) = [T_1(x), \ldots, T_k(x)]^T\).
Two useful properties (from Bartlett’s identities):
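The properties themselves did not survive in these notes. For an exponential family in the canonical form above, the standard consequences of Bartlett's identities are the following (given here as the likely intended content, not confirmed by the source): \[
\E_\theta\left[T(X)\right] = \nabla\psi(\theta), \qquad \var_\theta\left(T(X)\right) = \nabla^2\psi(\theta).
\]
In the one-parameter case with \(T(x) = x\), these reduce to \(\E(X) = \psi^{\prime}(\theta)\) and \(\var(X) = \psi^{\prime\prime}(\theta)\).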
Let \(Y\) be univariate, \(X \in \R^p\), and \(\beta \in \R^p\).
A GLM assumes \(Y \mid X, \beta \sim F_{\theta}\), where \(\theta = X^T\beta\) and \(F_\theta\) has the density function \[
f(y \mid \theta) = h(y)\exp(\theta\cdot y - \psi(\theta)).
\]
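As a worked instance of this setup (a sketch of a step the notes appear to skip), recall that \(\E(Y \mid X, \beta) = \psi^{\prime}(\theta)\) for this family, so with \(g = (\psi^{\prime})^{-1}\) we get \(g(\E(Y \mid X, \beta)) = \theta = X^T\beta\). For a Poisson response with mean \(\lambda\), the pmf in canonical form is \[
f(y \mid \lambda) = \frac{\lambda^y e^{-\lambda}}{y!} = \frac{1}{y!}\exp\left(y\log\lambda - \lambda\right),
\] so that \(\theta = \log\lambda\), \(h(y) = 1/y!\), and \(\psi(\theta) = e^{\theta}\). Hence \(\psi^{\prime}(\theta) = e^{\theta}\) and \(g(u) = (\psi^{\prime})^{-1}(u) = \log u\).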
Putting it all together, we have \[
g(\E(Y\mid X, \beta)) = \log\E(Y \mid X, \beta) = X^T\beta
\] or equivalently \[
\E(Y \mid X, \beta) = \exp(X^T\beta),
\] also known as Poisson log-linear regression.
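A minimal simulation sketch of the Poisson log-linear model, using only base R. The sample size, the coefficient values, and all variable names (n, beta_true, x1, x2, sim) are illustrative assumptions, not part of the notes.

```r
# Simulate from a Poisson log-linear model and recover beta with glm().
set.seed(1)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
beta_true <- c(0.5, 0.3, -0.2)              # intercept, x1, x2 (assumed values)
eta <- beta_true[1] + beta_true[2] * x1 + beta_true[3] * x2
y <- rpois(n, lambda = exp(eta))            # E(Y | X) = exp(X^T beta)
sim <- data.frame(y, x1, x2)

# The log link is the canonical link for the Poisson family
fit <- glm(y ~ x1 + x2, data = sim, family = poisson(link = "log"))
beta_hat <- unname(coef(fit))

# Compare the true and estimated coefficients side by side
print(round(cbind(beta_true, beta_hat), 3))
```

With a sample this large, the estimates should land close to the assumed coefficients, which is a quick sanity check that the model and the link are specified as intended.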
Remarks
The link function \(g = (\psi^{\prime})^{-1}\) is sometimes called the canonical link function, since it is derived from the canonical representation of an exponential family.
All we need from a link function is that it is a one-to-one map from the set of possible values of \(\E(Y \mid X, \beta)\) to the range of \(X^T\beta\), namely \(\R\).
For example, in the Bernoulli linear model, we could have used the probit link function \[
g(u) = \Phi^{-1}(u): (0, 1) \to \R
\] where \(\Phi\) is the CDF of the standard normal distribution.
This is called probit regression.
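Switching the link in R only requires changing the link argument of the binomial family. The sketch below simulates data from a probit model (the coefficients -0.3 and 0.8 are assumed for illustration) and fits both the canonical logit link and the probit link.

```r
# Compare logit and probit links on simulated Bernoulli data.
set.seed(2)
n <- 5000
x <- rnorm(n)
p <- pnorm(-0.3 + 0.8 * x)                  # true model: Phi(X^T beta)
y <- rbinom(n, size = 1, prob = p)

fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))

# Coefficients live on different scales under the two links ...
print(rbind(logit = coef(fit_logit), probit = coef(fit_probit)))

# ... but the fitted probabilities are usually very similar
max_gap <- max(abs(fitted(fit_logit) - fitted(fit_probit)))
print(max_gap)
```

The coefficients are not directly comparable across links, so comparisons between the two fits are better made on the fitted probabilities.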
Over- and underdispersion
In the normal linear model, the conditional mean and variance of \(Y \mid X\) are modeled by two different parameters \(\beta\) and \(\sigma^2\).
However, in the Poisson linear model, \[
\E(Y \mid X, \beta) = \exp(X^T\beta) = \var(Y \mid X, \beta).
\]
Also, in the Bernoulli linear model, \[
\var(Y \mid X, \beta) = \E(Y\mid X, \beta)(1-\E(Y \mid X, \beta)) = \frac{\exp(X^T\beta)}{(1+\exp(X^T\beta))^2}.
\]
That is, the variance is determined by the mean, which might not be a reasonable assumption in practice.
When the observed variance is smaller or larger than the variance implied by the model, we speak of underdispersion or overdispersion, respectively.
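One common informal check is the Pearson statistic divided by the residual degrees of freedom, which should be near 1 when the Poisson variance assumption holds. The sketch below uses simulated data; the helper pearson_dispersion and all parameter values are hypothetical, introduced only for illustration.

```r
# Pearson-based dispersion estimate for a fitted Poisson GLM.
set.seed(3)
n <- 2000
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y_pois <- rpois(n, mu)                      # variance equals the mean
y_over <- rnbinom(n, mu = mu, size = 2)     # variance = mu + mu^2 / 2 > mean

# Hypothetical helper: sum of squared Pearson residuals / residual df
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

disp_pois <- pearson_dispersion(glm(y_pois ~ x, family = poisson))
disp_over <- pearson_dispersion(glm(y_over ~ x, family = poisson))
print(c(poisson = disp_pois, overdispersed = disp_over))
```

A value well above 1 points to overdispersion and a value well below 1 to underdispersion, relative to the Poisson assumption.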
Exponential Dispersion Family
Let \(\theta \in \R\) and \(\phi > 0\).
The exponential dispersion family is of the form \[
f(x \mid \theta, \phi) = \exp\left(\frac{x\theta - b(\theta)}{\phi} + c(x, \phi)\right).
\]
The parameter \(\phi\) is called the dispersion parameter.
Letting \(\mu = \E(X) = b^{\prime}(\theta)\), we have \(\var(X) = \phi b^{\prime\prime}\left((b^{\prime})^{-1}(\mu)\right) = \phi \mc{V}(\mu)\), where \(\mc{V}(\mu) = b^{\prime\prime}\left((b^{\prime})^{-1}(\mu)\right)\) is called the variance function.
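To make the variance function concrete, here are three standard special cases worked out from the definition (a sketch, with the parametrization of each family assumed as follows). For the Poisson family, \(\phi = 1\) and \(b(\theta) = e^{\theta}\), so \[
\mu = b^{\prime}(\theta) = e^{\theta}, \qquad b^{\prime\prime}(\theta) = e^{\theta} = \mu, \qquad \mc{V}(\mu) = \mu.
\] For the Bernoulli family, \(\phi = 1\) and \(b(\theta) = \log(1 + e^{\theta})\), so \[
\mu = \frac{e^{\theta}}{1 + e^{\theta}}, \qquad b^{\prime\prime}(\theta) = \frac{e^{\theta}}{(1 + e^{\theta})^2} = \mu(1 - \mu), \qquad \mc{V}(\mu) = \mu(1 - \mu).
\] For the normal family with variance \(\sigma^2\), \(\phi = \sigma^2\) and \(b(\theta) = \theta^2/2\), so \[
\mu = \theta, \qquad b^{\prime\prime}(\theta) = 1, \qquad \mc{V}(\mu) = 1, \qquad \var(X) = \sigma^2.
\]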
Can we generalize the Poisson/Bernoulli family to include a dispersion parameter?
What can we do if we observe over-/underdispersion in a Poisson/Bernoulli linear model?
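In R, use one of the quasi families: family = quasipoisson and family = quasibinomial keep the Poisson and binomial variance functions but estimate a dispersion parameter from the data, while family = quasi(link = ..., variance = ...) lets you specify the variance function yourself.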
Or you can use a negative binomial model for count data (you can easily show that the scaled negative binomial distribution is an exponential dispersion family).
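Both options can be sketched on simulated overdispersed counts. The data-generating values below are assumptions for illustration; glm.nb() comes from the MASS package.

```r
library(MASS)  # provides glm.nb()

set.seed(4)
n <- 2000
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 2)   # overdispersed counts

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)
fit_nb    <- glm.nb(y ~ x)

# Quasi-Poisson keeps the Poisson point estimates ...
print(rbind(poisson = coef(fit_pois), quasipoisson = coef(fit_quasi)))

# ... but inflates standard errors by the square root of the dispersion
phi_hat <- summary(fit_quasi)$dispersion
se_ratio <- coef(summary(fit_quasi))[, "Std. Error"] /
  coef(summary(fit_pois))[, "Std. Error"]
print(c(phi_hat = phi_hat, sqrt_phi_hat = sqrt(phi_hat)))
print(se_ratio)

# Negative binomial fit: theta is the estimated size parameter
print(fit_nb$theta)
```

The quasi-Poisson route only rescales the uncertainty, whereas the negative binomial model changes the likelihood, so its coefficient estimates can differ from the Poisson ones.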
Example: Recreation Demand
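# AER provides the RecreationDemand data and dispersiontest()
library(AER)
library(MASS)
library(gtsummary)

data("RecreationDemand")

# Poisson regression of the number of trips on all other variables
fit_pois <- glm(trips ~ ., data = RecreationDemand, family = poisson)

# Test for overdispersion (null hypothesis: equidispersion)
dt <- dispersiontest(fit_pois)
print(paste("Dispersion Test: p-value =", round(dt$p.value, 4)))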