\[ \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \renewcommand{\P}{\mathbb{P}} \newcommand{\var}{{\rm Var}} % Variance \newcommand{\mse}{{\rm MSE}} % MSE \newcommand{\bias}{{\rm Bias}} % MSE \newcommand{\cov}{{\rm Cov}} % Covariance \newcommand{\iid}{\stackrel{\rm iid}{\sim}} \newcommand{\ind}{\stackrel{\rm ind}{\sim}} \renewcommand{\choose}[2]{\binom{#1}{#2}} % Choose \newcommand{\chooses}[2]{{}_{#1}C_{#2}} % Small choose \newcommand{\cd}{\stackrel{d}{\rightarrow}} \newcommand{\cas}{\stackrel{a.s.}{\rightarrow}} \newcommand{\cp}{\stackrel{p}{\rightarrow}} \newcommand{\bin}{{\rm Bin}} \newcommand{\ber}{{\rm Ber}} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \]
There are many R packages for fitting Bayesian regression models:

- `rstanarm`: pre-compiled Stan regression models; easier to use; faster; less flexible
- `brms`: compiles models on the fly; slower; more flexible
- `bayesplot`: visualization of Bayesian models, including MCMC diagnostics (`mcmc_*`), posterior predictive checks (`ppc_*`), and posterior predictive draws (`ppd_*`)
- `loo`: efficient approximate leave-one-out cross-validation for fitted Bayesian models
- `shinystan`: interactive diagnostic tools for assessing Bayesian models

Data from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth):
- `kid_score`: child's IQ score
- `mom_hs`: indicator for whether the mother has a high school degree
- `mom_iq`: mother's IQ score
- `mom_age`: mother's age
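The printed output below comes from fitting this model with `stan_glm` and printing the result. A minimal sketch of the call (the object name `fit`, the seed, and loading `kidiq` from the rstanarm package are assumptions):

library(rstanarm)
data(kidiq)  # assumed: the kidiq data shipped with rstanarm
fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq,
                seed = 2023, refresh = 0)
print(fit)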
stan_glm
 family:       gaussian [identity]
 formula:      kid_score ~ mom_hs + mom_iq
 observations: 434
 predictors:   3
------
            Median MAD_SD
(Intercept) 25.8   5.5
mom_hs       6.0   2.2
mom_iq       0.6   0.1

Auxiliary parameter(s):
      Median MAD_SD
sigma 18.2   0.6

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
Compare observed data to draws from the posterior predictive distribution.
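In rstanarm this is done with `pp_check`; a minimal sketch using the `fit` object assumed above:

pp_check(fit)                               # density overlay: observed outcome vs. replicated draws
pp_check(fit, plotfun = "hist", nreps = 5)  # histograms of a few posterior predictive datasets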
Compute the posterior predictive distribution for a new sample.
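`posterior_predict` returns a matrix with one row per posterior draw and one column per (new) observation; a sketch with a hypothetical new mother:

new_mom <- data.frame(mom_hs = 1, mom_iq = 100)  # hypothetical covariate values
y_rep <- posterior_predict(fit, newdata = new_mom)
dim(y_rep)                           # posterior draws x new observations
quantile(y_rep, c(0.05, 0.5, 0.95))  # posterior predictive summary for this child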
shinystan
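A fitted stanreg object can be passed directly to shinystan; a minimal sketch:

library(shinystan)
launch_shinystan(fit)  # opens an interactive diagnostics app in the browser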
LASSO = “Least Absolute Shrinkage and Selection Operator”
Model (this is how `stan_glm` implements the LASSO prior): \[\begin{align*}
Y \mid X, \beta, \sigma^2 & \sim N(X\beta, \sigma^2 I_n) \\
\beta_j & \iid \text{Laplace}(0, \lambda^{-1}) \\
\lambda & \sim \chi^2_{\nu}
\end{align*}\] where \(\nu\) is the degrees of freedom (the default is \(\nu = 1\)).
Bias increases as \(\lambda\) increases (coefficients are shrunk harder toward zero), while variance decreases as \(\lambda\) increases. Since the \(\chi^2_{\nu}\) prior has mean \(\nu\), a larger value of \(\nu\) favors larger \(\lambda\) and hence produces more shrinkage.
library(rstanarm)

data <- read.csv("dataset/car_price/CarPrice_Assignment.csv")
fit_lasso <- stan_glm(price ~ highwaympg + citympg + horsepower +
                        peakrpm + compressionratio,
                      data = data,
                      prior = lasso(df = 1, location = 0,
                                    scale = 0.5, autoscale = TRUE),
                      prior_intercept = normal(location = 0, scale = 1,
                                               autoscale = TRUE),
                      prior_aux = exponential(rate = 1, autoscale = TRUE),
                      chains = 4, iter = 2000, seed = 2023,
                      refresh = 0)
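The output below is what `summary` prints for the fitted object; a sketch of the (assumed) call:

summary(fit_lasso)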
Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      price ~ highwaympg + citympg + horsepower + peakrpm + compressionratio
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 205
 predictors:   6

Estimates:
                   mean     sd      10%      50%      90%
(Intercept)       9493.2   4656.5   3681.0   9573.9  15322.0
highwaympg        -328.0    153.9   -528.2   -320.4   -140.2
citympg             71.7    171.2   -131.5     57.3    296.4
horsepower         139.0     12.4    122.9    139.2    154.7
peakrpm             -1.4      0.7     -2.3     -1.4     -0.5
compressionratio   451.8     88.6    338.3    450.3    564.1
sigma             4123.4    212.5   3858.6   4113.1   4398.4

Fit Diagnostics:
          mean      sd      10%      50%      90%
mean_PPD 13262.4   398.9  12755.5  13256.9  13779.3

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                 mcse Rhat n_eff
(Intercept)      72.0 1.0  4179
highwaympg        2.6 1.0  3374
citympg           2.9 1.0  3555
horsepower        0.2 1.0  4133
peakrpm           0.0 1.0  4251
compressionratio  1.4 1.0  4222
sigma             3.2 1.0  4291
mean_PPD          6.4 1.0  3869
log-posterior     0.1 1.0   801

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
stan_glm
fit1 <- stan_glm(kid_score ~ mom_hs, data = kidiq,
                 seed = 2023, refresh = 0)
fit2 <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq,
                 seed = 2023, refresh = 0)

## Adding random noise as predictors
set.seed(2023)
n <- nrow(kidiq)
kidiqr <- kidiq
kidiqr$noise <- array(rnorm(5 * n), c(n, 5))
fit3 <- stan_glm(kid_score ~ mom_hs + mom_iq + noise, data = kidiqr,
                 seed = 2023, refresh = 0)
r2_fit1 <- bayes_R2(fit1)
r2_fit2 <- bayes_R2(fit2)
r2_fit3 <- bayes_R2(fit3)
Although adding noise predictors does increase the posterior median \(R^2\) slightly, the change is negligible compared to the posterior uncertainty in \(R^2\).
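Because `bayes_R2` returns a vector of posterior draws of \(R^2\), this comparison can be made explicit; a minimal sketch:

# Posterior medians and MAD SDs of R^2 for the three models
round(sapply(list(r2_fit1, r2_fit2, r2_fit3), median), 3)
round(sapply(list(r2_fit1, r2_fit2, r2_fit3), mad), 3)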