Sample from the posterior predictive distribution

Draw samples from the posterior predictive distribution of a fitted geostan model.

posterior_predict(
  object,
  S,
  summary = FALSE,
  width = 0.95,
  approx = TRUE,
  K = 20,
  preserve_order = FALSE,
  seed
)

Source

LeSage, James, & Robert Kelley Pace (2009). Introduction to Spatial Econometrics. Chapman and Hall/CRC.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, & D. B. Rubin (2014). Bayesian Data Analysis (3rd ed.). CRC Press.

McElreath, Richard (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press, Ch. 3.

Arguments

object: A geostan_fit object.
S: Optional; number of samples to take from the posterior distribution. The default, and maximum, is the total number of samples stored in the model.
summary: Should the predictive distribution be summarized by its means and central quantile intervals? If summary = FALSE, an S x N matrix of samples is returned. If summary = TRUE, a data.frame with the means and 100*width% credible intervals is returned.
width: Only used if summary = TRUE, to set the quantiles for the credible intervals. Defaults to width = 0.95.
approx: For SAR models only; approx = TRUE uses an approximation method for the inverse of the matrix (I - rho * W).
K: For SAR models only; number of matrix powers to use for the matrix inverse approximation (used when approx = TRUE). High values of rho (especially > 0.9) require a larger K for an accurate approximation.
preserve_order: If TRUE, the order of posterior draws remains fixed. The default (FALSE) permutes the MCMC samples so that, when S is small, each successive call to posterior_predict returns a different sample from the posterior distribution.
seed: A single integer value to be used in a call to set.seed before taking samples from the posterior distribution.
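The interaction of S, preserve_order, and seed can be pictured with a small hypothetical helper. `select_draws` is not a geostan function; it is a sketch that assumes the stored draws are the rows of a matrix and that "permuting" means drawing S rows at random without replacement.

```r
# Hypothetical helper (NOT part of geostan): pick S of the M stored draws.
select_draws <- function(draws, S = nrow(draws), preserve_order = FALSE, seed) {
  if (!missing(seed)) set.seed(seed)   # reproducible selection when seed is given
  M <- nrow(draws)
  stopifnot(S >= 1, S <= M)            # S cannot exceed the number of stored draws
  # Fixed order: first S draws; otherwise S draws sampled without replacement
  idx <- if (preserve_order) seq_len(S) else sample.int(M, S)
  draws[idx, , drop = FALSE]
}

draws  <- matrix(1:20, nrow = 10)      # 10 toy "draws" of 2 parameters
first3 <- select_draws(draws, S = 3, preserve_order = TRUE)
a <- select_draws(draws, S = 3, seed = 42)
b <- select_draws(draws, S = 3, seed = 42)
```

With the same seed, `a` and `b` contain the same rows; without a seed, repeated calls generally return different subsets.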

Value

A matrix of size S x N containing samples from the posterior predictive distribution, where S is the number of samples drawn and N is the number of observations. If summary = TRUE, a data.frame with N rows and 3 columns is returned (with column names mu, lwr, and upr).
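When summary = TRUE, the S x N matrix is reduced column by column. A minimal base-R sketch of that reduction, assuming the interval is the central quantile interval set by width; `summarize_draws` is a hypothetical name, not a geostan function, and the toy draws are simulated.

```r
# Hypothetical helper (NOT part of geostan): summarize an S x N matrix of draws
summarize_draws <- function(yrep, width = 0.95) {
  alpha <- (1 - width) / 2                 # e.g. 0.025 when width = 0.95
  data.frame(
    mu  = colMeans(yrep),                                             # mean per observation
    lwr = apply(yrep, 2, quantile, probs = alpha, names = FALSE),     # lower bound
    upr = apply(yrep, 2, quantile, probs = 1 - alpha, names = FALSE)  # upper bound
  )
}

set.seed(1)
yrep <- matrix(rnorm(4000 * 3), nrow = 4000, ncol = 3)  # 4000 toy draws, N = 3
s <- summarize_draws(yrep, width = 0.9)                 # 3 rows: mu, lwr, upr
```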

Details

This method returns samples from the posterior predictive distribution of the model (at the observed values of covariates, etc.). The predictions incorporate uncertainty in all parameter values (for example, those used to calculate the model's expected value) plus the error term (the model's description of how much observations vary around the expected value). If the model includes measurement error in the covariates, this source of uncertainty (about $X$) is passed into the posterior predictive distribution as well.
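To make the two sources of uncertainty concrete, here is a minimal base-R sketch for a Gaussian linear model. It assumes we already hold S posterior draws of the coefficients and the residual standard deviation; the draws below are simulated stand-ins, not output from a geostan model, and geostan's internal code may be organized differently. Each row of the result pairs one draw of the expected value with one draw of the error term.

```r
set.seed(123)
S <- 200; N <- 5
X <- cbind(1, rnorm(N))                              # design matrix: intercept + one covariate
# Simulated stand-ins for S posterior draws (not from a fitted model)
beta  <- cbind(rnorm(S, 1, 0.1), rnorm(S, 0.5, 0.1)) # S x 2 coefficient draws
sigma <- rgamma(S, shape = 100, rate = 100)          # S draws of the residual SD (mean 1)

mu  <- beta %*% t(X)                                 # S x N: expected value under each draw
# Error term: matrix() fills column-major, so row s of eps uses sigma[s]
eps <- matrix(rnorm(S * N, 0, sigma), nrow = S, ncol = N)
yrep <- mu + eps                                     # S x N posterior predictive draws
```

Column spreads of `yrep` are much wider than those of `mu`, because the error term usually dominates the parameter uncertainty.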

For SAR models (and all other models), the observed outcomes are not used to formulate the posterior predictive distribution. The posterior predictive distribution for the SLM (see stan_sar) is given by $$(I - \rho W)^{-1} (\mu + \epsilon).$$ The SDLM is the same but includes spatially lagged covariates in $\mu$. The approx = FALSE method for SAR models requires a call to Matrix::solve(I - rho * W) for each MCMC sample; the approx = TRUE method uses an approximation based on matrix powers (LeSage and Pace 2009). The approximation will deteriorate if $\rho^K$ is not near zero, so use with care.
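A minimal base-R sketch of the matrix-power idea, assuming the approximation takes the form of the truncated power series $$(I - \rho W)^{-1} \approx \sum_{k=0}^{K} \rho^k W^k,$$ which converges for a row-standardized $W$ when $|\rho| < 1$. The helpers `grid_weights` and `approx_inverse` are hypothetical, not geostan functions, and the exact truncation geostan uses may differ. The sketch builds a 9 x 9 grid like the one in the examples (assuming rook contiguity), compares the series with the exact inverse from base `solve()`, and forms one SLM-style draw with $\mu = 0$.

```r
# Hypothetical helper (NOT part of geostan): row-standardized rook weights
grid_weights <- function(nr, nc) {
  n <- nr * nc
  A <- matrix(0, n, n)
  id <- function(ri, ci) (ri - 1) * nc + ci  # cell (ri, ci) -> row index
  for (ri in seq_len(nr)) for (ci in seq_len(nc)) {
    i <- id(ri, ci)
    if (ri > 1)  A[i, id(ri - 1, ci)] <- 1   # neighbor above
    if (ri < nr) A[i, id(ri + 1, ci)] <- 1   # neighbor below
    if (ci > 1)  A[i, id(ri, ci - 1)] <- 1   # neighbor left
    if (ci < nc) A[i, id(ri, ci + 1)] <- 1   # neighbor right
  }
  A / rowSums(A)                             # each row sums to 1
}

# Hypothetical helper: truncated power series for (I - rho W)^{-1}
approx_inverse <- function(W, rho, K) {
  total <- diag(nrow(W))                     # k = 0 term: the identity
  term  <- diag(nrow(W))
  for (k in seq_len(K)) {
    term  <- rho * (term %*% W)              # now equals rho^k W^k
    total <- total + term
  }
  total
}

W <- grid_weights(9, 9)
rho <- 0.9
exact  <- solve(diag(nrow(W)) - rho * W)
err20  <- max(abs(approx_inverse(W, rho, 20)  - exact))
err200 <- max(abs(approx_inverse(W, rho, 200) - exact))

# One SLM-style predictive draw, (I - rho W)^{-1} (mu + eps), with mu = 0
set.seed(1)
y_rep <- approx_inverse(W, rho, 200) %*% rnorm(nrow(W))
```

With rho = 0.9, $\rho^{20} \approx 0.12$, so K = 20 leaves a noticeable approximation error, while K = 200 is nearly exact; this is why larger K is recommended as rho approaches 1.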

Examples

library(geostan)
data(sentencing)

# Poisson model with an offset and varying intercepts by name
E <- sentencing$expected_sents
sentencing$log_E <- log(E)
fit <- stan_glm(sents ~ offset(log_E),
                re = ~ name,
                data = sentencing,
                family = poisson(),
                chains = 2, iter = 600) # for speed only

# Posterior predictive check: replicated rates (gray) vs. observed rate (red)
yrep <- posterior_predict(fit, S = 65)
plot(density(yrep[1, ] / E))
for (i in 2:nrow(yrep)) lines(density(yrep[i, ] / E), col = 'gray30')
lines(density(sentencing$sents / E), col = 'darkred', lwd = 2)

# Simulate SAR data on a 9 x 9 grid, fit a SAR model, and draw predictions
sars <- prep_sar_data2(row = 9, col = 9)
W <- sars$W
y <- sim_sar(rho = .9, w = W)
fit <- stan_sar(y ~ 1, data = data.frame(y = y), sar = sars,
                iter = 650, quiet = TRUE)
yrep <- posterior_predict(fit, S = 15)