Obtain predicted values from a fitted model by providing new covariate values.

# S3 method for geostan_fit
predict(
  object,
  newdata,
  alpha = mean(as.matrix(object, pars = "intercept")),
  center = object$x_center,
  summary = TRUE,
  type = c("link", "response"),
  ...
)

Arguments

object

A fitted model object of class geostan_fit.

newdata

A data frame in which to look for variables with which to predict, presumably for the purpose of viewing marginal effects. Note that if the model formula includes an offset term, newdata must contain the offset. Note also that any spatially-lagged covariate terms will be ignored if they were provided using the slx argument. If covariates in the model were centered using the centerx argument, the predict.geostan_fit method will automatically center the predictors in newdata using the values stored in object$x_center. If newdata is missing, the fitted values of the model will be returned.

alpha

A single numeric value or a numeric vector with length equal to nrow(newdata); alpha serves as the intercept in the linear predictor. The default is to use the posterior mean of the intercept. Even if type = "response", this needs to be provided on the scale of the linear predictor.

center

May be a vector of numeric values or a logical scalar to pass to scale. Defaults to using object$x_center. If the model was fit using centerx = TRUE, then covariates were centered and their mean values are stored in object$x_center and the predict method will use them to automatically center newdata; if the model was fit with centerx = FALSE, then object$x_center = FALSE and newdata will not be centered.

summary

Logical; should the values be summarized with the mean, standard deviation and quantiles (probs = c(.025, .2, .5, .8, .975)) for each observation? Otherwise a matrix containing samples from the posterior distribution at each observation is returned.

type

By default, results from predict are on the scale of the linear predictor (type = "link")). The alternative (type = "response") is on the scale of the response variable. For example, the default return values for a Poisson model on the log scale, and using type = "response" will return the original scale of the outcome variable (by exponentiating the log values).

...

Not used

Value

If summary = FALSE, a matrix of samples is returned. If summary = TRUE (the default), a data frame is returned.

Details

The purpose of the predict method is to explore marginal effects of (combinations of) covariates. The method sets the intercept equal to its posterior mean (i.e., alpha = mean(as.matrix(object, pars = "intercept"))); the only source of uncertainty in the results is the posterior distribution of the coefficients, which can be obtained using Beta = as.matrix(object, pars = "beta").

The model formula will be taken from object$formula, and then a model matrix will be created by passing newdata to the model.frame function (as in: model.frame(newdata, object$formula).

Be aware that in generalized linear models (such as Poisson and Binomial models) marginal effects of each covariate are sensitive to the level of other covariates in the model. If the model includes any spatially-lagged covariates (introduced using the slx argument) or a spatial autocorrelation term (for example, you used a spatial CAR, SAR, or ESF model), these terms will essentially be fixed at zero for the purposes of calculating marginal effects. If you want to change this, you can introduce spatial trend values by specifying a varying intercept using the alpha argument.

Examples

data(georgia)
georgia$income <- georgia$income / 1e3

fit <- stan_glm(deaths.male ~ offset(log(pop.at.risk.male)) + log(income),
               data = georgia,
               centerx = TRUE,
               family = poisson(),
               chains = 2, iter = 600) # for speed only

# note: pop.at.risk.male=1 leads to log(pop.at.risk.male)=0
# so that the predicted values are rates
newdata <- data.frame(
             income = seq(min(georgia$income),
                          max(georgia$income),
                           length.out = 100),
             pop.at.risk.male = 1)

preds <- predict(fit, newdata, type = "response")
head(preds)
plot(preds$income,
     preds$mean * 10e3,
     type = "l",
     ylab = "Deaths per 10,000",
     xlab = "Income ($1,000s)")

# here the predictions are rates per 10,000
newdata$pop.at.risk.male <- 10e3
preds <- predict(fit, newdata, type = "response")
head(preds)
plot(preds$income,
    preds$mean,
    type = "l",
    ylab = "Deaths per 10,000",
    xlab = "Income ($1,000s)")