Obtain predicted values from a fitted model by providing new covariate values.
A fitted model object of class geostan_fit
.
A data frame in which to look for variables with which to predict, presumably for the purpose of viewing marginal effects. Note that if the model formula includes an offset term, newdata
must contain the offset. Note also that any spatially-lagged covariate terms will be ignored if they were provided using the slx
argument. If covariates in the model were centered using the centerx
argument, the predict.geostan_fit
method will automatically center the predictors in newdata
using the values stored in object$x_center
. If newdata
is missing, the fitted values of the model will be returned.
A single numeric value or a numeric vector with length equal to nrow(newdata)
; alpha
serves as the intercept in the linear predictor. The default is to use the posterior mean of the intercept. Even if type = "response"
, this needs to be provided on the scale of the linear predictor.
May be a vector of numeric values or a logical scalar to pass to scale
. Defaults to using object$x_center
. If the model was fit using centerx = TRUE
, then covariates were centered and their mean values are stored in object$x_center
and the predict
method will use them to automatically center newdata
; if the model was fit with centerx = FALSE
, then object$x_center = FALSE
and newdata
will not be centered.
Logical; should the values be summarized with the mean, standard deviation and quantiles (probs = c(.025, .2, .5, .8, .975)
) for each observation? Otherwise a matrix containing samples from the posterior distribution at each observation is returned.
By default, results from predict
are on the scale of the linear predictor (type = "link")
). The alternative (type = "response"
) is on the scale of the response variable. For example, the default return values for a Poisson model on the log scale, and using type = "response"
will return the original scale of the outcome variable (by exponentiating the log values).
Not used
If summary = FALSE
, a matrix of samples is returned. If summary = TRUE
(the default), a data frame is returned.
The purpose of the predict method is to explore marginal effects of (combinations of) covariates. The method sets the intercept equal to its posterior mean (i.e., alpha = mean(as.matrix(object, pars = "intercept"))
); the only source of uncertainty in the results is the posterior distribution of the coefficients, which can be obtained using Beta = as.matrix(object, pars = "beta")
.
The model formula will be taken from object$formula
, and then a model matrix will be created by passing newdata
to the model.frame function (as in: model.frame(newdata, object$formula
).
Be aware that in generalized linear models (such as Poisson and Binomial models) marginal effects of each covariate are sensitive to the level of other covariates in the model. If the model includes any spatially-lagged covariates (introduced using the slx
argument) or a spatial autocorrelation term (for example, you used a spatial CAR, SAR, or ESF model), these terms will essentially be fixed at zero for the purposes of calculating marginal effects. If you want to change this, you can introduce spatial trend values by specifying a varying intercept using the alpha
argument.
data(georgia)
georgia$income <- georgia$income / 1e3
fit <- stan_glm(deaths.male ~ offset(log(pop.at.risk.male)) + log(income),
data = georgia,
centerx = TRUE,
family = poisson(),
chains = 2, iter = 600) # for speed only
# note: pop.at.risk.male=1 leads to log(pop.at.risk.male)=0
# so that the predicted values are rates
newdata <- data.frame(
income = seq(min(georgia$income),
max(georgia$income),
length.out = 100),
pop.at.risk.male = 1)
preds <- predict(fit, newdata, type = "response")
head(preds)
plot(preds$income,
preds$mean * 10e3,
type = "l",
ylab = "Deaths per 10,000",
xlab = "Income ($1,000s)")
# here the predictions are rates per 10,000
newdata$pop.at.risk.male <- 10e3
preds <- predict(fit, newdata, type = "response")
head(preds)
plot(preds$income,
preds$mean,
type = "l",
ylab = "Deaths per 10,000",
xlab = "Income ($1,000s)")