Use the CAR model as a prior on parameters, or fit data to a spatial Gaussian CAR model.

```
stan_car(
  formula,
  slx,
  re,
  data,
  car_parts,
  C,
  family = gaussian(),
  prior = NULL,
  ME = NULL,
  centerx = FALSE,
  prior_only = FALSE,
  censor_point,
  chains = 4,
  iter = 2000,
  refresh = 500,
  keep_all = FALSE,
  pars = NULL,
  control = NULL,
  ...
)
```

Besag, Julian (1974). Spatial interaction and the statistical analysis of lattice systems. *Journal of the Royal Statistical Society: Series B* 36 (2): 192–225.

Cressie, Noel (2015 (1993)). *Statistics for Spatial Data*. Wiley Classics, Revised Edition.

Cressie, Noel and Wikle, Christopher (2011). *Statistics for Spatio-Temporal Data*. Wiley.

Donegan, Connor and Chun, Yongwan and Griffith, Daniel A. (2021). Modeling community health with areal data: Bayesian inference with survey standard errors and spatial structure. *Int. J. Env. Res. and Public Health* 18 (13): 6856. doi:10.3390/ijerph18136856. Data and code: https://github.com/ConnorDonegan/survey-HBM.

Donegan, Connor (2021). Building spatial conditional autoregressive (CAR) models in the Stan programming language. *OSF Preprints*. doi:10.31219/osf.io/3ey65.

Haining, Robert and Li, Guangquan (2020). *Modelling Spatial and Spatial-Temporal Data: A Bayesian Approach*. CRC Press.

- formula
A model formula, following the R `formula` syntax. Binomial models can be specified by setting the left-hand side of the equation to a two-column matrix of successes and failures, as in `cbind(successes, failures) ~ x`.

- slx
Formula to specify any spatially lagged covariates, as in `~ x1 + x2` (the intercept term will be removed internally). When setting priors for `beta`, remember to include priors for any SLX terms.

- re
To include a varying intercept (or "random effects") term, `alpha_re`, specify the grouping variable here using formula syntax, as in `~ ID`. Then `alpha_re` is a vector of parameters added to the linear predictor of the model, with `alpha_re ~ N(0, alpha_tau)` and `alpha_tau ~ Student_t(d.f., location, scale)`.

With the CAR model, any `alpha_re` term should be at a *different* level or scale than the observations; that is, at a different scale than the autocorrelation structure of the CAR model itself.

- data
A `data.frame` or an object coercible to a data frame by `as.data.frame` containing the model data.

- car_parts
A list of data for the CAR model, as returned by `prep_car_data`.

- C
Optional spatial connectivity matrix which will be used to calculate residual spatial autocorrelation as well as any user-specified `slx` terms; it will automatically be row-standardized before calculating `slx` terms. See `shape2mat`.

- family
The likelihood function for the outcome variable. Current options are `auto_gaussian()`, `binomial(link = "logit")`, and `poisson(link = "log")`; if `family = gaussian()` is provided, it will automatically be converted to `auto_gaussian()`.

- prior
A named list of parameters for prior distributions (see `priors`):

- intercept
The intercept is assigned a Gaussian prior distribution (see `normal`).
- beta
Regression coefficients are assigned Gaussian prior distributions. Priors must follow the order in which variables appear in the model `formula`. Note that if you also use `slx` terms (spatially lagged covariates) and you use custom priors for `beta`, then you have to provide priors for the SLX terms. Since SLX terms are *prepended* to the design matrix, the priors for the SLX terms are listed first.

- car_scale
Scale parameter for the CAR model, `car_scale`. The scale is assigned a Student's t prior model (constrained to be positive).

- car_rho
The spatial autocorrelation parameter in the CAR model, `rho`, is assigned a uniform prior distribution. By default, the prior will be uniform over all permissible values as determined by the eigenvalues of the connectivity matrix, `C`. The range of permissible values for `rho` is automatically printed to the console by `prep_car_data`.

- tau
The scale parameter for any varying intercepts (a.k.a. exchangeable random effects, or partial pooling) terms. This scale parameter, `tau`, is assigned a Student's t prior (constrained to be positive).
- ME
To model observational uncertainty (i.e., measurement or sampling error) in any or all of the covariates, provide a list of data as constructed by the `prep_me_data` function.

- centerx
To center predictors on their mean values, use `centerx = TRUE`. If the `ME` argument is used, the modeled covariate (i.e., the latent variable), rather than the raw observations, will be centered. When using the `ME` argument, this is the recommended method for centering the covariates.

- prior_only
Logical value; if `TRUE`, draw samples only from the prior distributions of parameters.

- censor_point
Integer value indicating the maximum censored value; this argument is for modeling censored (suppressed) outcome data, typically disease case counts or deaths.

- chains
Number of MCMC chains to use.

- iter
Number of samples per chain.

- refresh
Stan will print the progress of the sampler every `refresh` iterations. Set `refresh = 0` to silence this.

- keep_all
If `keep_all = TRUE`, then samples for all parameters in the Stan model will be kept; this is necessary if you want to do model comparison with Bayes factors and the `bridgesampling` package.

- pars
Optional; specify any additional parameters you'd like stored from the Stan model.

- control
A named list of parameters to control the sampler's behavior. See `stan` for details.

- ...
Other arguments passed to `sampling`. For multi-core processing, you can use `cores = parallel::detectCores()`, or run `options(mc.cores = parallel::detectCores())` first.

An object of class `geostan_fit` (a list) containing:

- summary
Summaries of the main parameters of interest; a data frame.

- diagnostic
The Widely Applicable Information Criterion (WAIC), with a measure of the effective number of parameters (`eff_pars`) and the mean log pointwise predictive density (`lpd`), and the mean residual spatial autocorrelation as measured by the Moran coefficient.

- stanfit
An object of class `stanfit` returned by `rstan::stan`.
- data
a data frame containing the model data

- family
The user-provided or default `family` argument used to fit the model.

- formula
The model formula provided by the user (not including the CAR component).

- slx
The `slx` formula.

- re
A list containing `re`, the varying intercepts (`re`) formula if provided, and `Data`, a data frame with columns `id`, the grouping variable, and `idx`, the index values assigned to each group.

- priors
Prior specifications.

- x_center
If covariates are centered internally (`centerx = TRUE`), then `x_center` is a numeric vector of the values on which covariates were centered.

- spatial
A data frame with the name of the spatial component parameter (either "phi" or, for auto-Gaussian models, "trend") and method ("CAR").

- ME
A list indicating if the object contains an ME model; if so, the user-provided ME list is also stored here.

- C
Spatial connectivity matrix (in sparse matrix format).

CAR models are discussed in Cressie and Wikle (2011, pp. 184-88), Cressie (2015, Ch. 6-7), and Haining and Li (2020, pp. 249-51). They are often used for areal or lattice data.

Details for the Stan code for this implementation of the CAR model can be found in Donegan (2021).

The general scheme for the CAR model is as follows: $$ y \sim Gauss( \mu, ( I - \rho C)^{-1} M), $$ where \(I\) is the identity matrix, \(\rho\) is a spatial dependence parameter, \(C\) is a spatial connectivity matrix, and \(M\) is a diagonal matrix of variance terms. The diagonal of \(M\) contains the conditional variances \(\tau_i^2\): the squared scale parameter \(\tau^2\) multiplied by a vector of weights (often set to be proportional to the inverse of the number of neighbors assigned to each site). The CAR model owes its name to the fact that this joint distribution corresponds to a set of conditional distributions that relate the expected value of each observation to a function of neighboring values, i.e., the Markov condition holds: $$ E(y_i | y_1, y_2, \dots, y_{i-1}, y_{i+1}, \dots, y_n) = \mu_i + \rho \sum_{j=1}^n c_{i,j} (y_j - \mu_j), $$ where the entries \(c_{i,j}\) of \(C\) are non-zero only if \(j \in N(i)\), and \(N(i)\) indexes the sites that are neighbors of the \(i^{th}\) site.

With the Gaussian probability distribution, $$ y_i | y_j: j \neq i \sim Gauss(\mu_i + \rho \sum_{j=1}^n c_{i,j} (y_j - \mu_j), \tau_i^2) $$ where \(\tau_i\) is a scale parameter and \(\mu_i\) may contain covariates or simply the intercept.
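The equivalence between the joint form and the conditional (Markov) form can be checked numerically. The sketch below is plain base R, not geostan's implementation; it assumes a hypothetical five-site adjacency matrix, a row-standardized \(C\), and \(M = \mathrm{diag}(\tau^2 / n_i)\) with \(n_i\) the number of neighbors of site \(i\). It compares ordinary Gaussian conditioning on the joint covariance with the CAR conditional mean and variance.

```r
# Hypothetical 5-site map: symmetric binary adjacency matrix (zero diagonal)
A <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 1, 0,
              1, 1, 0, 1, 1,
              0, 1, 1, 0, 1,
              0, 0, 1, 1, 0), nrow = 5, byrow = TRUE)
n_nb <- rowSums(A)                  # number of neighbors of each site
C    <- A / n_nb                    # row-standardized: row i divided by n_nb[i]
tau  <- 0.8                         # CAR scale parameter (hypothetical value)
rho  <- 0.6                         # spatial dependence parameter (hypothetical)
M    <- diag(tau^2 / n_nb)          # conditional variances tau_i^2 on the diagonal
Sigma <- solve(diag(5) - rho * C) %*% M   # joint covariance (I - rho C)^{-1} M

mu <- c(1, 2, 0, -1, 0.5)           # hypothetical mean vector
y  <- c(1.4, 1.7, 0.3, -0.6, 0.2)   # hypothetical observations

# Conditional mean and variance of y_i given all other sites, via standard
# multivariate Gaussian conditioning on the joint covariance Sigma
cond_gauss <- function(i) {
  S12 <- Sigma[i, -i, drop = FALSE]
  S22 <- Sigma[-i, -i]
  m <- mu[i] + S12 %*% solve(S22, y[-i] - mu[-i])
  v <- Sigma[i, i] - S12 %*% solve(S22, t(S12))
  c(mean = m, var = v)
}
joint <- t(sapply(1:5, cond_gauss))

# The same quantities from the CAR (Markov) form
markov <- cbind(mean = mu + rho * as.vector(C %*% (y - mu)),
                var  = diag(M))
all.equal(joint, markov)            # expected: TRUE
```

Because \(C\) has a zero diagonal, including \(j = i\) in the sum is harmless, and the conditional variance of \(y_i\) is simply the \(i^{th}\) diagonal element of \(M\).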

The covariance matrix of the CAR model contains two parameters: \(\rho\) (`car_rho`), which controls the kind (positive or negative) and degree of spatial autocorrelation, and the scale parameter \(\tau\) (`car_scale`). The range of permissible values for \(\rho\) depends on the specification of \(\boldsymbol C\) and \(\boldsymbol M\); for specification options, see `prep_car_data` and Cressie and Wikle (2011, pp. 184-188) or Donegan (2021).
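For a row-standardized \(C\) with \(M\) proportional to the inverse neighbor counts (one common specification; see `prep_car_data` for the options geostan supports), the permissible values of \(\rho\) lie strictly between \(1/\lambda_{min}\) and \(1/\lambda_{max}\), where \(\lambda\) are the eigenvalues of \(C\). The base-R sketch below uses a hypothetical five-site adjacency matrix to compute these bounds and check them against positive definiteness; it illustrates the calculation and is not the code `prep_car_data` runs.

```r
# Hypothetical 5-site adjacency matrix
A <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 1, 0,
              1, 1, 0, 1, 1,
              0, 1, 1, 0, 1,
              0, 0, 1, 1, 0), nrow = 5, byrow = TRUE)
n_nb <- rowSums(A)

# C = D^{-1} A has the same eigenvalues as the symmetric D^{-1/2} A D^{-1/2},
# so a symmetric eigen-solver returns real values directly
S <- diag(1 / sqrt(n_nb)) %*% A %*% diag(1 / sqrt(n_nb))
lambda <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
rho_bounds <- 1 / range(lambda)     # c(1 / lambda_min, 1 / lambda_max)
rho_bounds                          # upper bound is 1 for a row-standardized C

# With M = tau^2 D^{-1}, the CAR precision is proportional to D - rho * A;
# it is positive definite exactly when rho lies strictly inside the bounds
is_valid_rho <- function(rho) {
  all(eigen(diag(n_nb) - rho * A, symmetric = TRUE, only.values = TRUE)$values > 0)
}
```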

Further details of the models and results depend on the `family` argument, as well as on the particular CAR specification chosen (from `prep_car_data`).

When `family = auto_gaussian()` (the default), the CAR model is applied directly to the data as follows:
$$
y \sim Gauss( \mu, (I - \rho C)^{-1} M),
$$
where \(\mu\) is the mean vector (with intercept, covariates, etc.), \(C\) is a spatial connectivity matrix, and \(M\) is a known diagonal matrix containing the conditional variances \(\tau_i^2\). \(C\) and \(M\) are provided by `prep_car_data`.

The auto-Gaussian model contains an implicit spatial trend (i.e. autocorrelation) component \(\phi\) which can be calculated as follows (Cressie 2015, p. 564): $$ \phi = \rho C (y - \mu). $$ This term can be extracted from a fitted auto-Gaussian model using the spatial method.

When applied to a fitted auto-Gaussian model, the `residuals.geostan_fit` method returns 'de-trended' residuals \(R\) by default. That is,
$$
R = y - \mu - \rho C (y - \mu).
$$
To obtain "raw" residuals (\(y - \mu\)), use `residuals(fit, detrend = FALSE)`. Similarly, the fitted values obtained from the `fitted.geostan_fit` method will include the spatial trend term by default.
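Given values of \(\rho\) and \(\mu\), the trend, de-trended residuals, and trend-inclusive fitted values above are simple matrix operations. This base-R sketch uses hypothetical values for four sites arranged along a line; in a fitted model these quantities would be computed for each posterior draw, which is omitted here.

```r
# Hypothetical connectivity matrix for four sites along a line
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
C   <- A / rowSums(A)               # row-standardized
rho <- 0.5                          # hypothetical value of rho
mu  <- c(2.0, 2.5, 3.0, 3.5)        # hypothetical mean vector
y   <- c(2.4, 2.9, 2.8, 3.1)        # hypothetical outcome

raw_resid <- y - mu                              # "raw" residuals, y - mu
phi       <- rho * as.vector(C %*% raw_resid)    # implicit trend, rho C (y - mu)
detrended <- raw_resid - phi                     # R = y - mu - rho C (y - mu)
fitted_with_trend <- mu + phi                    # fitted values including the trend
phi
```

By construction, the trend-inclusive fitted values plus the de-trended residuals recover \(y\) exactly.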

For `family = poisson()`, the model is specified as:
$$
y \sim Poisson(e^{O + \lambda}) \\
\lambda \sim Gauss(\mu, (I - \rho C)^{-1} \boldsymbol M).
$$
If the raw outcome consists of a rate \(\frac{y}{p}\) with observed counts \(y\) and denominator \(p\) (often the size of the population at risk), then the offset term \(O = \log(p)\) is the log of the denominator.

This is often written (equivalently) as: $$ y \sim Poisson(e^{O + \mu + \phi}) \\ \phi \sim Gauss(0, (I - \rho C)^{-1} \boldsymbol M). $$ For Poisson models, the spatial method returns the parameter vector \(\phi\).

In the Poisson CAR model, \(\phi\) contains a latent spatial trend as well as additional variation around it: \(\phi_i = \rho \sum_{j=1}^n c_{i,j} \phi_j + \epsilon_i\), \(\epsilon_i \sim Gauss(0, \tau_i^2)\). If you would like to extract the latent/implicit spatial trend from \(\phi\), you can do so by calculating (following Cressie 2015, p. 564): $$ \rho C \phi. $$
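To make this decomposition concrete, the sketch below (base R, hypothetical values throughout, not geostan code) draws \(\phi\) from the CAR model, splits it into the latent trend \(\rho C \phi\) and the remaining variation, and forms Poisson expected counts with offset \(O = \log(p)\) and an assumed intercept of \(-6\).

```r
set.seed(42)
# Hypothetical 5-site adjacency matrix and CAR parameters
A <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 1, 0,
              1, 1, 0, 1, 1,
              0, 1, 1, 0, 1,
              0, 0, 1, 1, 0), nrow = 5, byrow = TRUE)
n_nb <- rowSums(A)
C    <- A / n_nb
rho  <- 0.7
tau  <- 0.5
Sigma <- solve(diag(5) - rho * C) %*% diag(tau^2 / n_nb)
Sigma <- (Sigma + t(Sigma)) / 2     # remove floating-point asymmetry

# Draw phi ~ Gauss(0, Sigma) with the Cholesky factor (Sigma = t(R) %*% R)
R   <- chol(Sigma)
phi <- as.vector(t(R) %*% rnorm(5))

trend <- rho * as.vector(C %*% phi) # latent spatial trend, rho C phi
eps   <- phi - trend                # variation around the trend

# Expected counts: offset O = log(pop) plus a hypothetical intercept of -6
pop    <- c(12000, 8500, 30000, 15000, 9000)
lambda <- exp(log(pop) - 6 + phi)
y      <- rpois(5, lambda)          # simulated counts
```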

For `family = binomial()`, the model is specified as:
$$
y \sim Binomial(N, \lambda) \\
logit(\lambda) \sim Gauss(\mu, (I - \rho C)^{-1} \boldsymbol M),
$$
where outcome data \(y\) are counts, \(N\) is the number of trials, and \(\lambda\) is the 'success' rate. Note that the model formula should be structured as `cbind(successes, failures) ~ x`, such that `trials = successes + failures`.

This is often written (equivalently) as:
$$
y \sim Binomial(N, logit^{-1}(\mu + \phi)) \\
\phi \sim Gauss(0, (I - \rho C)^{-1} \boldsymbol M).
$$
For fitted Binomial models, the `spatial` method will return the parameter vector `phi`.

As is also the case for the Poisson model, \(\phi\) contains a latent spatial trend as well as additional variation around it. If you would like to extract the latent/implicit spatial trend from \(\phi\), you can do so by calculating: $$ \rho C \phi. $$
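A base-R sketch of how the `cbind(successes, failures)` response maps onto the binomial likelihood, assuming hypothetical data and hypothetical values for \(\mu\) and \(\phi\); `plogis()` is R's inverse-logit function. It illustrates the model equations only and is not how geostan evaluates the likelihood internally.

```r
# Hypothetical data: successes and failures for four areas, plus a covariate
df <- data.frame(successes = c(3, 10, 7, 1),
                 failures  = c(17, 40, 13, 9),
                 x         = c(0.2, -0.1, 0.5, -0.4))

# The cbind() left-hand side yields a two-column response matrix
mf     <- model.frame(cbind(successes, failures) ~ x, data = df)
y_mat  <- model.response(mf)
trials <- rowSums(y_mat)            # N = successes + failures

# Hypothetical linear predictor mu and spatial effects phi (log-odds scale)
mu  <- -1.5 + 0.8 * df$x
phi <- c(0.1, -0.2, 0.3, 0.0)
p   <- plogis(mu + phi)             # inverse logit gives the success rate

# Binomial log-likelihood of the observed successes
loglik <- sum(dbinom(y_mat[, 1], size = trials, prob = p, log = TRUE))
loglik
```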

The `slx` argument is a convenient way to include SLX terms. For example,
$$
y = W X \gamma + X \beta + \epsilon
$$
where \(W\) is a row-standardized spatial weights matrix (see `shape2mat`), \(WX\) is the mean neighboring value of \(X\), and \(\gamma\) is a coefficient vector. This specifies a regression with spatially lagged covariates. SLX terms can be specified by providing a formula to the `slx` argument:

```
stan_glm(y ~ x1 + x2, slx = ~ x1 + x2, ...)
```

which is a shortcut for

```
stan_glm(y ~ I(W %*% x1) + I(W %*% x2) + x1 + x2, ...)
```

SLX terms will always be *prepended* to the design matrix, as above, which is important to know when setting prior distributions for regression coefficients.
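The following base-R sketch shows what an SLX term is numerically: row-standardize a hypothetical connectivity matrix, compute \(WX\), and prepend it to \(X\). The `w.` column-name prefix is a hypothetical label chosen for this example, not necessarily the name geostan assigns.

```r
# Hypothetical connectivity matrix for four sites along a line
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- A / rowSums(A)                        # row-standardized spatial weights

X  <- cbind(x1 = c(1, 2, 3, 4), x2 = c(10, 0, 5, 5))
WX <- W %*% X                              # mean neighboring value of each covariate
colnames(WX) <- paste0("w.", colnames(X))  # hypothetical labels for the SLX terms

design <- cbind(WX, X)                     # SLX terms prepended to the design matrix
design
```

With this column order, a custom prior for `beta` would list the coefficients for `w.x1` and `w.x2` before those for `x1` and `x2`.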

For measurement error (ME) models, the SLX argument is the only way to include spatially lagged covariates since the SLX term needs to be re-calculated on each iteration of the MCMC algorithm.

The ME models are designed for surveys with spatial sampling designs, such as the American Community Survey (ACS) estimates. Given estimates \(x\), their standard errors \(s\), and the target quantity of interest (i.e., the unknown true value) \(z\), the ME models have one of the following two specifications, depending on the user input. If a spatial CAR model is specified, then: $$ x \sim Gauss(z, s^2) \\ z \sim Gauss(\mu_z, \Sigma_z) \\ \Sigma_z = (I - \rho_z C)^{-1} M \\ \mu_z \sim Gauss(0, 100) \\ \tau_z \sim Student(10, 0, 40), \tau_z > 0 \\ \rho_z \sim uniform(l, u) $$ where \(\Sigma_z\) specifies a spatial conditional autoregressive model with scale parameter \(\tau_z\) (on the diagonal of \(M\)), and \(l\), \(u\) are the lower and upper bounds that \(\rho_z\) is permitted to take (which are determined by the extreme eigenvalues of the spatial connectivity matrix \(C\)).

For non-spatial ME models, the following is used instead: $$ x \sim Gauss(z, s^2) \\ z \sim student(\nu_z, \mu_z, \sigma_z) \\ \nu_z \sim gamma(3, 0.2) \\ \mu_z \sim Gauss(0, 100) \\ \sigma_z \sim student(10, 0, 40). $$
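As an illustration of the non-spatial specification, the sketch below evaluates its joint log density in base R for hypothetical estimates, standard errors, latent values, and hyperparameters. Three readings of the notation are assumptions, labeled in the code: gamma(3, 0.2) is taken as shape and rate (Stan's convention), the 100 in Gauss(0, 100) is taken as a standard deviation, and the Student-t prior on \(\sigma_z\) is treated as half-t (truncated at zero). R has no built-in location-scale Student-t density, so one is built from `dt()`.

```r
# Log density of a location-scale Student-t, built from R's standard dt()
dstudent_ls <- function(v, nu, mu, sigma) {
  dt((v - mu) / sigma, df = nu, log = TRUE) - log(sigma)
}

# Joint log density of the non-spatial ME model at given parameter values
me_log_density <- function(x, s, z, nu_z, mu_z, sigma_z) {
  sum(dnorm(x, mean = z, sd = s, log = TRUE)) +        # x ~ Gauss(z, s^2)
    sum(dstudent_ls(z, nu_z, mu_z, sigma_z)) +         # z ~ student(nu_z, mu_z, sigma_z)
    dgamma(nu_z, shape = 3, rate = 0.2, log = TRUE) +  # assumption: shape/rate
    dnorm(mu_z, mean = 0, sd = 100, log = TRUE) +      # assumption: 100 is an SD
    dstudent_ls(sigma_z, 10, 0, 40) + log(2)           # assumption: half-t at zero
}

# Hypothetical survey estimates, standard errors, and latent values
x <- c(0.12, 0.30, 0.22, 0.08)
s <- c(0.02, 0.05, 0.03, 0.01)
z <- c(0.11, 0.27, 0.24, 0.08)
lp <- me_log_density(x, s, z, nu_z = 10, mu_z = 0.2, sigma_z = 0.1)
lp
```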

For strongly skewed variables, such as census tract poverty rates, it can be advantageous to apply a logit transformation to \(z\) before applying the CAR or Student-t prior model. When the `logit` argument is used, the model becomes:
$$
x \sim Gauss(z, s^2) \\
logit(z) \sim Gauss(\mu_z, \Sigma_z)
...
$$
and similarly for the Student t model:
$$
x \sim Gauss(z, s^2) \\
logit(z) \sim student(\nu_z, \mu_z, \sigma_z) \\
...
$$

Vital statistics systems and disease surveillance programs typically suppress case counts when they are smaller than a specific threshold value. In such cases, the observation of a censored count is not the same as a missing value; instead, you are informed that the value is an integer somewhere between zero and the threshold value. For Poisson models (`family = poisson()`), you can use the `censor_point` argument to encode this information into your model.

Internally, `geostan` will keep the index values of each censored observation, and the index value of each of the fully observed outcome values. For all observed counts, the likelihood statement will be:
$$
p(y_i | data, model) = poisson(y_i | \mu_i),
$$
as usual, where \(\mu_i\) may include whatever spatial terms are present in the model.

For each censored count, the likelihood statement will equal the cumulative Poisson distribution function for values zero through the censor point: $$ p(y_i | data, model) = \sum_{m=0}^{M} Poisson( m | \mu_i), $$ where \(M\) is the censor point and \(\mu_i\) again is the fitted value for the \(i^{th}\) observation.

For example, the US Centers for Disease Control and Prevention's CDC WONDER database censors all death counts between 0 and 9. To model CDC WONDER mortality data, you could provide `censor_point = 9`, and then the likelihood statement for censored counts would equal the summation of the Poisson probability mass function over each integer ranging from zero through 9 (inclusive), conditional on the fitted values (i.e., all model parameters). See Donegan (2021) for additional discussion, references, and Stan code.
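The censored likelihood can be written with R's `dpois()` and `ppois()`, since \(\sum_{m=0}^{M} Poisson(m | \mu_i)\) is the Poisson cumulative distribution function evaluated at \(M\). In this sketch, coding censored outcomes as `NA` is an assumption made for illustration (not a description of how geostan stores them), and the counts and fitted values are hypothetical.

```r
# Log-likelihood for Poisson counts where some values are only known to lie
# in 0..censor_point. Assumption for this sketch: censored outcomes are NA.
censored_poisson_loglik <- function(y, mu, censor_point) {
  cens    <- is.na(y)
  ll_obs  <- dpois(y[!cens], lambda = mu[!cens], log = TRUE)       # observed counts
  ll_cens <- ppois(censor_point, lambda = mu[cens], log.p = TRUE)  # log P(Y <= censor_point)
  sum(ll_obs) + sum(ll_cens)
}

y  <- c(14, NA, 25, NA, 11)          # hypothetical death counts, two suppressed
mu <- c(12.3, 4.1, 22.8, 7.5, 10.2)  # hypothetical fitted values
censored_poisson_loglik(y, mu, censor_point = 9)
```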

```
# Model male mortality risk in Georgia counties with a Poisson CAR model
library(geostan)
data(georgia)
C <- shape2mat(georgia, style = "B")   # binary connectivity matrix
cp <- prep_car_data(C)                  # CAR data (default specification)
fit <- stan_car(deaths.male ~ offset(log(pop.at.risk.male)),
                car_parts = cp,
                data = georgia,
                family = poisson(),
                iter = 800, chains = 1  # for example speed only
                )
# MCMC diagnostics and model summary
rstan::stan_rhat(fit$stanfit)
rstan::stan_mcse(fit$stanfit)
print(fit)
sp_diag(fit, georgia)

## DCAR specification (inverse-distance based)
library(sf)
A <- shape2mat(georgia, "B")
D <- sf::st_distance(sf::st_centroid(georgia))
A <- D * A                              # keep distances between neighbors only
cp <- prep_car_data(A, "DCAR", k = 1)
fit <- stan_car(deaths.male ~ offset(log(pop.at.risk.male)),
                data = georgia,
                car_parts = cp,
                family = poisson(),
                iter = 800, chains = 1  # for example speed only
                )
print(fit)
```