The geostan R package supports a complete spatial analysis workflow with Bayesian models for areal data, including a suite of functions for visualizing spatial data and model results. For demonstrations and discussion, see the package help pages and vignettes on spatial autocorrelation, spatial measurement error models, and spatial regression with raster layers.
The package is particularly suitable for public health research with spatial data, and complements the surveil R package for time series analysis of public health surveillance data.
geostan models were built using Stan, a state-of-the-art platform for Bayesian modeling.
Statistical models for data recorded across areal units like states, counties, or census tracts.
Incorporate information on data reliability, such as standard errors of American Community Survey estimates, into any geostan model.
Vital statistics and disease surveillance systems like CDC Wonder censor case counts that fall below a threshold number; geostan can model disease or mortality risk with censored observations.
Tools for visualizing and measuring spatial autocorrelation and map patterns, for exploratory analysis and model diagnostics.
Tools for building custom spatial models in Stan.
Install geostan from CRAN using:
All functions and methods are documented (with examples) on the website reference page. See the package vignettes for more on exploratory spatial data analysis, spatial measurement error models, and spatial regression with large raster layers.
Load the package and the
georgia county mortality data set (ages 55-64, years 2014-2018):
sp_diag function provides visual summaries of spatial data, including a histogram, Moran scatter plot, and map:
A <- shape2mat(georgia, style = "B") sp_diag(georgia$rate.female, georgia, w = A) #> 3 NA values found in x; they will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. You may want to use a binary connectivity matrix using style = 'B' in shape2mat. #> Warning: Removed 3 rows containing non-finite values (`stat_bin()`).
There are three censored observations in the
georgia female mortality data, which means there were 9 or fewer deaths in those counties. The following code fits a spatial conditional autoregressive (CAR) model to female county mortality data. By using the
censor_point argument we include our information on the censored observations to obtain results for all counties:
cars <- prep_car_data(A) #> Range of permissible rho values: -1.661134 1 fit <- stan_car(deaths.female ~ offset(log(pop.at.risk.female)), censor_point = 9, data = georgia, car_parts = cars, family = poisson(), cores = 4, # for multi-core processing refresh = 0) # to silence some printing #> #> *Setting prior parameters for intercept #> Distribution: normal #> location scale #> 1 -4.7 5 #> #> *Setting prior for CAR scale parameter (car_scale) #> Distribution: student_t #> df location scale #> 1 10 0 3 #> #> *Setting prior for CAR spatial autocorrelation parameter (car_rho) #> Distribution: uniform #> lower upper #> 1 -1.7 1
Passing a fitted model to the
sp_diag function will return a set of diagnostics for spatial models:
sp_diag(fit, georgia, w = A) #> Using sp_diag(y, shape, rates = TRUE, ...). To examine data as (unstandardized) counts, use rates = FALSE. #> 3 NA values found in x; they will be dropped from the data before creating the Moran plot. If matrix w was row-standardized, it no longer is. You may want to use a binary connectivity matrix using style = 'B' in shape2mat. #> Warning: Removed 3 rows containing missing values #> (`geom_pointrange()`).
se_mean, effective sample size
n_eff, and the R-hat statistic
print(fit) #> Spatial Model Results #> Formula: deaths.female ~ offset(log(pop.at.risk.female)) #> Spatial method (outcome): CAR #> Likelihood function: poisson #> Link function: log #> Residual Moran Coefficient: 0.0021755 #> WAIC: 1291.91 #> Observations: 159 #> Data models (ME): none #> Inference for Stan model: foundation. #> 4 chains, each with iter=2000; warmup=1000; thin=1; #> post-warmup draws per chain=1000, total post-warmup draws=4000. #> #> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat #> intercept -4.673 0.002 0.093 -4.843 -4.716 -4.674 -4.630 -4.497 1636 1.000 #> car_rho 0.922 0.001 0.058 0.778 0.893 0.935 0.967 0.995 3214 1.001 #> car_scale 0.458 0.001 0.035 0.393 0.433 0.455 0.481 0.532 3867 1.000 #> #> Samples were drawn using NUTS(diag_e) at Wed Oct 4 12:20:45 2023. #> For each parameter, n_eff is a crude measure of effective sample size, #> and Rhat is the potential scale reduction factor on split chains (at #> convergence, Rhat=1).