surveil provides a number of functions and methods for measuring health differences or inequalities. These can calculated using the group_diff function.

Pairwise measures of inequality have an important purpose (tracking health inequalities between historically or currently advantaged and disadvantaged groups), but we should keep in mind the limitations inherent in tracking the progress of any group by comparing that group to a moving target (particularly one that may move in an unfavorable direction).

Concepts

The group_diff function returns estimates for the following quantities, where \(A\) is the incidence rate for the advantaged group, \(D\) is the incidence rate of the disadvantaged group, and \(P_d\) is the size of the population at risk for the disadvantaged group.

Table 1: Group difference measures for age-stratified groups.

Concept Formula
Rate Ratio (RR) \(\frac{D}{A}\)
Rate Difference (RD) \(D - A\)
Excess Cases (EC) \((D-A) \times P_d\)
Proportion Attributable Risk (PAR) \(\frac{D-A}{D}\)

Notice that the PAR is simply the rate difference expressed as a fraction of total risk; it indicates the fraction of risk in the target population that would have been removed had the target rate been equal to the reference rate (Menvielle, Kulhánová, and Machenbach 2019).

Demonstration

We will illustrate using the colorectal cancer data from Texas MSAs aggregated by race-ethnicity:

library(surveil)
data(msa)
msa2 <- aggregate(cbind(Count, Population) ~ Year + Race, data = msa, FUN = sum)
head(msa2)
#>   Year                      Race Count Population
#> 1 1999 Black or African American   471     270430
#> 2 2000 Black or African American   455     283280
#> 3 2001 Black or African American   505     298287
#> 4 2002 Black or African American   539     313133
#> 5 2003 Black or African American   546     329481
#> 6 2004 Black or African American   602     346886

This fits the time trend models to each group:

# refresh = 0 will silence some printing
fit <- stan_rw(msa2, time = Year, group = Race, iter = 1e3, refresh = 0)
#> Distribution: normal
#> Distribution: normal
#> [1] "Setting normal prior(s) for eta_1: "
#>  location scale
#>        -6     5
#> [1] "\nSetting half-normal prior for sigma: "
#>  location scale
#>         0     1

To calculate the pairwise difference or inequality measures for two groups in our data, we call group_diff on our fitted model.

We have to name which group is the reference and which is the target group. The group names must match the labels in the data:

unique(msa2$Race)
#> [1] "Black or African American" "Hispanic"                 
#> [3] "White"

In this case, we will use whites as the reference rate and African Americans as the target rate:

gd <- group_diff(fit, target = "Black or African American", reference = "White")
print(gd, scale = 100e3)
#> Summary of Pairwise Inequality
#> Target group: Black or African American
#> Reference group: White
#> Time periods observed: 19
#> Rate scale: per 100,000
#> Cumulative excess cases (EC): 3,205 [2979, 3433]
#> Cumulative EC as a fraction of group risk (PAR): 0.27 [0.25, 0.28]
#>  time Rate RD  PAR  RR  EC
#>  1999  170 35 0.20 1.3  94
#>  2000  166 30 0.18 1.2  85
#>  2001  168 34 0.20 1.3 102
#>  2002  169 39 0.23 1.3 121
#>  2003  167 39 0.24 1.3 130
#>  2004  167 46 0.28 1.4 161
#>  2005  159 44 0.28 1.4 161
#>  2006  155 45 0.29 1.4 179
#>  2007  153 44 0.29 1.4 187
#>  2008  150 46 0.31 1.4 204
#>  2009  144 45 0.31 1.5 208
#>  2010  139 43 0.31 1.4 209
#>  2011  131 37 0.28 1.4 191
#>  2012  125 34 0.27 1.4 184
#>  2013  125 35 0.28 1.4 195
#>  2014  126 35 0.27 1.4 204
#>  2015  123 29 0.24 1.3 180
#>  2016  122 33 0.27 1.4 210
#>  2017  123 30 0.25 1.3 200

All of the surveil plotting and printing methods provide an option to scale rates by a custom value. By setting scale = 100e3 (100,000), the RD is printed as cases per 100,000. None of the other inequality measures (PAR, RR, EC) are impacted by this choice.

The plot method for surveil_diff produces one time series ggplot each for RD, PAR, and EC. The estimates for each measure are plotted as lines, while the shading indicates a 95% credible interval:

plot(gd, scale = 100e3)
#> Rate differences (RD) are per 100,000 at risk

(The estimates are the means of the posterior probability distributions for each quantity of interest.) If we wanted to replace the plot of the PAR with one of the rate ratio we would set the PAR option to FALSE, as in:

# figure not shown
plot(gd, scale = 100e3, PAR = FALSE)

Comparing age-stratified groups

The group_diff function can be used to obtain measures of inequality between groups with age-standardized rates. The measures of excess cases (EC) and proportion attributable risk (PAR) are adjusted to account for age-specific rates and varying sizes of populations at risk.

In the following table, \(D\) and \(A\) refer to the age-standardized rates for the disadvantaged and advantaged populations, respectively. Age groups are indexed by \(i\), such that \(D_i\) is the incidence rate in the \(i^{th}\) age group for the disadvantaged population.

Table 2: Group difference measures for age-stratified groups.

Concept Formula
Rate Ratio (RR) \(\frac{D}{A}\)
Rate Difference (RD) \(D - A\)
Excess Cases (EC) \(\sum_i (D_i - A_i) * P_{d,i}\)
Proportion Attributable Risk (PAR) \(\frac{\sum_i (D_i - A_i) * P_{d,i}}{ \sum_i D_i * P_{d,i} }\)

The EC measure sums the excess cases across all age groups, and the PAR divides the EC measure by the total risk across all age groups in the disadvantaged population. For age stratified populations, the PAR may be preferred over the RR as a measure of relative inequality because the PAR reflects the actual population sizes.

To compare two age-stratified population segments, you will first obtain age-standardized rates for each group separately. (See the vignette on age-standardization using vignette("age-standardization").) When fitting these models, be sure to use the same MCMC parameters each time: the number of MCMC iterations and chains must match (that is, use the same iter and chains values for each model).

For any two groups A and B, you might name these age-standardized models fit_a and fit__b. Next, combine them into a list. The first item in the list (A) will be used as the target group and the second (B) will be treated as the reference group:

fit_list <- list(group_a = fit_sr_a, group_b = fit_sr_b)

Then pass that list to group_diff:

diff <- group_diff(fit_list)

This will compare group A to group B.

Measuring inequality with multiple groups

Pairwise cannot provide a summary of dispersion or inequality across multiple groups. Theil’s T is an entropy-based inequality index with many favorable qualities, including that it naturally accommodates complex (nested) grouping structures (Theil 1972; Conceição and Galbraith 2000; Conceição and Ferreira 2000).

Theil’s T measures the extent to which groups are under- or over-burdened by disease, meaning simply that the proportion of cases accounted for by a particular group, \(\omega_j\), is lower or higher than the proportion of the population constituted by that same group, \(\eta_j\). With \(k\) groups, Theil’s index is \[T = \sum_{j=1}^k \omega_j \big[ log(\omega_j / \eta_j) \big].\] This is zero when case shares equal population shares and it increases monotonically as the two diverge for any group. Theil’s T is thus a weighted mean of log-ratios of case shares to population shares, where each log-ratio (which we may describe as a raw inequality score) is weighted by its share of total cases.

Theil’s T can be computed from a fitted surveil model, the only requirement is that the model includes multiple groups (through the group argument):

Ts <- theil(fit)
print(Ts)
#> Summary of Theil's Inequality Index
#> Groups:
#> Time periods observed: 19
#> Theil's T (times 100) with 95% credible intervals
#>  time Theil .lower .upper
#>  1999 0.935  0.628   1.31
#>  2000 0.799  0.537   1.08
#>  2001 0.904  0.638   1.20
#>  2002 0.978  0.708   1.30
#>  2003 0.988  0.715   1.33
#>  2004 1.063  0.740   1.45
#>  2005 1.001  0.715   1.36
#>  2006 1.045  0.722   1.39
#>  2007 1.102  0.773   1.46
#>  2008 1.245  0.906   1.64
#>  2009 1.203  0.863   1.60
#>  2010 1.134  0.800   1.52
#>  2011 0.910  0.601   1.27
#>  2012 0.768  0.474   1.10
#>  2013 0.805  0.515   1.14
#>  2014 0.855  0.565   1.20
#>  2015 0.680  0.438   0.95
#>  2016 0.806  0.539   1.11
#>  2017 0.720  0.450   1.05

The results can be visualized using plot:

plot(Ts)

While the minimum of Theil’s index is always zero, the maximum value varies with the structure of the population under observation. The index is useful for comparisons such as monitoring change over time, and should generally not be used as a indication of the absolute level of inequality.

The index also can be extended for groups nested in regions such as racial-ethnic groups within states. Theil’s index can provide a measure of geographic inequality across states (between-state inequality), and social inequality within states (within-state inequality) (Conceição, Galbraith, and Bradford 2001). For details, see ?theil.

References

Conceição, Pedro, and Pedro Ferreira. 2000. “The Young Person’s Guide to the Theil Index: Suggesting Intuitive Interpretations and Exploring Analytical Applications.” University of Texas Inequality Project (UTIP). https://utip.gov.utexas.edu/papers.html.

Conceição, Pedro, and James K. Galbraith. 2000. “Constructing Long and Dense Time Series of Inequality Using the Theil Index.” Eastern Economic Journal 26 (1): 61–74.

Conceição, Pedro, James K Galbraith, and Peter Bradford. 2001. “The Theil Index in Sequences of Nested and Hierarchic Grouping Structures: Implications for the Measurement of Inequality Through Time, with Data Aggregated at Different Levels of Industrial Classification.” Eastern Economic Journal 27 (4): 491–514.

Menvielle, Gwenn, Kulhánová, and Johan P. Machenbach. 2019. “Assessing the Impact of a Public Health Intervention to Reduce Social Inequalities in Cancer.” In Reducing Social Inequalities in Cancer: Evidence and Priorities for Research, edited by Salvatore Vaccarella, Joannie Lortet-Tieulent, Rodolfo Saracci, David I. Conway, Kurt Straif, and Christopher P. Wild, 185–92. Geneva, Switzerland: WHO Press.

Theil, Henry. 1972. Statistical Decomposition Analysis. Amsterdam, The Netherlands; London, UK: North-Holland Publishing Company.