Theil's entropy-based index of inequality


theil2(Count, Population, rates, total = TRUE)

# S3 method for surveil

# S3 method for list


Conceicao, P. and P. Ferreira (2000). The young person's guide to the Theil Index: Suggesting intuitive interpretations and exploring analytical applications. University of Texas Inequality Project. UTIP Working Paper Number 14. Accessed May 1, 2021 from

Conceicao, P, Galbraith, JK, Bradford, P. (2001). The Theil Index in sequences of nested and hierarchic grouping structures: implications for the measurement of inequality through time, with data aggregated at different levels of industrial classification. Eastern Economic Journal. 27(4): 491-514.

Theil, Henri (1972). Statistical Decomposition Analysis. Amsterdam, The Netherlands and London, UK: North-Holland Publishing Company.

Shannon, Claude E. and Weaver, Warren (1963). The Mathematical Theory of Communication. Urbana and Chicago, USA: University of Illinois Press.



A fitted surveil model, from stan_rw; or, a list of fitted surveil models, where each model represents a different geographic area (e.g., states).


Case counts, integers


Population at risk, integers


If Count is not provided, then rates must be provided (Count = rates * Population).


If total = TRUE, Theil's index is returned as a single value. If total = FALSE, the contribution of each unit is returned instead; these contributions sum to Theil's index.



If total = TRUE (the default), theil2 returns Theil's index as a single numeric value. Otherwise, theil2 returns a numeric vector with one value per unit; these values sum to Theil's index.


A named list with the following elements:


A data.frame summarizing the posterior probability distribution for Theil's T, including the mean and 95 percent credible interval for each time period


A data.frame with MCMC samples for Theil's T


A list (also of class theil_list) containing a summary data frame and a tbl_df containing MCMC samples for Theil's index at each time period.

The summary data frame includes the following columns:


time period


Posterior mean for Theil's index; equal to the sum of Theil_between and Theil_within.


The between-areas component of Theil's inequality index


The within-areas component of Theil's inequality index

Additional columns contain the upper and lower limits of the 95 percent credible intervals for each component of Theil's index.

The data frame of samples contains the following columns:


Time period indicator


An id for each MCMC sample; note that samples are from the joint distribution


The between-geographies component of Theil's index


The within-geographies component of Theil's index


Theil's inequality index (T = Between + Within)



Theil's index is a useful measure of inequality in disease and mortality burdens when multiple groups are being considered. It provides a summary measure of inequality across a set of demographic groups that may be tracked over time (and/or space). The index is also additive, and thus admits simple decompositions.

The index measures discrepancies between each population's share of the disease burden, omega, and its share of the population, eta. A situation of zero inequality would imply that each population's share of cases is equal to its population share, that is, omega = eta. Each population's contribution to total inequality is calculated as:

             T_i = omega_i * [log(omega_i/eta_i)],

that is, the log-ratio of case share to population share, weighted by the population's share of cases. Theil's index for all areas is the sum of each area's T_i:

             T = sum_(i=1)^n T_i.

Theil's T is thus a weighted mean of log-ratios of case shares to population shares, where each log-ratio (which we may describe as a raw inequality score) is weighted by its share of total cases. The index has a minimum of zero and a maximum of log(n), where n is the number of units (e.g., number of states).
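The calculation above can be sketched in base R without calling the package. This is a minimal illustration, not the package's implementation: the helper name theil_by_hand is hypothetical, the input values are illustrative, and the choice to give units with zero cases a contribution of zero (the limit of x * log(x) as x approaches zero) is an assumption of this sketch.

```r
# Hypothetical helper: Theil's T from case counts and populations at risk.
theil_by_hand <- function(count, population, total = TRUE) {
  omega <- count / sum(count)            # each unit's share of cases
  eta   <- population / sum(population)  # each unit's share of population
  # T_i = omega_i * log(omega_i / eta_i); zero-case units contribute 0 (assumption)
  t_i <- ifelse(omega > 0, omega * log(omega / eta), 0)
  if (total) sum(t_i) else t_i
}

count <- c(10, 12, 3, 111)
pop   <- c(1000, 1200, 4000, 9000)

t_i     <- theil_by_hand(count, pop, total = FALSE)  # per-unit contributions
t_total <- theil_by_hand(count, pop)                 # their sum, Theil's T

# When case shares equal population shares, the index is zero
t_equal <- theil_by_hand(c(10, 20, 30), c(100, 200, 300))
```

Individual contributions T_i are negative for units whose case share is smaller than their population share, and positive otherwise; their sum is never negative.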

Theil's index, which is based on Shannon's information theory, can be extended to measure inequality across multiple groups nested within non-overlapping geographies (e.g., states).
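For groups nested within areas, the standard additive decomposition splits the index into a between-areas component (Theil's T computed on area totals) and a within-areas component (each area's own T, weighted by that area's share of all cases). The following base R sketch illustrates that identity under stated assumptions: the data frame d, its column names, and the helper theil_t are hypothetical, and the code is not taken from the package.

```r
# Illustrative data: two groups nested within each of two areas
d <- data.frame(
  area  = c("A", "A", "B", "B"),
  group = c("g1", "g2", "g1", "g2"),
  count = c(10, 12, 3, 111),
  pop   = c(1000, 1200, 4000, 9000)
)

# Hypothetical helper: Theil's T for one set of units
theil_t <- function(count, pop) {
  omega <- count / sum(count)
  eta   <- pop / sum(pop)
  sum(ifelse(omega > 0, omega * log(omega / eta), 0))
}

# Between-areas component: T computed on area-level totals
area_count <- tapply(d$count, d$area, sum)
area_pop   <- tapply(d$pop, d$area, sum)
t_between  <- theil_t(area_count, area_pop)

# Within-areas component: each area's T, weighted by its share of all cases
t_area   <- sapply(split(d, d$area), function(x) theil_t(x$count, x$pop))
t_within <- sum(area_count / sum(area_count) * t_area)

# T across all area-by-group units equals Between + Within
t_all <- theil_t(d$count, d$pop)
all.equal(as.numeric(t_between + t_within), t_all)
```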


# \donttest{
houston <- msa[grep("Houston", msa$MSA), ]
fit <- stan_rw(houston, time = Year, group = Race,
               chains = 2, iter = 900) # for speed only
theil_houston <- theil(fit)
# }

# Theil's index from case counts and populations at risk
Count <- c(10, 12, 3, 111)
Pop <- c(1000, 1200, 4000, 9000)
theil2(Count, Pop)
# contribution of each unit to the index
theil2(Count, Pop, total = FALSE)