Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data
Abstract
Probability surveys are challenged by increasing nonresponse rates, which can bias statistical inference. Auxiliary information about populations can be used to reduce this bias in estimation. Continuous auxiliary variables in administrative records are often discretized before release to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between the continuous auxiliary information and the survey outcome. In this paper, we propose a two-step strategy. In the first step, statistical agencies use the confidential continuous auxiliary data in the population to estimate the response propensity score of the survey sample, which is then included in a modified population dataset released to data users. In the second step, data users who do not have access to the confidential continuous auxiliary data conduct predictive survey inference by including the discretized continuous variables and the propensity score as predictors, using splines, in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using data from the Ohio Army National Guard Mental Health Initiative (OHARNG-MHI). The methods developed in this work are readily available in the R package AuxSurvey.
Keywords: Bayesian predictive inference; Rstan; continuous auxiliary variables; generalized additive model; inclusion propensity; poststratification.
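The two-step strategy summarized in the abstract can be illustrated with a small simulation. The sketch below is not the paper's implementation and does not use the AuxSurvey package: the population, the response mechanism, the quartile cut points, and all variable names are hypothetical. It also swaps the Bayesian spline model for an ordinary least-squares fit with a piecewise-linear (hinge) basis, and enters the propensity score on the logit scale, purely to keep the example short and dependency-free.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical population: x is the confidential continuous auxiliary variable,
# y is the survey outcome (observed only for respondents in practice).
N = 20_000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)

# Assumed response mechanism: units with larger x respond more often.
true_p = 1.0 / (1.0 + np.exp(-(-2.0 + 1.0 * x)))
responded = rng.random(N) < true_p


def fit_logistic(X, r, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Step 1 (agency side): estimate the response propensity from continuous x,
# then release only a discretized x and the estimated propensity (logit scale).
X1 = np.column_stack([np.ones(N), x])
beta = fit_logistic(X1, responded.astype(float))
logit_ps = X1 @ beta
cuts = np.quantile(x, [0.25, 0.5, 0.75])
x_cat = np.digitize(x, cuts)  # quartile category 0..3; x itself is withheld


def design(x_cat, score, knots):
    """Intercept, category dummies, and a piecewise-linear basis in the score."""
    dummies = (x_cat[:, None] == np.arange(1, 4)).astype(float)
    hinges = np.maximum(score[:, None] - knots, 0.0)
    return np.column_stack([np.ones(len(score)), dummies, score, hinges])


# Step 2 (data-user side): model y among respondents using only released
# variables, predict y for nonrespondents, and average over the population.
knots = np.quantile(logit_ps, [0.25, 0.5, 0.75])
D = design(x_cat, logit_ps, knots)
coef, *_ = np.linalg.lstsq(D[responded], y[responded], rcond=None)
y_filled = np.where(responded, y, D @ coef)

model_est = y_filled.mean()       # predictive estimate of the population mean
naive_est = y[responded].mean()   # unadjusted respondent mean
true_mean = y.mean()              # known only because the data are simulated

print(f"true={true_mean:.3f} naive={naive_est:.3f} model={model_est:.3f}")
```

In this setup the respondent mean overstates the population mean because response depends on x, while the prediction step recovers much of the lost information through the released propensity score. A fully Bayesian fit, as described in the abstract, would additionally provide credible intervals, which this sketch does not attempt.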
Citation
Williams SZ, Zou J, Liu Y, Si Y, Galea S, Chen Q. Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data. Stat Med. 2024 Nov 18. doi: 10.1002/sim.10270. Epub ahead of print. PMID: 39557420.