Sampling and Estimation Methodology for Hidden Populations
Hidden populations, also referred to as hard-to-reach or hard-to-sample populations, are characterized by the difficulty researchers face in accessing them. Members of hidden populations often cannot be reached using conventional sampling techniques, as sampling frames and contact information are unavailable. They may practice stigmatized or illegal behaviors, often have low trust in researchers, and are relatively rare with respect to the general population. Examples of hidden populations include LGBTQ individuals, sex workers, people experiencing homelessness, migrants and internally displaced persons, people who inject drugs, and trafficked persons. Members of hidden populations are among the most vulnerable to, and bear some of the highest burdens from, infectious diseases such as HIV/AIDS, as well as substance misuse and related behavioral health issues. Understanding the needs of these populations is an important part of epidemiological, demographic, and public health research. I have worked specifically on a peer-recruitment sampling method called respondent-driven sampling (RDS) and associated estimation methodologies, including a rational-choice model for the sampling mechanism. These methodologies allow better inferences to be drawn from surveys of hidden populations.
Estimating the Size of Hidden Populations
Because sampling frames and contact information are unavailable for hidden populations, basic population-level information such as N, the overall population size, is often unknown. Population size is essential because it serves as the denominator in many analyses, including some types of prevalence estimation. In this area, I have developed novel estimators and analysis methodologies related to successive sampling population size estimation (SS-PSE). The imputed visibility modification to SS-PSE incorporates a measurement error model for self-reported social network sizes and allows for improved estimation in many situations. The extension for clustered hidden populations handles the scenario where there are bottlenecks in the underlying social network. Finally, the capture-recapture extension improves estimation by using data from two respondent-driven sampling (RDS) surveys. Together, these methods improve estimation of the size of hidden populations sampled using RDS. I have applied and tested these methods in a number of settings worldwide.
Targeted Sampling Methodology for COVID-19 in Communities
I helped develop a targeted random door-to-door sampling method, informed by community wastewater measurements (wastewater-based epidemiology), that was implemented in two communities in Oregon in 2021. The sampling design is a three-stage design with strata informed by microsewershed boundaries, clusters corresponding to one or more adjacent census blocks selected with probability proportional to size, and systematic sampling of housing units within clusters. This design is intended to allocate field teams collecting nasal swabs so that an unbiased prevalence estimate can be obtained while attempting to discover positive individuals. Prevalence estimates from our work helped inform community health decisions and Oregon’s tiered reopening plan. I have worked as a co-PI of community-engaged projects to help detect and mitigate the spread of COVID-19 and other pathogens, with the goal of creating pandemic-resilient cities.
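The general shape of such a design can be illustrated with a minimal sketch. It is not the implementation used in the Oregon studies: the stratum names, cluster sizes, sampling interval, and the choice of systematic selection for the probability-proportional-to-size step are all assumptions made for illustration, and each cluster's size is taken to be its number of housing units.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n cluster indices with probability proportional to size.

    Uses systematic PPS selection (an assumed variant): lay the clusters
    end to end on a line of length sum(sizes) and take n equally spaced
    points after a random start. A cluster larger than the spacing can be
    picked more than once.
    """
    step = sum(sizes) / n
    start = rng.random() * step
    picks, cum, idx = [], 0.0, 0
    for k in range(n):
        target = start + k * step
        # Advance until the target point falls inside cluster idx.
        while idx < len(sizes) - 1 and cum + sizes[idx] <= target:
            cum += sizes[idx]
            idx += 1
        picks.append(idx)
    return picks

def systematic_sample(units, interval, rng):
    """Take every interval-th housing unit after a random start."""
    start = rng.randrange(interval)
    return units[start::interval]

def draw_sample(strata, clusters_per_stratum, interval, seed=0):
    """Within each stratum, select clusters by PPS, then housing units systematically."""
    rng = random.Random(seed)
    sample = {}
    for name, clusters in strata.items():
        sizes = [len(cluster) for cluster in clusters]
        chosen = pps_systematic(sizes, clusters_per_stratum, rng)
        sample[name] = [systematic_sample(clusters[i], interval, rng) for i in chosen]
    return sample

# Hypothetical strata (standing in for microsewersheds); each cluster is a
# list of made-up housing-unit IDs.
strata = {
    "sewershed_A": [[f"A{c}-{u}" for u in range(n)] for c, n in enumerate([40, 25, 60])],
    "sewershed_B": [[f"B{c}-{u}" for u in range(n)] for c, n in enumerate([30, 30, 50, 20])],
}
sample = draw_sample(strata, clusters_per_stratum=2, interval=10)
for name, cluster_samples in sample.items():
    print(name, [len(units) for units in cluster_samples])
```

In a real survey, the inclusion probabilities implied by each stage would also be recorded so that design weights can be computed for the prevalence estimate; the sketch omits that step.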
Network Methodology for Social Sciences
I am broadly interested in network analysis for the social sciences, focusing on missing data and measurement error issues in the context of the dependent data and sampling challenges that arise for networks. Social network data present issues for traditional analysis because of the complex dependencies that exist between individuals, as well as the propensity for measurement error that can induce bias. With co-authors from a variety of disciplines, I have developed a model for error-prone responses due to memory recall in the context of academic collaboration networks, developed a method to assess trends in co-publication disparities by race and gender over time among academics, and modeled the effect of resource diversity and productivity on household position in Alaskan food sharing networks.
Collaborative Research in Agricultural Sciences
As an applied statistician, I collaborate on a number of interdisciplinary projects and provide support for statistical modeling and analysis, particularly using linear mixed effects models. I highlight results from long-term collaborations in the areas of crop and soil science and horticulture.