metasurvey is an R package for processing and analysing complex survey data using metaprogramming and reproducible pipelines. It integrates with the survey package and is designed for complex sampling designs and recurring estimations over time (rotating panels, repeated cross-sections).
If you find this useful, please consider giving us a ⭐ star on GitHub — it helps others discover the project!
Live services
The full stack is deployed and publicly available:
| Service | URL | Description |
|---|---|---|
| Recipe Explorer | metasurvey-shiny-production.up.railway.app | Interactive Shiny app to browse, search and inspect community recipes and workflows |
| REST API | API reference | Plumber API backed by MongoDB for publishing and discovering recipes (self-hosting guide) |
| pkgdown site | metasurveyr.github.io/metasurvey | Full package documentation and vignettes |
Key features
-
Steps: lazy transformation pipeline (
step_compute,step_recode,step_rename,step_remove,step_join) executed viabake_steps(). -
Recipes: portable, versioned objects that encapsulate harmonisation pipelines with automatic documentation (
doc()) and validation (validate()). -
Workflows: estimation with
survey::svymean,svytotal,svyratioandsvybyintegrated inworkflow(), returning adata.tablewith value, standard error and coefficient of variation. -
Rotating panels: support for
RotativePanelSurveywith implantation and follow-ups, andPoolSurveyfor combined estimation. -
Replicate weights: bootstrap replicate configuration via
add_replicate()for robust variance withsurvey::svrepdesign. - Recipe registry: publish, search and discover recipes and workflows through a self-hosted REST API or a local JSON registry.
-
Shiny app: interactive recipe and workflow explorer with
explore_recipes(). -
Self-hosting: deploy the full stack on your infrastructure with Docker Compose or Kubernetes. Publish indicators with full traceability (indicator → workflow → recipe) while keeping microdata private. See
vignette("self-hosting"). -
STATA transpiler: convert
.dofiles into reproducible Recipe objects.
Works with any household survey
The step pipeline and workflow system are survey-agnostic. The same verbs process Argentina’s EPH, Chile’s CASEN, Brazil’s PNAD-C, the US CPS, Mexico’s ENIGH, or DHS data from 90+ countries.
| EPH | CASEN | PNAD-C | CPS | ENIGH | DHS | |
|---|---|---|---|---|---|---|
| Steps (compute / recode / rename / remove / join) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Weights (add_weight) |
✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Stratified + cluster designs | ✅ | ✅ | ✅ | – | ✅ | ✅ |
Replicate weights (add_replicate) |
– | – | ✅ | ✅ | – | – |
Rotating panels (RotativePanelSurvey) |
✅ | – | ✅ | ✅ | – | – |
| Recipes & workflows | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
# Same pipeline, different surveys ─────────────────────────
# Argentina (eph)
eph_svy <- Survey$new(
data = as.data.table(eph::get_microdata(2023, 3)),
edition = "2023-T3", type = "eph", psu = NULL,
engine = "data.table", weight = add_weight(quarterly = "PONDERA")
)
# Chile (casen)
casen_svy <- Survey$new(
data = as.data.table(casen::descargar_casen_github(2017)),
edition = "2017", type = "casen", psu = "varunit",
engine = "data.table", weight = add_weight(annual = "expr")
)
# Both use the exact same verbs
process <- function(svy) {
svy |>
step_recode(employed, labor_status == 1 ~ 1L, .default = 0L,
comment = "Binary employment indicator") |>
bake_steps()
}See vignette("international-surveys") for reproducible examples with all seven surveys.
Installation
Install the stable version from CRAN:
install.packages("metasurvey")Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("metasurveyR/metasurvey")Quick example
library(metasurvey)
# Create a survey with sample data
data(api, package = "survey")
svy <- Survey$new(
data = apistrat,
edition = "2000",
type = "api",
psu = NULL,
engine = "data.table",
weight = add_weight(annual = "pw")
)
# Lazy transformations
svy <- step_compute(svy, growth = api00 - api99, comment = "API growth")
svy <- step_recode(svy, school_level,
stype == "E" ~ "Elementary",
stype == "M" ~ "Middle",
stype == "H" ~ "High",
.default = NA_character_
)
svy <- bake_steps(svy)
# Estimation
workflow(
list(svy),
survey::svymean(~growth, na.rm = TRUE),
estimation_type = "annual"
)Full example: ECH panel with bootstrap replicate weights
This example uses the rotating panel from Uruguay’s Encuesta Continua de Hogares (ECH) with bootstrap replicate weights. First, download the example data:
download_example_ech <- function() {
zip_url <- "https://informe-tfg.s3.us-east-2.amazonaws.com/example-data.zip"
dest_zip <- "example-data.zip"
temp_dir <- tempfile("example-data")
download.file(zip_url, destfile = dest_zip, mode = "wb")
dir.create(temp_dir)
unzip(dest_zip, exdir = temp_dir)
target_dir <- "example-data"
dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
file.rename(
list.files(file.path(temp_dir, "example-data"), full.names = TRUE),
file.path(target_dir, basename(list.files(file.path(temp_dir, "example-data"))))
)
unlink(dest_zip)
unlink(temp_dir, recursive = TRUE)
}
download_example_ech()With the data downloaded:
library(metasurvey)
library(magrittr)
path_dir <- file.path("example-data", "ech", "ech_2023")
ech_2023 <- load_panel_survey(
path_implantation = file.path(path_dir, "ECH_implantacion_2023.csv"),
path_follow_up = file.path(path_dir, "seguimiento"),
svy_type = "ECH_2023",
svy_weight_implantation = add_weight(annual = "W_ANO"),
svy_weight_follow_up = add_weight(
monthly = add_replicate(
"W",
replicate_path = file.path(
path_dir,
c(
"Pesos replicados Bootstrap mensuales enero_junio 2023",
"Pesos replicados Bootstrap mensuales julio_diciembre 2023"
),
c(
"Pesos replicados mensuales enero_junio 2023",
"Pesos replicados mensuales Julio_diciembre 2023"
)
),
replicate_id = c("ID" = "ID"),
replicate_pattern = "wr[0-9]+",
replicate_type = "bootstrap"
)
)
)
# Build labour market indicators
ech_2023 <- ech_2023 %>%
step_recode("pea", POBPCOAC %in% 2:5 ~ 1, .default = 0,
comment = "EAP", .level = "follow_up") %>%
step_recode("pet", e27 >= 14 ~ 1, .default = 0,
comment = "WAP", .level = "follow_up") %>%
step_recode("po", POBPCOAC == 2 ~ 1, .default = 0,
comment = "Employed", .level = "follow_up") %>%
step_recode("pd", POBPCOAC %in% 3:5 ~ 1, .default = 0,
comment = "Unemployed", .level = "follow_up")
ech_2023_bake <- bake_steps(ech_2023)
# Quarterly rates: activity, employment and unemployment
workflow_result <- workflow(
survey = extract_surveys(ech_2023_bake, quarterly = 1:4),
survey::svyratio(~pea, denominator = ~pet),
survey::svyratio(~po, denominator = ~pet),
survey::svyratio(~pd, denominator = ~pea),
estimation_type = "quarterly:monthly",
rho = 0.5,
R = 5 / 6
)
workflow_resultThis pipeline loads a rotating panel with bootstrap replicate weights, builds binary labour market indicators (EAP, WAP, employed, unemployed), and estimates activity, employment and unemployment rates by quarter with robust variance.
STATA transpiler
Many research groups maintain decades of STATA .do files that process household survey microdata. The metasurvey transpiler converts these scripts into reproducible Recipe objects.
library(metasurvey)
# Transpile a .do file to metasurvey steps
result <- transpile_stata("demographics.do")
result$steps[1:3]
#> [1] "step_rename(svy, hh_id = \"id\", person_id = \"nper\")"
#> [2] "step_compute(svy, weight_yr = pesoano)"
#> [3] "step_compute(svy, sex = e26)"
# Transpile an entire year directory into separate recipes
recipes <- transpile_stata_module(
year_dir = "do_files/2022",
year = 2022,
user = "research_team",
output_dir = "recipes/"
)
# Check coverage before migrating
transpile_coverage("do_files/")Supported STATA patterns: gen/replace chains, recode, egen with by-groups, foreach/forvalues loops, mvencode, destring, rename, drop/keep, variable and value labels, inrange/inlist expressions, and variable ranges.
See vignette("stata-transpiler") for the full reference.
Documentation
- Getting started
- International survey compatibility
- Recipes
- Workflows and estimation
- Complex designs
- Rotating panels
- Case study: ECH
- STATA transpiler
- ECH demographics recipe
- Interactive explorer
- API and database
- Self-hosting guide
Related work
| Package | Focus | metasurvey adds |
|---|---|---|
| survey | Sampling designs and estimation | Lazy step pipeline, recipe system, rotating panels |
| srvyr | dplyr-style interface to survey | Portable recipes, workflow registry, panel support |
| recipes | Feature engineering for modelling | Survey-aware steps, complex designs, community sharing |
| eph | Argentina’s EPH survey | Survey-agnostic: works with any household survey |
| targets | General pipeline orchestration | Domain-specific steps, built-in survey semantics |
metasurvey is not a wrapper around survey. It adds a reproducibility layer (steps, recipes, workflows) that is survey-agnostic: the same pipeline processes ECH, EPH, CASEN, PNAD-C, CPS, ENIGH, or DHS data without survey-specific code.
Citation
To cite metasurvey in publications use:
citation("metasurvey")Loprete M, da Silva N, Machado F (2026). metasurvey: Reproducible Survey Data Processing with Step Pipelines. R package, https://CRAN.R-project.org/package=metasurvey.
Contributing
Please see CONTRIBUTING.md for guidelines on how to contribute to metasurvey.
Code of Conduct
Please note that the metasurvey project is released with a Contributor Code of Conduct. By contributing to this project you agree to abide by its terms.