Creates a non-mutating step that checks data invariants when
bake_steps is called. Each check is a logical expression
evaluated row-wise against the survey data. If any row fails a check,
the pipeline either stops with an error or issues a warning, depending on .action.
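As a rough illustration of row-wise evaluation, the base R sketch below evaluates a single check against a toy data frame and locates the rows that fail it. It is not the package's implementation, and the data are made up. Counting an NA result as a failure is an assumption: this page does not say how missing results are handled.

```r
# Toy data; column names mirror the Examples section (illustrative only).
dt <- data.frame(age = c(25, -1, 45), income = c(1000, NA, 3000))

# A check is an unevaluated expression, evaluated against the data's columns.
check <- quote(income > 0)
result <- eval(check, dt)   # c(TRUE, NA, TRUE): one value per row

# Assumption: a missing result counts as a failure rather than a pass.
failed <- is.na(result) | !result
failing_rows <- which(failed)
print(failing_rows)
```

Note that which(!result) alone would silently skip the NA row, which is why the sketch tests is.na() explicitly.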
Usage
step_validate(
svy,
...,
.action = c("stop", "warn"),
.min_n = NULL,
.copy = use_copy_default(),
comment = "Validate step"
)
Arguments
- svy
A Survey or RotativePanelSurvey object
- ...
Logical expressions evaluated against the data. Each must return a logical vector with one value per row. Named expressions use the name in error messages; unnamed expressions use the deparsed code. Examples: income > 0, !is.na(age), sex %in% c(1, 2).
- .action
What to do when a check fails: "stop" (default) raises an error; "warn" issues a warning and continues.
- .min_n
Minimum number of rows required. Checked before row-level expressions.
- .copy
Whether to operate on a copy (default: use_copy_default()).
- comment
Descriptive text for the step, used for documentation and traceability (default:
"Validate step").
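The argument semantics above can be approximated in base R. The sketch below uses a hypothetical helper, validate_rows(), which is not part of the package: it checks .min_n before any row-level expression, labels each check by its argument name or its deparsed code, and reports failures with stop() or warning() according to .action. Counting NA results as failures, and always erroring on a check that does not return one logical per row, are assumptions of this sketch.

```r
# Hypothetical helper: approximates the documented semantics of step_validate's
# arguments. It is a sketch, not the package's implementation.
validate_rows <- function(data, ..., .action = c("stop", "warn"), .min_n = NULL) {
  .action <- match.arg(.action)
  report <- if (.action == "stop") stop else warning

  # The row-count requirement is checked before any row-level expression.
  if (!is.null(.min_n) && nrow(data) < .min_n) {
    report(sprintf("expected at least %d rows, found %d", .min_n, nrow(data)),
           call. = FALSE)
  }

  # Capture the checks unevaluated; argument names (if any) become labels.
  checks <- as.list(substitute(list(...)))[-1]
  labels <- names(checks)
  if (is.null(labels)) labels <- rep("", length(checks))

  for (i in seq_along(checks)) {
    label <- if (nzchar(labels[i])) labels[i] else
      paste(deparse(checks[[i]]), collapse = " ")
    result <- eval(checks[[i]], data, parent.frame())

    # Assumption: a malformed check is always an error, whatever .action says.
    if (!is.logical(result) || length(result) != nrow(data)) {
      stop(sprintf("check `%s` must return one logical value per row", label),
           call. = FALSE)
    }

    # Assumption: NA results count as failures.
    n_failed <- sum(is.na(result) | !result)
    if (n_failed > 0) {
      report(sprintf("check `%s` failed for %d row(s)", label, n_failed),
             call. = FALSE)
    }
  }
  invisible(data)
}

dt <- data.frame(age = c(25, 30, 45), income = c(1000, NA, 3000))
validate_rows(dt, age > 0, .min_n = 3)                           # passes silently
validate_rows(dt, has_income = !is.na(income), .action = "warn")  # warns, continues
```

With .action = "warn", the helper keeps going after a failure and still returns the data, mirroring the "issues a warning and continues" behavior described for step_validate.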
Details
Lazy evaluation (default): Like all steps, validation checks are
recorded but not executed until bake_steps is called.
This means step_validate can reference variables created by
preceding step_compute calls.
step_validate never modifies the data. It only inspects the current state of the data.table and, if any check fails, raises an error or a warning according to .action.
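To see why a validation step can reference a column created by an earlier step_compute, consider this minimal base R model of deferred execution. The helpers new_pipeline(), add_compute(), add_validate() and bake() are hypothetical stand-ins for the package's steps and bake_steps(), not its API: each step is stored as a function, and nothing runs until bake() applies the steps to the data in order.

```r
# Hypothetical mini-pipeline illustrating deferred execution; these helpers
# are stand-ins, not the package's API.
new_pipeline <- function(data) list(data = data, steps = list())

# Record a step that adds a column; the expression is captured, not evaluated.
add_compute <- function(p, name, expr) {
  force(name)
  expr <- substitute(expr)
  p$steps <- c(p$steps, list(function(d) {
    d[[name]] <- eval(expr, d)
    d
  }))
  p
}

# Record a check; when run, it inspects the data and returns it unchanged.
add_validate <- function(p, expr) {
  expr <- substitute(expr)
  p$steps <- c(p$steps, list(function(d) {
    ok <- eval(expr, d)
    if (any(is.na(ok) | !ok)) {
      stop("validation failed: ", deparse(expr), call. = FALSE)
    }
    d
  }))
  p
}

# Run the recorded steps in order, feeding each step's output to the next.
bake <- function(p) Reduce(function(d, step) step(d), p$steps, p$data)

p <- new_pipeline(data.frame(income = c(1000, 2000)))
p <- add_compute(p, "income_k", income / 1000)
p <- add_validate(p, income_k > 0)  # income_k does not exist yet; nothing runs
out <- bake(p)                      # compute runs first, then the check
```

Because the check is only evaluated inside bake(), the column it references already exists by the time it runs; evaluating it eagerly in add_validate() would fail with "object 'income_k' not found".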
See also
Other steps:
bake_steps(),
get_steps(),
step_compute(),
step_filter(),
step_join(),
step_recode(),
step_remove(),
step_rename(),
view_graph()
Examples
dt <- data.table::data.table(
id = 1:5, age = c(25, 30, 45, 50, 60),
income = c(1000, 2000, 3000, 4000, 5000), w = 1
)
svy <- Survey$new(
data = dt, edition = "2023", type = "test",
psu = NULL, engine = "data.table", weight = add_weight(annual = "w")
)
# Validate that there are at least 3 rows, all ages are positive, and no income is NA
svy <- svy |>
step_validate(age > 0, !is.na(income), .min_n = 3) |>
bake_steps()