To measure the effectiveness of public policies credibly, pilot programs with randomized control groups should, where feasible, be rolled out before widespread implementation so that program evaluations are reliable.
- We replicate and compare conclusions from experimental and observational evaluations of social and informational messaging for water conservation, and find that following best practices does not always yield accurate results.
- Parallel pre-treatment trends and covariate balance are not sufficient indicators to validate untestable assumptions for fixed-effects panel data estimators.
- Expanding the sample of comparison units used in the analysis can increase the bias of the nonexperimental estimator even if the new comparison units are observably more similar to the treated group than the original comparison group.
We compare experimental and nonexperimental estimates from a social and informational messaging experiment. Our results show that a fixed-effects estimator, applied after matching is used to pre-process the nonexperimental comparison groups, cannot replicate the experimental benchmark, despite parallel pre-intervention trends and good covariate balance. The results are a stark reminder of the role that untestable assumptions (in our case, conditional bias stability) play in drawing causal inferences from observational data, and of the dangers of relying on single studies to justify scaling up or cancelling a program.
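The logic can be illustrated with a minimal simulation sketch. It does not use the study's data or reproduce its estimates: the group sizes, outcome levels, effect size, and the post-period shock to the outside comparison group are all hypothetical values chosen for illustration. For a balanced panel with a single intervention date, the two-way fixed-effects estimate reduces to the difference in mean pre-to-post changes computed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are hypothetical and chosen only for illustration.
n = 2000             # households per group
pre_periods = 6      # periods before the intervention
post_periods = 6     # periods after the intervention
true_effect = -1.0   # assumed effect of messaging on water use
post_shock = 0.8     # assumed shock hitting only the outside group after the intervention


def simulate_group(level, treated, shock):
    """Return an (n, T) balanced panel of outcomes for one group."""
    T = pre_periods + post_periods
    unit_fe = rng.normal(level, 1.0, size=(n, 1))   # time-invariant household effect
    common_trend = 0.1 * np.arange(T)               # trend shared by every group
    post = (np.arange(T) >= pre_periods).astype(float)
    noise = rng.normal(0.0, 0.5, size=(n, T))
    return unit_fe + common_trend + post * (true_effect * treated + shock) + noise


def did(treat, comp):
    """Difference in mean pre-to-post changes between two groups."""
    change_treat = treat[:, pre_periods:].mean() - treat[:, :pre_periods].mean()
    change_comp = comp[:, pre_periods:].mean() - comp[:, :pre_periods].mean()
    return change_treat - change_comp


def pre_trend_gap(treat, comp):
    """Difference in the pre-period slopes of the two groups' mean outcomes."""
    t = np.arange(pre_periods)
    slope_treat = np.polyfit(t, treat[:, :pre_periods].mean(axis=0), 1)[0]
    slope_comp = np.polyfit(t, comp[:, :pre_periods].mean(axis=0), 1)[0]
    return slope_treat - slope_comp


treated = simulate_group(10.0, treated=1, shock=0.0)
rct_control = simulate_group(10.0, treated=0, shock=0.0)        # randomized control
outside = simulate_group(12.0, treated=0, shock=post_shock)     # nonexperimental comparison

experimental_estimate = did(treated, rct_control)
observational_estimate = did(treated, outside)
trend_gap = pre_trend_gap(treated, outside)

print(f"experimental estimate:  {experimental_estimate:.2f}")
print(f"observational estimate: {observational_estimate:.2f}")
print(f"pre-period slope gap:   {trend_gap:.3f}")
```

In this sketch the outside comparison group tracks the treated group almost exactly before the intervention, so a pre-trend check passes, yet the nonexperimental estimate is biased because the groups diverge afterwards for reasons unrelated to treatment. No pre-period diagnostic can detect a divergence that begins only after the intervention, which is why bias stability remains an untestable assumption.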