The otherwise straightforward analysis of randomized experiments is often complicated by the presence of missing data. In such situations, it is necessary to make assumptions about how the selection mechanism depends on treatment, response, and covariates. We show that the widely used assumption that data are missing at random conditional on treatment and other fully observed covariates is inadequate to describe data from a randomized experiment when partially observed covariates are also present.
This paper presents an alternative to the missing at random (MAR) model that is consistent with the data while preserving the appeal of MAR. In particular, the proposed family of models minimizes the discrepancy with MAR while explaining observed deviations from it. We apply this approach to data from the Restart job training program in the United Kingdom as well as to an artificial data set. Evaluation of the Restart program is not affected by the assumption of MAR; both approaches suggest that the program increased the chances of exiting unemployment within six months by around 9%. However, analysis of the artificial data demonstrates that assuming MAR can easily lead to erroneous conclusions.