Confidence Intervals for Policy Evaluation in Adaptive Experiments
1. Summary
For adaptively collected data in a potential outcomes model, the naive sample average is a biased estimator of an arm's value, and the inverse-probability weighting (IPW) estimator, although unbiased, has a non-normal asymptotic distribution. This paper proposes a test statistic that is asymptotically unbiased and normal.
2. Details
Denote our data as $\{(W_t, Y_t)\}_{t=1}^{T}$. The random variables $W_t$ are called the arms, treatments or interventions. Arms are categorical. The reward or outcome $Y_t$ represents the individual's response to the treatment. The treatment assignment probabilities $e_t(w) = P[W_t = w \mid H^{t-1}]$, also called propensity scores, are time-varying and decided via some known algorithm; here $H^{t-1} = \{(W_s, Y_s)\}_{s=1}^{t-1}$ is the history of observations before time $t$. Given this setup, we are concerned with the problem of estimating and testing pre-specified hypotheses about the value of an arm, denoted by $Q(w) = E[Y_t(w)]$, as well as differences between two such values, denoted by $\Delta(w, w') = Q(w) - Q(w')$.
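To make the setup concrete, the following is a minimal sketch of how such data might be collected. The summary only says the assignment probabilities come from "some known algorithm"; the epsilon-greedy rule, the function name `run_epsilon_greedy`, the unit-variance Gaussian rewards, and all parameter values are assumptions chosen for illustration, not the paper's design.

```python
import numpy as np

def run_epsilon_greedy(true_means, T, epsilon=0.1, seed=0):
    """Collect (arm, reward, propensity) triples with a hypothetical
    epsilon-greedy rule. Returns arms (T,), rewards (T,), probs (T, K),
    where probs[t, w] is the assignment probability of arm w at step t."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    arms = np.zeros(T, dtype=int)
    rewards = np.zeros(T)
    probs = np.zeros((T, K))
    counts = np.zeros(K)  # number of pulls per arm so far
    sums = np.zeros(K)    # sum of rewards per arm so far
    for t in range(T):
        # Running mean per arm, using only the history before step t;
        # arms that have not been pulled yet default to 0.
        means = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
        # Explore uniformly with probability epsilon, otherwise exploit.
        e_t = np.full(K, epsilon / K)
        e_t[np.argmax(means)] += 1.0 - epsilon
        w = rng.choice(K, p=e_t)
        y = true_means[w] + rng.normal()  # assumed Gaussian noise
        arms[t], rewards[t], probs[t] = w, y, e_t
        counts[w] += 1
        sums[w] += y
    return arms, rewards, probs
```

Because the probabilities at step t depend on earlier rewards, the observations are not independent, which is what makes inference on such data delicate.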
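The IPW estimator is defined as
$$\hat{Q}^{IPW}_T(w) = \frac{1}{T} \sum_{t=1}^{T} \frac{\mathbb{1}\{W_t = w\}}{e_t(w)} Y_t,$$
where $e_t(w)$ is the probability of assigning arm $w$ at time $t$.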
Inverse-probability weighting fixes the bias problem but results in a non-normal asymptotic distribution. In fact, when the probability of assignment to the arm of interest tends to zero, the inverse probability weights increase, which in turn causes the tails of the distribution to become heavier.
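The two estimators discussed so far can be sketched as follows. This is an illustrative implementation under assumed array conventions (the function names and the `probs[t, w]` layout are not from the paper): the naive estimator averages the rewards observed under the arm, while the IPW estimator averages every step's reward scaled by the indicator of the arm divided by its assignment probability.

```python
import numpy as np

def naive_estimate(arms, rewards, arm):
    """Sample mean of the rewards observed when `arm` was pulled."""
    arms = np.asarray(arms)
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean(rewards[arms == arm]))

def ipw_estimate(arms, rewards, probs, arm):
    """Inverse-probability weighted estimate of the value of `arm`.

    probs[t, w] is the (known) probability that arm w was assigned
    at step t; it must be strictly positive for the arm of interest.
    """
    arms = np.asarray(arms)
    rewards = np.asarray(rewards, dtype=float)
    probs = np.asarray(probs, dtype=float)
    indicator = (arms == arm).astype(float)
    # Each pulled observation is scaled by 1 / e_t(arm); steps where
    # the arm was not pulled contribute zero.
    return float(np.mean(indicator * rewards / probs[:, arm]))
```

The scaling factor 1 / e_t(w) makes the tail behaviour visible: an observation collected when the assignment probability is 0.01 is multiplied by 100, so a few such steps can dominate the average.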
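This paper considers the adaptively-weighted AIPW estimator:
$$\hat{Q}^{h}_T(w) = \frac{\sum_{t=1}^{T} h_t(w)\, \hat{\Gamma}_t(w)}{\sum_{t=1}^{T} h_t(w)}, \qquad \hat{\Gamma}_t(w) = \hat{m}_t(w) + \frac{\mathbb{1}\{W_t = w\}}{e_t(w)} \left( Y_t - \hat{m}_t(w) \right),$$
where $\hat{m}_t(w)$ is an estimate of the mean reward of arm $w$ computed from the observations before time $t$, $e_t(w)$ is the assignment probability of arm $w$ at time $t$, and the evaluation weights $h_t(w)$ are likewise functions of the past only.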
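A minimal sketch of an adaptively weighted AIPW estimate is given below, assuming the standard AIPW score (a plug-in prediction plus an inverse-probability weighted residual) combined as a normalized weighted average. The function names, the array layout, and the convention that the caller supplies `m_hat` and `weights` are assumptions for illustration; the sketch does not implement any particular weighting scheme from the paper.

```python
import numpy as np

def aipw_scores(arms, rewards, probs, m_hat, arm):
    """AIPW score per step: m_hat[t] + 1{W_t = arm} / e_t(arm) * (Y_t - m_hat[t]).

    m_hat[t] is assumed to be an estimate of the arm's mean reward
    built only from observations before step t.
    """
    arms = np.asarray(arms)
    rewards = np.asarray(rewards, dtype=float)
    probs = np.asarray(probs, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    indicator = (arms == arm).astype(float)
    return m_hat + indicator / probs[:, arm] * (rewards - m_hat)

def adaptively_weighted_aipw(arms, rewards, probs, m_hat, weights, arm):
    """Normalized weighted average of the AIPW scores.

    weights[t] plays the role of the evaluation weight for step t and is
    assumed to depend only on the history before step t.
    """
    weights = np.asarray(weights, dtype=float)
    scores = aipw_scores(arms, rewards, probs, m_hat, arm)
    return float(np.sum(weights * scores) / np.sum(weights))
```

With `m_hat` set to zero and constant weights this reduces to the plain IPW average, which is a convenient sanity check. Passing weights that shrink when the assignment probability is small, for example `np.sqrt(probs[:, arm])` as a purely illustrative choice, downweights exactly the steps whose scores have the largest variance.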
Suppose that either the outcome-model estimate $\hat{m}_t$ is consistent or the propensity score $e_t$ has a limit $e_\infty$, i.e., either