Problem formulation
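In this problem, we have a fully labeled training dataset $\{(x_i, y_i)\}_{i=1}^{n}$ drawn from the training density $p(x, y)$,
and an unlabeled dataset $\{x'_j\}_{j=1}^{n'}$ drawn according to the test input density $p'(x)$.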
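We assume that the test and training distributions differ only by a change in class priors: the class-conditional densities are shared, $p'(x \mid y) = p(x \mid y)$, while the class priors may differ, $p'(y) \neq p(y)$.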
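The goal is then to obtain an estimate $\hat{p}'(y)$ of the test class prior, which allows us to reweight any empirical average
calculated using the training samples. For any function $g$: $\mathbb{E}_{p'}[g(x, y)] \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{p}'(y_i)}{p(y_i)} g(x_i, y_i)$.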
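As a minimal illustration of this reweighting, the sketch below scales each training sample by the ratio of test to training class prior for its label. The function name `reweighted_average` and its arguments are hypothetical and are not part of the provided implementations; labels are assumed to be integers `0, ..., c-1`, and both priors are assumed to be known or already estimated.

```python
import numpy as np

def reweighted_average(values, labels, train_prior, test_prior):
    """Average `values` over training samples, reweighted toward the test class priors.

    Hypothetical helper: `train_prior[y]` and `test_prior[y]` hold the
    (estimated) class priors p(y) and p'(y) for integer labels y.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # Per-sample importance weight p'(y_i) / p(y_i).
    weights = np.asarray(test_prior, dtype=float)[labels] / np.asarray(train_prior, dtype=float)[labels]
    return float(np.mean(weights * values))

# Toy usage: class 0 has value 1, class 1 has value 3.
# Under test priors (0.2, 0.8) the target mean is 0.2 * 1 + 0.8 * 3 = 2.6.
estimate = reweighted_average([1.0, 1.0, 3.0, 3.0], [0, 0, 1, 1],
                              train_prior=[0.5, 0.5], test_prior=[0.2, 0.8])
```

The weights depend only on the label, which is what the class-prior-change assumption buys: no per-sample density ratio over the inputs is needed.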
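The general strategy that we follow is to model the test input density as a mixture of the class-conditional densities: $q(x; \theta) = \sum_{y=1}^{c} \theta_y \, p(x \mid y)$, where $c$ is the number of classes, $\theta_y \geq 0$, and $\sum_{y=1}^{c} \theta_y = 1$.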
We then select $\theta$ so that the model $q(x; \theta)$ matches the test input density $p'(x)$. To compare $q(x; \theta)$ and $p'(x)$, we use a divergence (such as an $f$-divergence or the $L^2$-distance). These can in turn be directly estimated from samples (avoiding density estimation).
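To make the matching idea concrete, here is a rough sketch for the binary case. It is a stand-in and is not one of the methods provided here: instead of an $f$-divergence or $L^2$-distance estimator, it matches Gaussian kernel mean embeddings, which can also be computed from samples without density estimation and gives a closed-form mixing weight. The function names and the fixed kernel width `sigma` are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_mean(a, b, sigma):
    """Mean of the Gaussian kernel over all pairs (u in a, v in b)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return float(np.exp(-sq_dists / (2.0 * sigma ** 2)).mean())

def estimate_positive_prior(x_pos, x_neg, x_test, sigma=1.0):
    """Estimate the test prior of the positive class (illustrative stand-in).

    Minimizes the squared kernel-mean distance between the mixture
    theta * p(x|+) + (1 - theta) * p(x|-) and the test inputs.
    """
    k_pp = gaussian_kernel_mean(x_pos, x_pos, sigma)
    k_pn = gaussian_kernel_mean(x_pos, x_neg, sigma)
    k_nn = gaussian_kernel_mean(x_neg, x_neg, sigma)
    k_pt = gaussian_kernel_mean(x_pos, x_test, sigma)
    k_nt = gaussian_kernel_mean(x_neg, x_test, sigma)
    # With a = mu_pos - mu_neg and b = mu_test - mu_neg, the minimizer
    # of ||theta * a - b||^2 is <a, b> / <a, a>.
    numerator = k_pt - k_pn - k_nt + k_nn
    denominator = k_pp - 2.0 * k_pn + k_nn
    # Clip to [0, 1] so the result is a valid prior.
    return float(np.clip(numerator / denominator, 0.0, 1.0))

# Toy usage: balanced training classes, test set with 70% positives.
rng = np.random.default_rng(0)
x_pos = rng.normal(2.0, 1.0, size=(300, 1))
x_neg = rng.normal(-2.0, 1.0, size=(300, 1))
x_test = np.vstack([rng.normal(2.0, 1.0, size=(350, 1)),
                    rng.normal(-2.0, 1.0, size=(150, 1))])
theta_hat = estimate_positive_prior(x_pos, x_neg, x_test)
```

On well-separated toy data like this, `theta_hat` should land near the true test prior of 0.7. The kernel width is fixed by hand here and would need to be tuned in practice.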
We provide implementations for two methods:
Note that experimentally, the -method seems to give the best results.