Class-prior estimation via Pearson divergence minimization

Description

In real-world problems, the class balance often differs between the training and test datasets, which may cause estimation bias. The class balance of the test dataset can be estimated in a semi-supervised setup using unlabeled data from the test dataset and labeled data from the training dataset. The method provided below performs this estimation by matching the distributions under the Pearson divergence.
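
The matching idea can be written out for the binary case as follows. This is only a sketch in notation assumed here: $\theta$ denotes a candidate value for the test class prior $p_{te}(y=1)$, the second class is written as $y=-1$, and $q(x;\theta)$ is the mixture model. It also assumes that only the class prior changes, so the class-conditional densities $p(x \mid y)$ are shared between training and test data. The cited papers give the exact formulation and the divergence estimator that the code actually uses.

$$q(x;\theta) = \theta\, p(x \mid y=1) + (1-\theta)\, p(x \mid y=-1)$$

$$\mathrm{PE}(\theta) = \frac{1}{2}\int \left(\frac{q(x;\theta)}{p_{te}(x)} - 1\right)^{2} p_{te}(x)\, dx, \qquad \hat{\theta} = \operatorname*{arg\,min}_{\theta \in [0,1]} \widehat{\mathrm{PE}}(\theta)$$

Here $\widehat{\mathrm{PE}}(\theta)$ stands for an estimate of the divergence computed from the labeled training samples and the unlabeled test samples. The divergence is zero when $q(x;\theta)$ equals $p_{te}(x)$, so the minimizer is taken as the estimated class prior.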

Downloads

MATLAB implementation of the Pearson divergence method: [ClassPriorPearson.zip]

Example

The files included in the archive are:

pe_prior_est_grid.m is the main function that estimates the class prior via grid search.
demo.m is a demo script.
compMedDist.m is an auxiliary function.

The class prior can be estimated from the labeled training data and the unlabeled test data as follows:

    % calculate the class prior
    [theta_est, theta_list, PD] = pe_prior_est_grid(xtr, ytr, xte);
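
The second and third outputs can be used to inspect the grid search. The snippet below is a minimal sketch, not part of the archive. It assumes that `theta_list` and `PD` are equal-length vectors holding the candidate class priors and the corresponding estimated divergences for a binary problem, as the result figure suggests. It also assumes that `xtr`, `ytr`, and `xte` are already in the workspace in the layout that `pe_prior_est_grid` expects. Check `demo.m` for the actual conventions.

```matlab
% Requires the archive's files on the MATLAB path and xtr, ytr, xte already
% loaded in the layout pe_prior_est_grid expects (see demo.m).
[theta_est, theta_list, PD] = pe_prior_est_grid(xtr, ytr, xte);

% Assumption: theta_list and PD are equal-length vectors (binary problem).
figure;
plot(theta_list(:), PD(:), 'b-', 'LineWidth', 2);
hold on;

% Mark the returned estimate with a dashed vertical line.
yl = ylim;
plot([theta_est theta_est], yl, 'r--', 'LineWidth', 2);
hold off;

xlabel('Candidate class prior \theta');
ylabel('Estimated Pearson divergence');
legend('Estimated divergence', 'Returned estimate', 'Location', 'best');
```

If the assumptions hold, the dashed line should sit at or near the lowest point of the curve.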

Example results for a problem with a true class prior of $p_{te}(y=1) = 0.8$ are given below.

Training distributions and samples:

Test distribution and samples:

Estimated Pearson divergence and estimated class prior:

References

du Plessis, M. C. & Sugiyama, M.
Semi-supervised learning of class balance under class-prior change by distribution matching.
In J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML2012), pp.823-830, Edinburgh, Scotland, Jun. 26-Jul. 1, 2012.
[Paper]

du Plessis, M. C. & Sugiyama, M.
Semi-supervised learning of class balance under class-prior change by distribution matching.
Neural Networks, vol.50, pp.110-119, 2014.
[Paper]