Class-prior estimation via Pearson divergence minimization

Description

In real-world problems, the class balance of the test dataset often differs from that of the training dataset, which may bias estimates learned from the training data. The class balance of the test dataset can be estimated in a semi-supervised setup, using unlabeled data from the test dataset and labeled data from the training dataset. The method provided below performs this estimation by matching the distributions under the Pearson divergence.
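Under the class-prior change assumption of the referenced papers (the class-conditional densities p(x|y) are shared by the training and test data, and only the class prior changes), the test input density is a mixture of the training class-conditional densities. The test class prior can therefore be sought as the mixing weight that makes this mixture closest to the test density. One way to write the objective, with θ standing for p_{te}(y=1) (the symbols q_θ and PE(θ) are our notation, not necessarily that of the package):

```latex
\begin{aligned}
q_\theta(x) &= \theta\, p(x \mid y=+1) + (1-\theta)\, p(x \mid y=-1), \qquad 0 \le \theta \le 1, \\
\mathrm{PE}(\theta) &= \frac{1}{2} \int \left( \frac{q_\theta(x)}{p_{te}(x)} - 1 \right)^{2} p_{te}(x)\, dx, \\
\hat{\theta} &= \operatorname*{arg\,min}_{0 \le \theta \le 1} \mathrm{PE}(\theta).
\end{aligned}
```

If the assumption holds, PE(θ) = 0 at the true prior, because q_θ then equals p_te. The densities themselves are unknown, so in practice the divergence has to be estimated from the samples.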

Downloads

MATLAB implementation of the Pearson divergence method: [ClassPriorPearson.zip]

Example

The files included in the archive are:

The class-prior can be estimated from the labeled training and unlabeled test data as:

    % xtr, ytr: labeled training inputs and labels; xte: unlabeled test inputs
    % estimate the class prior of the test data
    [theta_est, theta_list, PD] = pe_prior_est_grid(xtr, ytr, xte);
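For readers without MATLAB, the following is a minimal NumPy sketch of a grid-search estimator in the same spirit. It is not a port of pe_prior_est_grid: the function name pe_prior_grid, the Gaussian basis centred on a random subset of test points, the bandwidth sigma, the regularization lam, the grid size, and the +1/-1 label encoding are all assumptions. For each candidate θ it fits the density ratio q_θ(x)/p_te(x) by regularized least squares, plugs the fit into an estimate of the Pearson divergence, and returns the minimizing θ together with the grid and the divergence curve (loosely analogous to theta_est, theta_list and PD).

```python
import numpy as np


def gaussian_basis(x, centers, sigma):
    """Gaussian basis functions: phi_l(x) = exp(-||x - c_l||^2 / (2 sigma^2))."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))


def pe_prior_grid(xtr, ytr, xte, sigma=1.0, lam=1e-3, n_grid=101, n_centers=100, seed=0):
    """Hypothetical Python analogue of a grid-search PE class-prior estimator.

    xtr: (n, d) labeled training inputs; ytr: (n,) labels, assumed to be +1/-1;
    xte: (m, d) unlabeled test inputs.
    Returns (theta_est, theta_list, pe_list).
    """
    rng = np.random.default_rng(seed)
    # Assumption: basis centres are a random subset of the test inputs.
    idx = rng.choice(len(xte), size=min(n_centers, len(xte)), replace=False)
    centers = xte[idx]

    phi_te = gaussian_basis(xte, centers, sigma)
    H = phi_te.T @ phi_te / len(xte)  # empirical E_test[phi phi^T]
    h_pos = gaussian_basis(xtr[ytr == 1], centers, sigma).mean(axis=0)   # E[phi | y=+1]
    h_neg = gaussian_basis(xtr[ytr == -1], centers, sigma).mean(axis=0)  # E[phi | y=-1]
    G = H + lam * np.eye(len(centers))  # regularized system matrix

    theta_list = np.linspace(0.0, 1.0, n_grid)
    pe_list = np.empty(n_grid)
    for i, theta in enumerate(theta_list):
        # The mixture q_theta = theta p(x|+1) + (1 - theta) p(x|-1) enters only through h.
        h = theta * h_pos + (1.0 - theta) * h_neg
        # Least-squares fit of the ratio r(x) = q_theta(x) / p_te(x) ~ alpha^T phi(x).
        alpha = np.linalg.solve(G, h)
        # Plug-in Pearson divergence estimate for this theta.
        pe_list[i] = h @ alpha - 0.5 * alpha @ H @ alpha - 0.5
    theta_est = theta_list[np.argmin(pe_list)]
    return theta_est, theta_list, pe_list


if __name__ == "__main__":
    # Synthetic 1-D illustration (not the data behind the figures on this page).
    rng = np.random.default_rng(1)
    xtr = np.vstack([rng.normal(2.0, 1.0, (200, 1)), rng.normal(-2.0, 1.0, (200, 1))])
    ytr = np.concatenate([np.ones(200), -np.ones(200)])
    xte = np.vstack([rng.normal(2.0, 1.0, (800, 1)), rng.normal(-2.0, 1.0, (200, 1))])
    theta_est, _, _ = pe_prior_grid(xtr, ytr, xte)
    print(f"estimated p_te(y=1): {theta_est:.2f} (true value used to generate xte: 0.8)")
```

The fixed defaults for sigma and lam are only for illustration; in practice such hyperparameters are usually chosen by cross-validation on the density-ratio fitting criterion.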

Example results for a problem with a true class prior of p_{te}(y=1) = 0.8 are shown below.
[Figure: Training distributions and samples]
[Figure: Test distribution and samples]
[Figure: Estimated Pearson divergence and estimated class prior]
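The divergence curve has to be estimated from samples. One possibility, in the spirit of the least-squares density-ratio fitting used in the referenced papers, is to model the ratio r(x) = q_θ(x)/p_{te}(x) as a linear combination of basis functions φ(x) and fit it by regularized least squares. The basis, the regularization parameter λ, and the sample notation (n_te test points x_j^{te}, n_y training points with label y) are assumptions here, not a description of the package internals:

```latex
\begin{aligned}
r_\alpha(x) &= \alpha^\top \varphi(x), \qquad
\widehat{H} = \frac{1}{n_{te}} \sum_{j=1}^{n_{te}} \varphi(x^{te}_j)\, \varphi(x^{te}_j)^\top, \qquad
\widehat{h}_{y} = \frac{1}{n_{y}} \sum_{i:\, y_i = y} \varphi(x_i), \\
\widehat{h}(\theta) &= \theta\, \widehat{h}_{+1} + (1-\theta)\, \widehat{h}_{-1}, \qquad
\widehat{\alpha}(\theta) = \big(\widehat{H} + \lambda I\big)^{-1} \widehat{h}(\theta), \\
\widehat{\mathrm{PE}}(\theta) &= \widehat{h}(\theta)^\top \widehat{\alpha}(\theta)
  - \frac{1}{2}\, \widehat{\alpha}(\theta)^\top \widehat{H}\, \widehat{\alpha}(\theta) - \frac{1}{2}.
\end{aligned}
```

The last line uses the identity PE(θ) = E_{q_θ}[r] - (1/2) E_{p_te}[r^2] - 1/2, which holds for the true ratio because E_{p_te}[r^2] = E_{q_θ}[r] and E_{p_te}[r] = 1. Evaluating this estimate on a grid of θ values and taking the minimizer yields an estimated class prior.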

References

  • du Plessis, M. C. & Sugiyama, M.
    Semi-supervised learning of class balance under class-prior change by distribution matching.
    In J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 823-830, Edinburgh, Scotland, Jun. 26-Jul. 1, 2012.
    [Paper]
  • du Plessis, M. C. & Sugiyama, M.
    Semi-supervised learning of class balance under class-prior change by distribution matching.
    Neural Networks, vol. 50, pp. 110-119, 2014.
    [Paper]