Description
In real-world problems, the class balance often differs between the training and test datasets, which can bias estimation. The class balance of the test dataset can be estimated in a semi-supervised setup, using labeled data from the training dataset and unlabeled data from the test dataset. The method provided below performs this estimation by matching a mixture of the training class-conditional distributions, weighted by a candidate class prior, to the test input distribution under the Pearson divergence.
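The matching criterion can be written out as follows. This is a sketch of the formulation in the cited papers, and the notation is ours: binary labels y ∈ {+1, -1}, θ is the test class prior p'(y = +1), p(x | y) are the training class-conditional densities (assumed unchanged between training and test, so only the class prior shifts), and p'(x) is the test input density.

```latex
q_\theta(x) = \theta\, p(x \mid y = +1) + (1 - \theta)\, p(x \mid y = -1)

\mathrm{PE}(\theta) = \frac{1}{2} \int \left( \frac{q_\theta(x)}{p'(x)} - 1 \right)^{2} p'(x)\, \mathrm{d}x

\hat{\theta} = \operatorname*{arg\,min}_{\theta \in [0, 1]} \mathrm{PE}(\theta)
```

PE(θ) is non-negative and equals zero when q_θ = p', which holds at the true test class prior when only the class prior changes.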
Downloads
MATLAB implementation of the Pearson divergence method: [ClassPriorPearson.zip]
Example
The archive includes the following files:
- pe_prior_est_grid.m is the main function that estimates the class prior via grid search.
- demo.m is a demo script.
- compMedDist.m is an auxiliary function.
The class prior can be estimated from labeled training data and unlabeled test data as follows:
% calculate the class prior
[theta_est, theta_list, PD] = pe_prior_est_grid(xtr, ytr, xte);
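To illustrate the idea behind a grid-search estimator of this kind, here is a self-contained MATLAB sketch. It is a hypothetical reimplementation in the spirit of the cited papers, not the packaged code. It assumes samples stored as rows, labels in {+1, -1}, a Gaussian kernel model r(x) = αᵀφ(x) with a hand-picked width sigma and ridge parameter lambda, and synthetic 1-D data. For each candidate prior θ, it forms h(θ) = θ·ĥ₊ + (1 - θ)·ĥ₋ from the per-class training means of φ, computes Ĥ from the test samples, solves α = (Ĥ + λI)⁻¹h(θ), and estimates the Pearson divergence as hᵀα - ½αᵀĤα - ½. The estimated prior is the θ with the smallest value. The variable names theta_est, theta_list and PD mirror the call above, but their exact meaning in the package is assumed.

```matlab
% Hypothetical sketch: class-prior estimation by Pearson (PE) divergence
% matching with a grid search over theta. This is NOT the code shipped in
% ClassPriorPearson.zip; data layout, sigma, lambda and the kernel centers
% are illustrative assumptions.

rng(0);                                  % reproducible synthetic data
ntr = 200; nte = 500; theta_true = 0.3;  % illustrative sizes and true prior

% Samples are rows, labels are +1/-1; classes are 1-D Gaussians at +2/-2.
ytr = [ones(ntr/2, 1); -ones(ntr/2, 1)];        % balanced training set
xtr = 2 * ytr + randn(ntr, 1);
npos = round(theta_true * nte);                  % shifted test class prior
yte = [ones(npos, 1); -ones(nte - npos, 1)];     % used only to generate xte
xte = 2 * yte + randn(nte, 1);

% Gaussian kernel basis functions centered on a random subset of test points.
sigma = 1; lambda = 1e-3;                        % assumed, not tuned
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
kern = @(A, B) exp(-sqdist(A, B) / (2 * sigma^2));
C = xte(randperm(nte, min(100, nte)), :);
nb = size(C, 1);

Phi_te = kern(xte, C);                           % nte x nb test features
H = (Phi_te' * Phi_te) / nte;                    % empirical E_test[phi*phi']
hp = mean(kern(xtr(ytr == 1, :), C), 1)';        % empirical E[phi | y = +1]
hn = mean(kern(xtr(ytr == -1, :), C), 1)';       % empirical E[phi | y = -1]

theta_list = 0:0.01:1;                           % candidate class priors
PD = zeros(size(theta_list));                    % estimated PE divergence
for k = 1:numel(theta_list)
  h = theta_list(k) * hp + (1 - theta_list(k)) * hn;  % mixture feature mean
  alpha = (H + lambda * eye(nb)) \ h;                 % ridge-regularized fit
  PD(k) = h' * alpha - 0.5 * alpha' * H * alpha - 0.5;
end

[~, kmin] = min(PD);                             % grid point with smallest PE
theta_est = theta_list(kmin);
fprintf('estimated class prior: %.2f (true: %.2f)\n', theta_est, theta_true);
```

In practice, sigma and lambda would typically be selected by cross-validation rather than fixed by hand. Because h(θ) is affine in θ, the estimated divergence is a convex quadratic in θ, so the grid could be replaced by a closed-form minimizer; the grid is kept here because it also yields the full divergence curve.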
Example results for a problem with a known true class prior are given below.
Training distributions and samples:
Test distribution and samples:
Estimated Pearson divergence and estimated class prior:
References
- du Plessis, M. C. & Sugiyama, M.
Semi-supervised learning of class balance under class-prior change by distribution matching.
In J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 823-830, Edinburgh, Scotland, Jun. 26-Jul. 1, 2012.
- du Plessis, M. C. & Sugiyama, M.
Semi-supervised learning of class balance under class-prior change by distribution matching.
Neural Networks, vol. 50, pp. 110-119, 2014.