Class-prior estimation via L2-divergence minimization

Description

In real-world problems, the class balance of the test dataset often differs from that of the training dataset, which may cause an estimation bias. The class balance of the test dataset can be estimated in a semi-supervised setup using unlabeled data from the test dataset and labeled data from the training dataset. The method provided below performs this estimation by matching the distributions under the L_2 distance.
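
More concretely, write θ = p_{te}(y=1) for the unknown test class prior and p(x|y=1), p(x|y=2) for the class-wise input densities of the training data. Under the assumption that only the class balance changes between training and test, the class prior is estimated by solving

    min_{0 ≤ θ ≤ 1} ∫ ( θ p(x|y=1) + (1 - θ) p(x|y=2) - p_{te}(x) )^2 dx,

i.e., the θ-mixture of the class-wise densities is matched to the test input density p_{te}(x) under the squared L_2 distance. In LSDDPriorEstMedian.m, this distance is estimated directly from the samples by least-squares density-difference (LSDD) estimation (see the references below).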

Downloads

A MATLAB implementation and a toy example are given here:

  • toyexample.m: a toy example.
  • LSDDPriorEstMedian.m: the function that estimates the class prior. Its list of candidate hyper-parameter values is centered around the median distance of the data (see the sketch below).
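
The exact candidate list is constructed inside LSDDPriorEstMedian.m. As a rough sketch of a median-distance heuristic (an assumption about what "centered around the median distance" means, covering only the Gaussian width), the candidates could be generated as:

    % sketch of a median-distance heuristic for the Gaussian width sigma
    % (an assumption, not a copy of the code in LSDDPriorEstMedian.m)
    x  = randn(2, 200);                            % placeholder data, d x n
    sq = sum(x.^2, 1);                             % squared norm of each column
    D2 = bsxfun(@plus, sq', sq) - 2*(x'*x);        % squared pairwise distances
    sigma_med  = sqrt(median(D2(D2 > 0)));         % median pairwise distance
    sigma_list = sigma_med * logspace(-1, 1, 9);   % candidate widths around it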

Example Usage

To estimate the class prior from the labeled training data and unlabeled test data, the function LSDDPriorEstMedian.m can be invoked as:

    % estimate the class prior from the unlabeled test samples xte and
    % the labeled training samples x1 (class 1) and x2 (class 2)
    [xi_best, xi_list, LSDE] = LSDDPriorEstMedian(xte, {x1, x2});

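As a rough, self-contained sketch of what the estimator is doing (not the LSDD estimator inside LSDDPriorEstMedian.m), the snippet below generates synthetic 1-D data with a true class prior of 0.3, models the test density as a θ-mixture of kernel density estimates of the two classes, and picks the θ that minimizes the squared L_2 distance evaluated on a grid; the variable names and row-vector data layout are chosen for the sketch only.

    % synthetic 1-D data: two Gaussian classes, true class prior 0.3
    rng(1);
    n1 = 100; n2 = 100; nte = 300; theta_true = 0.3;
    x1  = randn(1, n1) - 2;                        % labeled samples, class 1
    x2  = randn(1, n2) + 2;                        % labeled samples, class 2
    yte = 2 - (rand(1, nte) < theta_true);         % hidden test labels (class 1 w.p. 0.3)
    xte = randn(1, nte) + 2;                       % class-2 test samples
    xte(yte == 1) = randn(1, sum(yte == 1)) - 2;   % class-1 test samples

    % kernel density estimates evaluated on a grid
    xg = linspace(-6, 6, 500);
    h  = 0.5;                                      % Gaussian kernel bandwidth
    kde = @(x, g) mean(exp(-bsxfun(@minus, g, x').^2 / (2*h^2)), 1) / (sqrt(2*pi)*h);
    p1  = kde(x1,  xg);                            % estimate of p(x | y = 1)
    p2  = kde(x2,  xg);                            % estimate of p(x | y = 2)
    pte = kde(xte, xg);                            % estimate of p_te(x)

    % minimize the squared L_2 distance over candidate class priors
    theta_list = 0:0.01:1;
    L2 = zeros(size(theta_list));
    for i = 1:numel(theta_list)
        d = theta_list(i)*p1 + (1 - theta_list(i))*p2 - pte;
        L2(i) = trapz(xg, d.^2);                   % squared L_2 distance on the grid
    end
    [~, i_best] = min(L2);
    fprintf('estimated class prior: %.2f (true %.2f)\n', theta_list(i_best), theta_true);
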
In the figures below, the true class prior is p_{te}(y=1) = 0.3. The labeled training samples and the unlabeled test samples are shown in the first figure:

[Figure ClassPriorL2Data: labeled training samples and unlabeled test samples]

The estimated L_2 distance and the estimated class prior are shown in the second figure:

[Figure ClassPriorL2Result: estimated L_2 distance and estimated class prior]

References

  • Sugiyama, M., Suzuki, T., Kanamori, T., du Plessis, M. C., Liu, S., & Takeuchi, I.
    Density-difference estimation.
    Neural Computation, vol.25, no.10, pp.2734-2775, 2013.
    [Paper]
  • Sugiyama, M., Suzuki, T., Kanamori, T., du Plessis, M. C., Liu, S., & Takeuchi, I.
    Density-difference estimation.
    In P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, pp.692-700, 2012. (Presented at Neural Information Processing Systems (NIPS2012), Lake Tahoe, Nevada, USA, Dec. 3-6, 2012)
    [Paper]