Semi-supervised Learning

Semi-supervised learning considers the problem (especially in classification) where only a small subset of the empirical data has labels. The question is: can we use the information in the unlabeled data to improve classification? In the recent deep learning literature, there are typically two approaches to semi-supervised learning: modifications of generative models and few-shot learning.

1. Semi-supervised Learning with Variational Autoencoder

Variational autoencoders (VAEs) consider the following generative process:

$$p_\theta(x, z) = p(z)\, p_\theta(x \mid z), \qquad p(z) = \mathcal{N}(z;\, 0,\, I),$$

where $p_\theta(x \mid z)$ can be modeled using a deep neural network. The inference model, on the contrary, has the following form:

$$q_\phi(z \mid x) = \mathcal{N}\big(z;\ \mu_\phi(x),\ \mathrm{diag}(\sigma^2_\phi(x))\big).$$

Therefore, the evidence lower bound (ELBO) on the likelihood is

$$\log p_\theta(x) \;\geq\; \mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big).$$

Instead of optimizing the likelihood function directly, we can optimize this lower bound. Notice that the VAE can be seen as a combination of an encoder and a decoder, where the expected log-likelihood term is the reconstruction error of the decoder and the KL term measures the information compression quality of the encoder.

In the Gaussian case, the KL term can be computed analytically. The log-likelihood term cannot be solved analytically, but with the re-parameterization trick we can rewrite it as

$$\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[\log p_\theta\big(x \mid \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon\big)\big],$$

where $\odot$ indicates the element-wise product. Therefore, the gradient can be expressed as

$$\nabla_{\theta, \phi}\, \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[\nabla_{\theta, \phi} \log p_\theta\big(x \mid \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon\big)\big],$$

which can be estimated without bias using Monte Carlo samples of $\epsilon$.
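
As a concrete illustration, here is a minimal PyTorch sketch of a Gaussian VAE with the re-parameterization trick and the (negative) ELBO; the architecture sizes and the Bernoulli likelihood are illustrative choices, not prescribed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal Gaussian VAE: encoder q(z|x), decoder p(x|z)."""
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # encoder mean mu_phi(x)
        self.logvar = nn.Linear(h_dim, z_dim)   # encoder log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Re-parameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        x_logits = self.dec(z)
        return x_logits, mu, logvar

def negative_elbo(x, x_logits, mu, logvar):
    # Reconstruction term: Bernoulli log-likelihood (binary inputs assumed)
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # Analytic KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```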

What can we do with VAEs in the semi-supervised setting?

  • In semi-supervised learning, one way to use the generative model is to train a VAE on all the data (labeled $\mathcal{D}_l$ and unlabeled $\mathcal{D}_u$) and then train a classifier on the labels from $\mathcal{D}_l$ and the latent representations of $x$ obtained from the inference model $q_\phi(z \mid x)$ (a sketch of this pipeline follows after this list).

  • One can modify the generative model to include the class label $y$, e.g. $p_\theta(x, y, z) = p(y)\, p(z)\, p_\theta(x \mid y, z)$. We then need to consider two cases: for labeled data, the ELBO is the same as for the normal VAE, with $y$ treated as an observed variable; for unlabeled data, the class label should be integrated out using $q_\phi(y \mid x)$ in the ELBO. We can add a classification loss so that the distribution $q_\phi(y \mid x)$ also learns from the labeled data.

  • In Kingma et al. (2014), the two models above are stacked, so that the latter does not infer $y$ from the features $x$ directly but from the latent representation $z$ produced by the first VAE instead.

  • Other variations include a ladder structure (Rasmus et al., 2015), where encoding and decoding happen at each layer, as well as the auxiliary deep generative model (Maaløe et al., 2016), which adds an auxiliary variable $a$ to the model and thereby makes the variational distribution (inference model) more expressive.
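
A minimal sketch of the first approach in the list (train a VAE on all data, then fit a classifier on the latent codes of the labeled subset), reusing the hypothetical `VAE` class from the sketch above; the choice of logistic regression is arbitrary:

```python
import torch
from sklearn.linear_model import LogisticRegression

def train_m1_classifier(vae, x_labeled, y_labeled):
    """M1-style pipeline: the VAE (fit beforehand on labeled + unlabeled data)
    provides latent codes; a separate classifier is trained on them."""
    vae.eval()
    with torch.no_grad():
        h = vae.enc(x_labeled)
        z = vae.mu(h)                    # posterior mean as the representation
    clf = LogisticRegression(max_iter=1000)
    clf.fit(z.cpu().numpy(), y_labeled.cpu().numpy())
    return clf
```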

2. Semi-supervised Learning with GANs

Generative adversarial networks (GANs) are a class of methods for learning generative models based on game theory. The goal of GANs is to train a generator network $G$ that produces samples from the data distribution $p_{\text{data}}(x)$ by transforming vectors of noise $z$ as $x = G(z)$. The training signal for $G$ is provided by a discriminator network $D(x)$, which is trained to distinguish samples of the generator distribution $p_G(x)$ from real data. The generator network $G$ is in turn trained to fool the discriminator into accepting its outputs as being real.

A typical loss function for a GAN is

$$\min_G \max_D \;\; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big].$$

A simple way to combine GANs with semi-supervised learning is to change the discriminator output from a scalar to a $(K+1)$-dimensional vector, where $K$ is the number of classes. The idea is simply to add samples from the GAN generator to our data set, labeling them with a new "generated" class $y = K+1$. Therefore, we can define a corresponding loss function for this semi-supervised GAN as $L = L_{\text{supervised}} + L_{\text{unsupervised}}$, where

$$L_{\text{supervised}} = -\,\mathbb{E}_{x, y \sim p_{\text{data}}(x, y)}\big[\log p_{\text{model}}(y \mid x,\, y < K+1)\big],$$

$$L_{\text{unsupervised}} = -\,\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log\big(1 - p_{\text{model}}(y = K+1 \mid x)\big)\big] \;-\; \mathbb{E}_{x \sim G}\big[\log p_{\text{model}}(y = K+1 \mid x)\big].$$
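
A minimal sketch of this $(K+1)$-class discriminator loss, assuming the discriminator returns raw logits of shape `(batch, K+1)` with index `K` reserved for the "generated" class (the function name and tensor layout are illustrative):

```python
import torch
import torch.nn.functional as F

def ssgan_discriminator_loss(logits_lab, y_lab, logits_unl, logits_gen):
    """Semi-supervised GAN discriminator loss with K real classes plus one
    extra "generated" class (a sketch of the losses in Salimans et al., 2016)."""
    K = logits_lab.shape[1] - 1
    # Supervised term: cross-entropy over the K real classes only,
    # i.e. p(y | x, y < K+1) for labeled examples.
    l_sup = F.cross_entropy(logits_lab[:, :K], y_lab)
    # Unsupervised term: real unlabeled data should NOT be the generated class ...
    p_gen_unl = F.softmax(logits_unl, dim=1)[:, K]
    l_unl_real = -torch.log(1.0 - p_gen_unl + 1e-8).mean()
    # ... and generated samples SHOULD be the generated class.
    p_gen_fake = F.softmax(logits_gen, dim=1)[:, K]
    l_unl_fake = -torch.log(p_gen_fake + 1e-8).mean()
    return l_sup + l_unl_real + l_unl_fake
```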

Some training techniques are summarized below.

1. Feature matching: Salimans et al. (2016) find that the following feature matching loss for the generator works well empirically (a code sketch follows after this list):

$$\big\|\, \mathbb{E}_{x \sim p_{\text{data}}} f(x) \;-\; \mathbb{E}_{z \sim p(z)} f(G(z)) \,\big\|_2^2,$$

where $f(x)$ denotes the activations of some intermediate layer of the discriminator.

2. Manifold regularization: use the Laplacian norm

$$\|f\|_L^2 = \int_{\mathcal{M}} \big\|\nabla_{\mathcal{M}} f(x)\big\|^2 \, dP_X(x)$$

to encourage local invariance and hence smoothness of the classifier $f$ on the data manifold $\mathcal{M}$. Since the GAN generator approximates the data manifold, the Laplacian norm can be approximated with stochastic finite differences in the latent space; therefore, we introduce a regularization term

$$\Omega = \frac{\lambda}{n} \sum_{i=1}^{n} \big\| f\big(g(z_i)\big) - f\big(g(z_i + \bar{\delta}_i)\big) \big\|_F^2,$$

where $\bar{\delta}_i = \epsilon\, \delta_i / \|\delta_i\|$ is a small random perturbation of the latent code $z_i \sim p(z)$.
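
A minimal sketch of both regularizers above, under stated assumptions: `disc_features` is a hypothetical callable returning the intermediate discriminator activations $f(x)$, `classifier` and `generator` are the classifier head and GAN generator as modules, and the step size `eps` is illustrative.

```python
import torch

def feature_matching_loss(disc_features, x_real, x_fake):
    """Generator loss sketch: match mean intermediate discriminator
    activations f(x) between real and generated batches."""
    f_real = disc_features(x_real).mean(dim=0)
    f_fake = disc_features(x_fake).mean(dim=0)
    return torch.sum((f_real - f_fake) ** 2)

def manifold_regularizer(classifier, generator, z, eps=1e-2):
    """Stochastic finite-difference sketch of the Laplacian norm:
    penalize changes of the classifier along small random moves
    on the generated manifold."""
    delta = torch.randn_like(z)
    delta = eps * delta / delta.norm(dim=1, keepdim=True)
    f1 = classifier(generator(z))
    f2 = classifier(generator(z + delta))
    return ((f1 - f2) ** 2).sum(dim=1).mean()
```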
