Overview of Autoencoders

Anay Dongre
6 min read · Jan 1, 2023

Autoencoders are a type of neural network that can be used to learn a compressed representation of a dataset. They consist of two main parts: an encoder, which maps the input data to a lower-dimensional representation, and a decoder, which maps the lower-dimensional representation back to the original dimensionality.

There are several different types of autoencoders, including:

1. Vanilla autoencoder: This is the most basic type of autoencoder. It consists of a single hidden layer in both the encoder and the decoder, and it is trained to reconstruct the input data as closely as possible.
The architecture of a vanilla autoencoder consists of an encoder followed by a decoder. The encoder maps the input to a hidden representation, typically through a series of fully connected (dense) layers, and the decoder maps the hidden representation back to the original input space through another series of dense layers. The number of hidden units in the encoding layer is usually smaller than the number of input units, which forces the autoencoder to learn a compressed representation of the input. The encoder and decoder weights are typically initialized randomly and trained to minimize a reconstruction loss, which measures the difference between the input and the reconstructed output.

Here is an example of the architecture of a vanilla autoencoder with a single hidden layer:

Input layer (m input units) -> Encoding layer (n hidden units) -> Decoding layer (m output units)
where m is the number of input units and n is the number of hidden units. The number of hidden units can be chosen based on the desired level of compression. The output of the decoder is used as the reconstructed input. The reconstruction loss is typically measured using the mean squared error between the input and the reconstructed output.
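
As a concrete illustration, here is a minimal PyTorch sketch of such a vanilla autoencoder. PyTorch is just one possible framework here, and the sizes used below (784 inputs, 32 hidden units), the MSE loss, and the Adam optimizer are illustrative assumptions rather than fixed choices.

import torch
import torch.nn as nn

# Vanilla autoencoder: one hidden (encoding) layer, dense layers only.
# m = 784 could be a flattened 28x28 image; n = 32 is the bottleneck size.
class VanillaAutoencoder(nn.Module):
    def __init__(self, m: int = 784, n: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(m, n), nn.ReLU())     # m -> n
        self.decoder = nn.Sequential(nn.Linear(n, m), nn.Sigmoid())  # n -> m

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = VanillaAutoencoder()
criterion = nn.MSELoss()                                  # reconstruction loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)        # dummy batch of flattened inputs in [0, 1]
x_hat = model(x)               # reconstructed output
loss = criterion(x_hat, x)     # compare reconstruction with the input itself
optimizer.zero_grad()
loss.backward()
optimizer.step()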

2. Convolutional autoencoder: A convolutional autoencoder is a type of autoencoder that uses convolutional layers in the encoder and decoder. Like a vanilla autoencoder, a convolutional autoencoder is trained to reconstruct its input, but it is particularly well-suited to learning hierarchical representations of data. Convolutional autoencoders are commonly used for image data, where they can learn to extract features such as edges, shapes, and textures from the input image.
The architecture of a convolutional autoencoder consists of an encoder followed by a decoder. The encoder maps the input to a hidden representation through a series of convolutional and pooling layers. The decoder maps the hidden representation back to the original input space through a series of transposed convolutional layers. The number of channels in the encoding layer is usually smaller than the number of channels in the input layer, which allows the autoencoder to learn a compressed representation of the input. The encoder and decoder weights are typically initialized randomly and are trained to minimize a reconstruction loss, which measures the difference between the input and the reconstructed output.
Here is an example of the architecture of a convolutional autoencoder with a single encoding layer:
Input layer (m x n x c) -> Convolutional layer -> Pooling layer -> Transposed convolutional layer -> Output layer (m x n x c)
where m and n are the spatial dimensions of the input and c is the number of channels. The output of the decoder is used as the reconstructed input. The reconstruction loss is typically measured using the mean squared error between the input and the reconstructed output.

Convolutional Autoencoder Architecture
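
The same layout can be sketched in PyTorch as follows. The 28x28 single-channel input, the channel counts, and the kernel sizes below are illustrative assumptions.

import torch
import torch.nn as nn

# Convolutional autoencoder: conv + pooling in the encoder,
# a transposed convolution in the decoder.
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # (1, 28, 28) -> (16, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # (16, 28, 28) -> (16, 14, 14)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),  # (16, 14, 14) -> (1, 28, 28)
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.rand(8, 1, 28, 28)                 # dummy batch of grayscale images
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)      # reconstruction loss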

3. Denoising autoencoder: A denoising autoencoder is a type of autoencoder that is trained to reconstruct the original input from a corrupted version of it. The goal of training is to learn a transformation that removes the corruption from the input and recovers the original, uncorrupted input. Denoising autoencoders can be useful for tasks such as image denoising, where the goal is to remove noise from an image.
The architecture of a denoising autoencoder is similar to that of a vanilla autoencoder, consisting of an encoder followed by a decoder. However, during training, the input to the autoencoder is first corrupted by adding noise to the original input. The autoencoder is then trained to reconstruct the original, uncorrupted input from the corrupted version. The encoder and decoder weights are initialized randomly and are trained to minimize a reconstruction loss, which measures the difference between the original input and the reconstructed output.
Here is an example of the architecture of a denoising autoencoder with a single hidden layer:
Input layer (corrupted version of original input) -> Encoding layer (n hidden units) -> Decoding layer (reconstructed output)
where n is the number of hidden units. The reconstruction loss is typically measured using the mean squared error between the original input and the reconstructed output.

Denoising Autoencoder Architecture
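
A minimal sketch of the corresponding training step follows: the network is the same as a vanilla autoencoder, but the loss compares the reconstruction of a noisy input against the clean original. Additive Gaussian noise with a standard deviation of 0.2 and the layer sizes below are illustrative assumptions.

import torch
import torch.nn as nn

# A plain fully connected autoencoder used as a denoiser.
model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),     # encoder
    nn.Linear(64, 784), nn.Sigmoid(),  # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_clean = torch.rand(32, 784)                          # dummy clean batch in [0, 1]
x_noisy = x_clean + 0.2 * torch.randn_like(x_clean)    # corrupt the input with Gaussian noise
x_noisy = x_noisy.clamp(0.0, 1.0)

x_hat = model(x_noisy)                                 # reconstruct from the corrupted input...
loss = nn.functional.mse_loss(x_hat, x_clean)          # ...but compare against the clean original
optimizer.zero_grad()
loss.backward()
optimizer.step()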

4. Variational autoencoder (VAE): A variational autoencoder (VAE) is a type of generative model that is based on the principle of variational inference. It consists of an encoder, which maps the input data to a latent space, and a decoder, which maps points in the latent space back to the input space. The goal of training is to learn the encoder and decoder weights such that the decoder can generate new, synthetic data samples that are similar to the original input data.
VAEs have two main components: an encoder network and a decoder network. The encoder maps the input data to a latent space, typically through a series of fully-connected (dense) layers. The latent space is usually a lower-dimensional space than the input space, which allows the VAE to learn a compact representation of the input data. The decoder maps points in the latent space back to the input space through a series of dense layers.
During training, the VAE maximizes a lower bound on the likelihood of the input data under the model (the evidence lower bound, or ELBO), while simultaneously enforcing a prior distribution over the latent space. This is done by jointly minimizing the reconstruction loss, which measures the difference between the input data and the reconstructed output, and the KL divergence between the latent space distribution and the prior distribution. The KL divergence is a measure of the difference between two probability distributions.
Here is an example of the architecture of a VAE with a single hidden layer in the encoder and decoder:
Input layer (m input units) -> Encoding layer (n hidden units) -> Latent space (k latent units) -> Decoding layer (n hidden units) -> Output layer (m output units)
where m is the number of input units, n is the number of hidden units, and k is the number of latent units. The reconstruction loss is typically measured using the mean squared error between the input and the reconstructed output. The KL divergence between the latent space distribution and the prior distribution is typically computed using its closed-form expression for the Gaussian case.

Variational Autoencoder Architecture
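
Here is a minimal PyTorch sketch of such a VAE, using the reparameterization trick and the closed-form Gaussian KL term described above. The sizes m = 784, n = 256, and k = 20 are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, m=784, n=256, k=20):
        super().__init__()
        self.enc = nn.Linear(m, n)         # encoder hidden layer
        self.mu = nn.Linear(n, k)          # mean of the latent Gaussian
        self.logvar = nn.Linear(n, k)      # log-variance of the latent Gaussian
        self.dec_hidden = nn.Linear(k, n)  # decoder hidden layer
        self.dec_out = nn.Linear(n, m)     # decoder output layer

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)   # reparameterization trick: z ~ N(mu, std^2)
        x_hat = torch.sigmoid(self.dec_out(F.relu(self.dec_hidden(z))))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                 # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # closed-form Gaussian KL term
    return recon + kl

model = VAE()
x = torch.rand(16, 784)                  # dummy batch
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)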

Autoencoders can be used for a variety of tasks, including:

  1. Data compression: Autoencoders can be used to compress data by learning a lower-dimensional representation of the input data and using this representation to reconstruct the original data.
  2. Dimensionality reduction: Autoencoders can be used to reduce the number of dimensions in a dataset, which can be useful for visualizing high-dimensional data or for improving the performance of machine learning models.
  3. Feature extraction: By training an autoencoder to reconstruct the input data, it can learn to extract meaningful features from the data. These features can then be used as input to another machine learning model.
  4. Anomaly detection: Autoencoders can be used to detect anomalies in a dataset by reconstructing the input data and computing the reconstruction error. If the reconstruction error is above a certain threshold, it may indicate that the input data is anomalous (see the sketch after this list).
  5. Generative modeling: Autoencoders can be used as building blocks for more complex generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). These models are capable of generating new data that is similar to the training data.
  6. Representation learning: Autoencoders can be used to learn meaningful representations of the input data, which can be used for tasks such as classification or clustering.
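
As a concrete illustration of the anomaly detection use case above, here is a minimal sketch. In practice the autoencoder would already be trained on normal data; the untrained stand-in model and the 95th-percentile threshold rule below are illustrative assumptions.

import torch
import torch.nn as nn

# Stand-in autoencoder; in practice use a model trained on normal data.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())

@torch.no_grad()
def reconstruction_errors(model, x):
    x_hat = model(x)                        # reconstruct each sample
    return ((x - x_hat) ** 2).mean(dim=1)   # per-sample mean squared error

# Pick a threshold from data considered normal...
normal_data = torch.rand(1000, 784)
threshold = torch.quantile(reconstruction_errors(model, normal_data), 0.95)

# ...then flag new samples whose reconstruction error exceeds it.
new_data = torch.rand(10, 784)
is_anomaly = reconstruction_errors(model, new_data) > threshold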

References:

  1. https://iq.opengenus.org/content/images/2022/04/denoising_autoencoder.png
  2. https://www.researchgate.net/publication/339743465_Galaxy_Image_Classification_Based_on_Citizen_Science_Data_A_Comparative_Study
  3. https://www.researchgate.net/publication/359471754_Meta-Learning_Fast_Adaptation_and_Latent_Representation_for_Head_Pose_Estimation/figures?lo=1
