Optimal Image Compression through Deep Learning Architecture

Krish Siddhiwala • May 2025

Architecture Overview

Model Diagram
Diagram of the complete model architecture.

The model is a CAE trained within an adversarial framework using a PatchGAN discriminator. The complete system consists of three main components: an encoder (Enc.), a decoder (Dec.), and a discriminator (Disc.). The encoder and decoder together form the generator, which compresses the input image and produces its reconstruction. The discriminator promotes perceptual quality by learning to tell real input images from reconstructions at the patch level. The diagram shows how input images are compressed, reconstructed, and evaluated at each stage using adversarial and perceptual feedback. This architecture allows the model to maintain spatial detail and perceptual quality in reconstructions while remaining efficient enough to deploy in edge applications.
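To make "at the patch level" concrete, the sketch below computes how large a grid of per-patch scores a PatchGAN produces and how many input pixels each score sees. The layer stack is an assumption: it uses the common pix2pix "70x70" configuration (4x4 kernels, three stride-2 layers followed by two stride-1 layers), which may differ from the discriminator used in this project. The names `ASSUMED_LAYERS`, `patch_grid_size`, and `receptive_field` are illustrative only.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of one convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMPTION: (kernel, stride, padding) per layer, following the common
# pix2pix 70x70 PatchGAN layout. Not taken from this project's code.
ASSUMED_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]


def patch_grid_size(image_size, layers=ASSUMED_LAYERS):
    """Side length of the grid of real/fake scores for a square input."""
    size = image_size
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


def receptive_field(layers=ASSUMED_LAYERS):
    """Side length of the input patch that influences one output score."""
    rf = 1
    # Walk from the last layer back to the input.
    for kernel, stride, _ in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


if __name__ == "__main__":
    print(patch_grid_size(256), receptive_field())
```

Under these assumed layers, each score in the output grid judges one overlapping patch of the image rather than the image as a whole, which is what pushes the generator toward realistic local texture.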

Loss Functions

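The generator is trained with a hybrid loss that combines an L1 reconstruction term, an LPIPS perceptual term, and an adversarial term:
\( \mathcal{L}_G = \mathcal{L}_{L1} + \lambda_{LPIPS} \cdot \mathcal{L}_{LPIPS} + \lambda_{GAN} \cdot w \cdot \mathcal{L}_{GAN} \)
Here \( \lambda_{LPIPS} \) and \( \lambda_{GAN} \) are weighting coefficients, and \( w \) is an additional scaling factor applied to the adversarial term.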
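As a minimal sketch of how the terms of the generator objective above combine, the function below computes the L1 term directly and takes the LPIPS and adversarial values as already-computed scalars, since LPIPS requires a pretrained network and the exact form of the adversarial term is not given here. The default weights are placeholders, not the values used in training, and `w` is treated as a plain scalar because the text does not specify how it is obtained.

```python
def l1_loss(x, x_hat):
    """Mean absolute error between two equally sized flat pixel sequences."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def generator_loss(x, x_hat, lpips_value, gan_value,
                   lambda_lpips=1.0, lambda_gan=0.1, w=1.0):
    """L_G = L_L1 + lambda_LPIPS * L_LPIPS + lambda_GAN * w * L_GAN.

    lpips_value and gan_value are assumed to be computed elsewhere.
    The default weights are illustrative placeholders.
    """
    return (l1_loss(x, x_hat)
            + lambda_lpips * lpips_value
            + lambda_gan * w * gan_value)


if __name__ == "__main__":
    x = [0.0, 0.5, 1.0]
    x_hat = [0.1, 0.5, 0.8]
    print(generator_loss(x, x_hat, lpips_value=0.2, gan_value=-0.5, w=0.5))
```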
The discriminator is trained using hinge loss for greater stability, where \( x \) is an input image and \( \hat{x} \) is its reconstruction:
\( \mathcal{L}_D = \mathbb{E}[\max(0, 1 - D(x))] + \mathbb{E}[\max(0, 1 + D(\hat{x}))] \)
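The hinge loss can be sketched in a few lines. The version below assumes the discriminator's patch scores have already been flattened into plain lists of floats, and it replaces each expectation with a mean over those scores. The function name `hinge_d_loss` is illustrative.

```python
def hinge_d_loss(d_real, d_fake):
    """Hinge loss for the discriminator.

    d_real: scores D(x) on real patches.
    d_fake: scores D(x_hat) on reconstructed patches.
    """
    # Real patches are penalized only when their score falls below +1.
    real_term = sum(max(0.0, 1.0 - s) for s in d_real) / len(d_real)
    # Reconstructed patches are penalized only when their score rises above -1.
    fake_term = sum(max(0.0, 1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    print(hinge_d_loss([2.0, 0.5], [-2.0, 0.5]))
```

Scores that are already on the correct side of the margin contribute nothing, so the discriminator stops receiving gradient from patches it classifies confidently.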

Results

Final reconstructions
Final Reconstruction Results of Model.

Qualitative results of the full model. Each image shows the result of compression and reconstruction through the model architecture. The reconstructions retain global structure and local detail, demonstrating an effective balance between compression and fidelity.

Metrics

PSNR chart
PSNR tracked during training for the three models. It shows a gradual improvement in pixel-level similarity between input and reconstructed images over training steps.
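For reference, PSNR is derived from the mean squared error between the input and its reconstruction. The sketch below assumes pixel values are scaled to the range 0 to 1 (so `max_value` defaults to 1.0) and that images are given as flat sequences of floats. It is a generic implementation of the metric, not the project's evaluation code.

```python
import math


def psnr(x, x_hat, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means closer pixels."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1]))
```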
SSIM chart
SSIM over training steps for the three models. It reflects improved structural similarity between the input and its reconstruction.
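SSIM compares the means, variances, and covariance of the two images. The sketch below is a simplification: it evaluates the SSIM formula once over the whole image, whereas standard implementations compute it over local sliding windows and average the results, so its values will not match a library SSIM. It assumes pixel values in the range 0 to 1 and uses the conventional constants 0.01 and 0.03.

```python
def global_ssim(x, x_hat, max_value=1.0):
    """Single-window SSIM over two flat pixel sequences (simplified)."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(x_hat) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in x_hat) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, x_hat)) / n
    # Small constants that stabilize the division.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    image = [0.0, 1.0, 0.0, 1.0]
    print(global_ssim(image, image))
```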

Ablation Study

The ablation study evaluates three model configurations:
  • CAE
  • CAE + Residual Blocks [CRB]
  • CAE + Residual Blocks + PatchGAN + Perceptual Loss [GCRB]
Ablation reconstruction comparison
Ablation study presenting reconstruction results for three configurations: CAE, CRB, and GCRB.

The CAE model produces blurry reconstructions with noticeable checkerboard artifacts, especially in richly textured areas. The CRB configuration sharpens edges, improves pixel-level accuracy, and reduces artifacts. However, it still falls short in capturing fine details and textures, such as eye color, whiskers, and variations in fur. The GCRB model improves further, preserving more detail and producing a more realistic overall appearance.