Optimal Image Compression through Deep Learning Architecture
Krish Siddhiwala • May 2025
Visual overview of the compression and reconstruction output.
Overview
This research presents a lightweight deep learning model for image compression, designed
specifically for deployment in resource-constrained environments like mobile devices,
drones, and IoT systems. The proposed architecture integrates Convolutional Autoencoders
(CAEs), Residual Blocks (ResBlocks), and Generative Adversarial Networks (GANs) to
balance compression efficiency and perceptual quality. The model's components are
evaluated through an ablation study and benchmarked using standard metrics such as
Compression Ratio (CR), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity
Index Measure (SSIM). Overall, the results show that this compact model achieves
effective compression while maintaining visual fidelity, making it suitable for
edge-based real-world applications.
Architecture Overview
Diagram of the complete model architecture.
The model is a CAE trained within an adversarial framework using a PatchGAN discriminator.
The complete system consists of three main components: an encoder (Enc.), a decoder (Dec.),
and a discriminator (Disc.). The encoder and decoder together form the generator, which
produces the reconstructed image. The discriminator promotes perceptual quality by
distinguishing real images from reconstructions at the patch level. The diagram highlights how
input images are compressed, reconstructed, and evaluated across stages using adversarial
and perceptual feedback. This architecture allows the model to preserve spatial detail and
perceptual quality in reconstructions while remaining efficient and deployable for edge
applications.
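As a concrete illustration, the encoder-decoder generator could be structured as below in PyTorch. This is a minimal sketch: the layer widths, depths, and latent channel count are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with an identity skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class CAEGenerator(nn.Module):
    """Encoder compresses the image to a small latent; decoder reconstructs it.
    Channel counts here (32/64, latent_ch=8) are placeholder assumptions."""
    def __init__(self, latent_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),    # H -> H/2
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # H/2 -> H/4
            nn.ReLU(inplace=True),
            ResBlock(64),
            nn.Conv2d(64, latent_ch, 3, padding=1),      # compact latent code
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1),
            ResBlock(64),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # H/4 -> H/2
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # H/2 -> H
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z
```

The latent tensor is the compressed representation; its spatial and channel dimensions relative to the input determine the achievable compression ratio.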
Loss Functions
Training of the generator is driven by a hybrid loss combining a pixel-wise L1 term, a perceptual LPIPS term, and an adversarial term:
\( \mathcal{L}_G = \mathcal{L}_{L1} + \lambda_{LPIPS} \cdot \mathcal{L}_{LPIPS} + \lambda_{GAN} \cdot w \cdot \mathcal{L}_{GAN} \)
The discriminator is trained using hinge loss for greater stability:
\( L_D = \mathbb{E}[\max(0, 1 - D(x))] + \mathbb{E}[\max(0, 1 + D(\hat{x}))] \)
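The two losses above can be sketched in NumPy as follows. The LPIPS term is normally computed by a pretrained network, so here it is taken as a precomputed scalar input; the weight values and the `-E[D(x_hat)]` form of the generator's adversarial term are assumptions for illustration.

```python
import numpy as np


def hinge_d_loss(d_real, d_fake):
    """Discriminator hinge loss over patch logits:
    L_D = E[max(0, 1 - D(x))] + E[max(0, 1 + D(x_hat))]."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))


def generator_loss(x, x_hat, lpips_term, d_fake,
                   lam_lpips=0.1, lam_gan=0.01, w=1.0):
    """Hybrid generator loss: L1 + lambda_LPIPS * LPIPS + lambda_GAN * w * L_GAN.
    lpips_term is assumed to come from an external LPIPS network; the
    adversarial term uses the hinge-style generator objective -E[D(x_hat)]."""
    l1 = np.mean(np.abs(x - x_hat))
    l_gan = -np.mean(d_fake)
    return l1 + lam_lpips * lpips_term + lam_gan * w * l_gan
```

Note that when the discriminator confidently separates real from fake (real logits above 1, fake logits below -1), the hinge loss is zero and provides no gradient, which is what stabilizes training.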
Results
Final Reconstruction Results of Model.
Qualitative results of full model reconstructions. Each image shows the result of compression
and reconstruction through the model architecture. The reconstructions retain both
global structure and local detail, demonstrating an effective balance between compression
and fidelity.
Metrics
PSNR tracked during training of the three models. Shows gradual improvement in
pixel-level similarity between input and reconstructed images over training steps.
SSIM over training steps for the three models. Reflects improved perceptual
similarity between the input and reconstruction.
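The evaluation metrics can be computed as below. PSNR and CR follow their standard definitions; the SSIM shown is a simplified global (single-window) variant for illustration, whereas the standard metric averages over local Gaussian windows.

```python
import numpy as np


def psnr(x, x_hat, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between original and reconstruction."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def compression_ratio(original_bytes, compressed_bytes):
    """CR = uncompressed size / compressed size; higher means stronger compression."""
    return original_bytes / compressed_bytes


def ssim_global(x, y, max_val=255.0):
    """Simplified global SSIM (single window over the whole image).
    Standard SSIM averages this quantity over local sliding windows."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A perfect reconstruction yields infinite PSNR and an SSIM of 1.0, so both curves rising during training indicate improving fidelity.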
Ablation Study
The ablation study evaluates three model configurations:
- CAE
- CAE + Residual Blocks [CRB]
- CAE + Residual Blocks + PatchGAN + Perceptual Loss [GCRB]
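One way to express the three ablation configurations is as a set of component toggles, as sketched below. The flag names are hypothetical; they simply mirror which components each configuration enables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Which architectural components a given ablation run enables."""
    use_resblocks: bool   # residual blocks in encoder/decoder
    use_gan: bool         # PatchGAN discriminator + adversarial loss
    use_lpips: bool       # perceptual LPIPS loss term


# Each named configuration turns on one more component than the last.
CONFIGS = {
    "CAE":  AblationConfig(use_resblocks=False, use_gan=False, use_lpips=False),
    "CRB":  AblationConfig(use_resblocks=True,  use_gan=False, use_lpips=False),
    "GCRB": AblationConfig(use_resblocks=True,  use_gan=True,  use_lpips=True),
}
```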
Ablation study presenting reconstruction results for three configurations: CAE, CRB, and GCRB.
The CAE model produces blurry reconstructions with noticeable checkerboard artifacts, especially in texture-rich areas. The CRB configuration sharpens edges, improves pixel-level accuracy, and reduces artifacts, but it still falls short in capturing fine details and textures such as eye color, whiskers, and variations in fur. The GCRB model improves further, offering better preservation of detail and a more realistic overall appearance.