Optimal Image Compression through Deep Learning Architecture
Krish Siddhiwala • May 2025
Visual overview of the compression and reconstruction output.
Overview
This research presents a lightweight deep learning model for image compression, designed
specifically for deployment in resource-constrained environments like mobile devices,
drones, and IoT systems. The proposed architecture integrates Convolutional Autoencoders
(CAEs), Residual Blocks (ResBlocks), and Generative Adversarial Networks (GANs) to
balance compression efficiency and perceptual quality. The model's components are
evaluated through an ablation study and benchmarked using standard metrics such as
Compression Ratio (CR), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity
Index Measure (SSIM). Overall, the results show that this compact model achieves
effective compression while maintaining visual fidelity, making it suitable for
edge-based real-world applications.
Architecture Overview
Diagram of the complete model architecture.
The model is a CAE trained within an adversarial framework using a PatchGAN discriminator.
The complete system consists of three main components: an encoder (Enc.), a decoder (Dec.),
and a discriminator (Disc.). The encoder and decoder together form the generator, which
produces the reconstructed image. The discriminator promotes perceptual quality by
distinguishing real images from reconstructions at the patch level. The diagram highlights how
input images are compressed, reconstructed, and evaluated across stages using adversarial
and perceptual feedback. This architecture allows the model to preserve spatial detail and
perceptual quality in reconstructions while remaining efficient and deployable for edge
applications.
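As a concrete illustration, the encoder-decoder generator could be structured as below in PyTorch. This is a minimal sketch: the layer widths, depths, and latent channel count are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with an identity skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class CAEGenerator(nn.Module):
    """Encoder compresses the image to a small latent; decoder reconstructs it.
    Channel counts here (32/64, latent_ch=8) are placeholder assumptions."""
    def __init__(self, latent_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),    # H -> H/2
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # H/2 -> H/4
            nn.ReLU(inplace=True),
            ResBlock(64),
            nn.Conv2d(64, latent_ch, 3, padding=1),      # compact latent code
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1),
            ResBlock(64),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # H/4 -> H/2
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # H/2 -> H
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z
```

The latent tensor is the compressed representation; its spatial and channel dimensions relative to the input determine the achievable compression ratio.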
Loss Functions
Training of the generator is driven by a hybrid loss combining a pixel-wise L1 term, a perceptual LPIPS term, and an adversarial term:
\( \mathcal{L}_G = \mathcal{L}_{L1} + \lambda_{LPIPS} \cdot \mathcal{L}_{LPIPS} + \lambda_{GAN} \cdot w \cdot \mathcal{L}_{GAN} \)
The discriminator is trained using hinge loss for greater stability:
\( L_D = \mathbb{E}[\max(0, 1 - D(x))] + \mathbb{E}[\max(0, 1 + D(\hat{x}))] \)
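The two losses above can be sketched in NumPy as follows. The LPIPS term is normally computed by a pretrained network, so here it is taken as a precomputed scalar input; the weight values and the `-E[D(x_hat)]` form of the generator's adversarial term are assumptions for illustration.

```python
import numpy as np


def hinge_d_loss(d_real, d_fake):
    """Discriminator hinge loss over patch logits:
    L_D = E[max(0, 1 - D(x))] + E[max(0, 1 + D(x_hat))]."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))


def generator_loss(x, x_hat, lpips_term, d_fake,
                   lam_lpips=0.1, lam_gan=0.01, w=1.0):
    """Hybrid generator loss: L1 + lambda_LPIPS * LPIPS + lambda_GAN * w * L_GAN.
    lpips_term is assumed to come from an external LPIPS network; the
    adversarial term uses the hinge-style generator objective -E[D(x_hat)]."""
    l1 = np.mean(np.abs(x - x_hat))
    l_gan = -np.mean(d_fake)
    return l1 + lam_lpips * lpips_term + lam_gan * w * l_gan
```

Note that when the discriminator confidently separates real from fake (real logits above 1, fake logits below -1), the hinge loss is zero and provides no gradient, which is what stabilizes training.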
Results
Final Reconstruction Results of Model.
Qualitative results of full model reconstructions. Each image shows the result of compression
and reconstruction through the model architecture. The reconstructions retain both
global structure and local detail, demonstrating an effective balance between compression
and fidelity.
Metrics
PSNR tracked during training of the three models. Shows gradual improvement in
pixel-level similarity between input and reconstructed images over training steps.
SSIM over training steps for the three models. Reflects improved perceptual
similarity between the input and reconstruction.
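The evaluation metrics can be computed as below. PSNR and CR follow their standard definitions; the SSIM shown is a simplified global (single-window) variant for illustration, whereas the standard metric averages over local Gaussian windows.

```python
import numpy as np


def psnr(x, x_hat, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between original and reconstruction."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def compression_ratio(original_bytes, compressed_bytes):
    """CR = uncompressed size / compressed size; higher means stronger compression."""
    return original_bytes / compressed_bytes


def ssim_global(x, y, max_val=255.0):
    """Simplified global SSIM (single window over the whole image).
    Standard SSIM averages this quantity over local sliding windows."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A perfect reconstruction yields infinite PSNR and an SSIM of 1.0, so both curves rising during training indicate improving fidelity.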
Ablation Study
The ablation study evaluates three model configurations:
- CAE
- CAE + Residual Blocks [CRB]
- CAE + Residual Blocks + PatchGAN + Perceptual Loss [GCRB]
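One way to express the three ablation configurations is as a set of component toggles, as sketched below. The flag names are hypothetical; they simply mirror which components each configuration enables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Which architectural components a given ablation run enables."""
    use_resblocks: bool   # residual blocks in encoder/decoder
    use_gan: bool         # PatchGAN discriminator + adversarial loss
    use_lpips: bool       # perceptual LPIPS loss term


# Each named configuration turns on one more component than the last.
CONFIGS = {
    "CAE":  AblationConfig(use_resblocks=False, use_gan=False, use_lpips=False),
    "CRB":  AblationConfig(use_resblocks=True,  use_gan=False, use_lpips=False),
    "GCRB": AblationConfig(use_resblocks=True,  use_gan=True,  use_lpips=True),
}
```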
Ablation study presenting reconstruction results for three configurations: CAE, CRB, and GCRB.
The CAE model produces blurry reconstructions with noticeable checkerboard artifacts, especially in texture-rich areas. The CRB configuration sharpens edges, improves pixel-level accuracy, and reduces artifacts, but it still falls short in capturing fine details and textures such as eye color, whiskers, and variations in fur. The GCRB model improves further, offering better preservation of detail and a more realistic overall appearance.