7 Conclusion

This report explored Variational Sparse Coding (VSC), an extension of Variational Autoencoders that introduces sparsity in the latent space via a Spike-and-Slab prior. Through theoretical analysis and empirical validation on MNIST, we demonstrated how VSC addresses key limitations of traditional VAEs and β-VAEs, particularly in achieving interpretable and controllable latent representations.

7.1 Key Contributions of VSC

7.1.1 Sparse Latent Representations

The Spike-and-Slab prior enforces sparsity by allowing each latent dimension to be exactly zero with high probability, unlike the Gaussian prior of a standard VAE. Because only a few dimensions are active for any given input, the resulting representations tend to be more disentangled and interpretable.
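As a minimal illustration of this mechanism (a sketch, not the exact prior used in our experiments), the snippet below samples from a Spike-and-Slab prior in which each latent dimension follows a standard Gaussian slab with probability alpha and is set exactly to zero otherwise; the function name and the value alpha = 0.2 are illustrative choices of ours.

```python
import torch

def sample_spike_and_slab(n_samples, latent_dim, alpha=0.2):
    """Sample from an illustrative Spike-and-Slab prior.

    Each latent dimension is drawn from a standard Gaussian "slab" with
    probability alpha and set to an exact-zero "spike" otherwise.
    """
    slab = torch.randn(n_samples, latent_dim)                            # Gaussian slab component
    gate = torch.bernoulli(torch.full((n_samples, latent_dim), alpha))   # 1 = slab, 0 = spike
    return gate * slab                                                    # exact zeros where the spike wins

z = sample_spike_and_slab(n_samples=4, latent_dim=8)
print(z)  # most entries are exactly zero, the rest are Gaussian draws
```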

7.1.2 Dynamic Prior Adaptation

The prior is learned via pseudo-inputs and a classifier, avoiding the rigid assumptions of a fixed Gaussian prior. This enables the model to adapt to varying feature combinations across data points, a limitation of β-VAEs.
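A minimal sketch of this idea is given below; the module and parameter names (PseudoInputPrior, n_pseudo, the toy encoder) are our own rather than the exact architecture in our code. Learnable pseudo-inputs are passed through the encoder to obtain candidate prior spike probabilities, and a classifier softly assigns each data point to the pseudo-inputs, yielding a per-example prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PseudoInputPrior(nn.Module):
    """Sketch of a prior learned from pseudo-inputs and a classifier.

    `encoder` is assumed to map inputs to per-dimension spike probabilities;
    the classifier softly matches each real input to the pseudo-inputs,
    producing a per-example prior instead of one fixed Gaussian.
    """
    def __init__(self, encoder, n_pseudo, input_dim):
        super().__init__()
        self.encoder = encoder
        self.pseudo_inputs = nn.Parameter(torch.randn(n_pseudo, input_dim))  # learnable pseudo-inputs
        self.classifier = nn.Linear(input_dim, n_pseudo)

    def forward(self, x):
        prior_spike_probs = self.encoder(self.pseudo_inputs)    # (n_pseudo, latent_dim)
        assignment = F.softmax(self.classifier(x), dim=-1)      # (batch, n_pseudo) soft assignment
        return assignment @ prior_spike_probs                   # (batch, latent_dim) per-example prior

# Hypothetical usage with a toy encoder that outputs spike probabilities.
encoder = nn.Sequential(nn.Linear(784, 16), nn.Sigmoid())
prior = PseudoInputPrior(encoder, n_pseudo=10, input_dim=784)
print(prior(torch.randn(32, 784)).shape)  # torch.Size([32, 16])
```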

7.1.3 Warm-Up Strategy

The two-phase training schedule, a binary-like phase followed by continuous refinement, helps prevent posterior collapse and keeps latent dimensions active and interpretable.
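A simplified version of such a schedule is sketched below; the epoch counts and slope values are illustrative, not the settings used in our experiments. During warm-up a large slope keeps the sigmoid gate on each latent dimension close to binary, and afterwards the slope is relaxed so the continuous slab values can be refined.

```python
def gate_slope(epoch, warmup_epochs=20, binary_slope=50.0, relaxed_slope=1.0):
    """Two-phase schedule for the spike-gate slope (illustrative values).

    Phase 1 (warm-up): a large slope keeps the gate nearly binary, so each
    latent dimension is forced to be clearly active or inactive instead of
    collapsing. Phase 2: a smaller slope lets the continuous slab values
    be refined.
    """
    return binary_slope if epoch < warmup_epochs else relaxed_slope

print([gate_slope(e) for e in (0, 10, 19, 20, 40)])  # [50.0, 50.0, 50.0, 1.0, 1.0]
```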

7.1.4 Sparsity Control

The KL sparsity term explicitly penalizes deviations from a target sparsity level, promoting efficient use of latent capacity.
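One common way to write such a term (our implementation may differ in the exact weighting and direction of the divergence) is the KL divergence between a target Bernoulli(alpha) and the average posterior inclusion probability of each latent dimension, as sketched below.

```python
import torch

def sparsity_kl(spike_probs, target_alpha=0.2, eps=1e-6):
    """Illustrative KL sparsity penalty.

    Penalizes the deviation of each dimension's average posterior
    inclusion probability from the target sparsity level alpha.
    """
    gamma = spike_probs.mean(dim=0).clamp(eps, 1 - eps)   # average "slab" probability per dimension
    alpha = torch.tensor(target_alpha)
    kl = alpha * torch.log(alpha / gamma) + (1 - alpha) * torch.log((1 - alpha) / (1 - gamma))
    return kl.sum()

spike_probs = torch.rand(128, 16)                  # hypothetical posterior inclusion probabilities
print(sparsity_kl(spike_probs, target_alpha=0.2))  # scalar penalty added to the objective
```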

7.2 Limitations

7.2.1 Discrete vs. Continuous Sparsity

The Spike-and-Slab prior assumes hard sparsity (exact zeros). For some tasks, soft sparsity (e.g., a Laplace prior) may be more appropriate.

7.2.2 Reconstruction Quality

VSC trades some reconstruction fidelity for interpretability; restricting the number of active latent dimensions can lead to blurrier outputs.

7.2.3 Parameter Sensitivity

The warm-up schedule, its coefficients, and the sparsity target all require careful tuning. In our experiments the model was unstable when the spike parameter was adjusted (it collapsed for values other than 1.0), which suggests that implementation-specific tuning is required.

7.3 Code Implementation

Our implementations of Variational Sparse Coding (VSC) and the other related models discussed in this report are available here.