5 Experiments
In this section, we validate the theoretical insights through practical experiments.
We implemented and trained from scratch four models on the MNIST dataset:
- AutoEncoder (AE)
- Variational AutoEncoder (VAE)
- Variational Sparse Coding (VSC)
- Variational Sparse Coding with Warm-Up (VSC-WU)
All models share the same encoder/decoder structure: three convolutional layers with ReLU activations, which keeps the parameter counts approximately equal and ensures a fair comparison. Each model was trained with the Adam optimizer for 25 epochs.
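For reference, the following is a minimal sketch of the shared convolutional backbone in PyTorch, in its plain AE form. The channel counts, kernel sizes, and 2D latent size are illustrative assumptions rather than our exact configuration; the VAE/VSC variants add extra heads for the posterior parameters.

```python
# Minimal sketch of the shared convolutional backbone (PyTorch), in its
# plain AE form. Channel counts, kernel sizes, and the 2D latent size are
# illustrative assumptions, not our exact configuration.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, latent_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1, padding=1),  # 7x7 -> 7x7
            nn.ReLU(),
        )
        self.fc = nn.Linear(64 * 7 * 7, latent_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class ConvDecoder(nn.Module):
    def __init__(self, latent_dim=2):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),                   # 7x7 -> 7x7
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),  # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 64, 7, 7))
```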
5.1 Latent Space
5.1.1 Visualization
To qualitatively evaluate the structure of the latent space, we decode a 2D grid of latent vectors, mapping each grid point through the decoder to an image.
This provides insight into how each model organizes information in the latent space.
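The sampling procedure can be sketched as follows; `decoder` is assumed to map an (N, 2) batch of latent vectors to (N, 1, 28, 28) images, and the grid range is an illustrative choice.

```python
# Sketch: decode a 2D grid of latent vectors into one tiled image.
# Assumes a 2D latent space and a `decoder` mapping (N, 2) -> (N, 1, 28, 28).
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def plot_latent_grid(decoder, n=15, lim=3.0):
    zs = torch.linspace(-lim, lim, n)
    canvas = torch.zeros(n * 28, n * 28)
    for i, zy in enumerate(zs):
        for j, zx in enumerate(zs):
            z = torch.tensor([[zx.item(), zy.item()]])
            canvas[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = decoder(z).view(28, 28)
    plt.imshow(canvas, cmap="gray")
    plt.axis("off")
    plt.show()
```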

5.1.2 Interpretation
AE: Exhibits minimal structure; digits are scattered randomly with no discernible clustering.
VAE: Better organized, with gradual transitions between digits.
VSC: Should reveal a discrete cluster per digit class, but we do not observe the distinct boundaries expected from the sparse coding framework.
VSC-WU: Should show smoother transitions between clusters.
5.2 Activated Dimensions
5.2.1 Visualization
We visualize how each latent dimension is activated across the digit classes. For each class (0 to 9), we compute the mean latent vector and display it as a heatmap.
This helps us assess how sparse and interpretable the representations are.
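A sketch of this computation is given below; `encoder` is assumed to return an (N, latent_dim) code per batch (for the VAE-style models we take the posterior mean), and `loader` is a standard MNIST DataLoader.

```python
# Sketch: per-class mean latent activations rendered as a heatmap.
# Assumes `encoder` maps (N, 1, 28, 28) -> (N, latent_dim); for the
# VAE-style models the posterior mean is used as the code.
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def plot_class_activations(encoder, loader, latent_dim, n_classes=10):
    sums = torch.zeros(n_classes, latent_dim)
    counts = torch.zeros(n_classes)
    for x, y in loader:
        z = encoder(x)
        for c in range(n_classes):
            mask = y == c
            sums[c] += z[mask].sum(dim=0)
            counts[c] += mask.sum()
    means = sums / counts.unsqueeze(1)
    plt.imshow(means, cmap="viridis", aspect="auto")
    plt.xlabel("latent dimension")
    plt.ylabel("digit class")
    plt.colorbar()
    plt.show()
```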

5.2.2 Interpretation
AE: Shows dense activations across all dimensions, indicating non-selective feature encoding.
VAE: More balanced, but activations still overlap across classes; no clear sparsity pattern.
VSC: Demonstrates clear sparsity: two dimensions remain near zero, and the selectively active dimensions differ across digit classes. This aligns with the spike-and-slab prior's goal of feature disentanglement. A larger latent space might make posterior collapse easier to observe. Note: in our experiments, we could not reproduce the authors' tuning that prevents collapse with spike values ≠ 1.0.
VSC-WU: Same as VSC but with lower and more homogeneous values. A larger latent space might make it clearer that the warm-up strategy helps avoid posterior collapse; it is not obvious here (a sketch of a typical warm-up schedule follows below).
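For reference, a common warm-up scheme linearly anneals the weight on the KL term from 0 to 1 over the first epochs; the sketch below reflects that assumption rather than the exact schedule used in our runs.

```python
# Sketch of a linear KL warm-up schedule (assumption: the weight on the KL
# term is annealed from 0 to 1 over the first `warmup_epochs` epochs).
def kl_weight(epoch, warmup_epochs=10):
    return min(1.0, epoch / warmup_epochs)

# In the training loop the objective would then read:
#   loss = recon_loss + kl_weight(epoch) * kl_loss
```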
5.3 Reconstruction
5.3.1 Visualization
Finally, we evaluate reconstruction quality by comparing reconstructions of clean and noise-corrupted images.
This assesses both reconstruction quality and robustness to perturbations.
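The comparison can be sketched as follows; the Gaussian noise level is an illustrative choice, and `model` is assumed to return the reconstruction first when it also outputs posterior parameters.

```python
# Sketch: reconstruct a small batch of clean and noise-corrupted images.
# Assumes `model` maps images to reconstructions (VAE-style models may also
# return posterior parameters, handled below); noise_std is illustrative.
# Assumes a batch of more than one image.
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def plot_reconstructions(model, x, noise_std=0.3):
    x_noisy = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)
    recon = model(x)
    recon_noisy = model(x_noisy)
    if isinstance(recon, tuple):  # e.g. (reconstruction, mu, logvar)
        recon, recon_noisy = recon[0], recon_noisy[0]
    rows = [x, recon, x_noisy, recon_noisy]  # clean / recon / noisy / denoised
    fig, axes = plt.subplots(4, x.size(0), figsize=(2 * x.size(0), 8))
    for r, batch in enumerate(rows):
        for c in range(x.size(0)):
            axes[r, c].imshow(batch[c, 0], cmap="gray")
            axes[r, c].axis("off")
    plt.show()
```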

5.3.2 Interpretation
AE: Generates sharp reconstructions but fails to denoise effectively.
VAE: Produces more robust reconstructions with partial noise suppression.
VSC: Low-quality reconstructions.
VSC-WU: Low-quality reconstructions.