
Famous Swiss Roll example

  • Writer: Gamze Bulut
  • Mar 17
  • 3 min read

Understanding Dimension Reduction: PCA vs. t-SNE vs. UMAP with the Swiss Roll


When dealing with high-dimensional data, it is often challenging to visualize and interpret patterns. Dimensionality reduction projects complex datasets into fewer dimensions while preserving meaningful structure. In this post, we will explore two popular dimensionality reduction techniques, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), and then see how UMAP compares, using a classic example: the Swiss Roll dataset.


The Swiss Roll Dataset


The Swiss Roll is a synthetic dataset where points are arranged on a curved 3D surface, forming a spiral-like shape. The challenge in dimensionality reduction is to flatten this 3D spiral while preserving important relationships between points.
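The dataset can be generated with scikit-learn's `make_swiss_roll`; a minimal sketch (the sample size and noise level here are arbitrary choices, not values from the original post):

```python
from sklearn.datasets import make_swiss_roll

# X is (n_samples, 3); `color` encodes each point's position along the
# spiral, which is what the color-coding in the figures uses.
X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=42)
print(X.shape)  # (1500, 3)
```

The `color` array is exactly the quantity a good embedding should keep ordered: nearby colors should stay nearby after projection.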


In this figure, each point is color-coded based on its position along the spiral. Ideally, after dimensionality reduction, points with similar colors should remain close together.


PCA: Preserving Global Structure


Principal Component Analysis (PCA) is a linear dimensionality reduction method. It works by identifying the directions (principal components) that capture the most variance in the data and projecting the dataset onto these components.


When PCA is applied to the Swiss Roll, we obtain the following projection:
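The projection can be reproduced with scikit-learn's `PCA`; a minimal sketch (plotting omitted):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=42)

# Project the 3D roll onto the two directions of highest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(X_pca.shape)                    # (1500, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```

Because the projection is linear, it can only squash the roll flat; it cannot unroll it.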


Key Observations:


✅ PCA successfully collapses the Swiss Roll into two dimensions.

✅ It preserves the global structure: the overall spiral shape is still visible.

🚨 However, PCA distorts local relationships: some points that were far apart in 3D may appear close in 2D.

📌 PCA is useful when the data follows a linear structure and we need a fast, interpretable projection.


t-SNE: Preserving Local Structure


t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that focuses on preserving local relationships in the data. Instead of a linear projection, t-SNE maps points into 2D space while trying to maintain the pairwise similarities between points.


Applying t-SNE to the Swiss Roll results in:

[Figure: t-SNE projection of the Swiss Roll]
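A minimal sketch of the t-SNE run with scikit-learn (a smaller sample keeps it quick, since t-SNE is comparatively slow; the parameter values are common defaults, not tuned settings):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE

# Smaller sample: t-SNE's cost grows quickly with the number of points.
X, color = make_swiss_roll(n_samples=500, noise=0.05, random_state=42)

# perplexity balances local vs. broader neighborhoods; 30 is a common default.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=42)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)  # (500, 2)
```

Changing `perplexity` noticeably changes how the roll breaks apart, which is worth experimenting with.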

Key Observations:


✅ t-SNE successfully unrolls the Swiss Roll by grouping similar points together.

✅ It preserves local neighborhoods: points that were close in 3D remain close in 2D.

🚨 However, t-SNE does not preserve the global spiral shape; instead, it breaks the roll into clusters.


📌 t-SNE is great for discovering clusters in high-dimensional data but does not maintain global structure.


Comparison: PCA vs. t-SNE

| Feature | PCA (Linear) | t-SNE (Non-Linear) |
| --- | --- | --- |
| Preserves Global Structure? | ✅ Yes | ❌ No |
| Preserves Local Structure? | ❌ No | ✅ Yes |
| Works Well for Curved Data? | ❌ No | ✅ Yes |
| Good for Cluster Discovery? | ❌ No | ✅ Yes |
| Computationally Efficient? | ✅ Fast | ❌ Slow |

  • If you need a quick, structured projection, PCA is a great choice.

  • If you need to find clusters and preserve neighborhood relationships, t-SNE is better.


Both PCA and t-SNE are powerful tools for reducing dimensionality, but they serve different purposes. PCA is best when global structure matters, while t-SNE is useful for local clustering. Understanding these differences is key when visualizing and interpreting high-dimensional datasets.


I am still debating whether I understood the whole concept, but this is essentially it. :)


What about UMAP?


At this point, I wondered: don't we need an algorithm that preserves the spiral shape better? The answer is UMAP (Uniform Manifold Approximation and Projection).


We can argue this is still not a perfect spiral shape, but it is better than t-SNE's broken-apart pieces.


UMAP Prioritizes Local Structure

  • UMAP is designed to preserve local distances more than global distances.

  • It maintained the order of the Swiss Roll but didn’t force it into a perfectly smooth spiral.


Why UMAP Might Be the Best Choice for You

| Feature | PCA | t-SNE | UMAP |
| --- | --- | --- | --- |
| Preserves Global Structure? | ✅ Yes | ❌ No | ✅ Yes |
| Preserves Local Structure? | ❌ No | ✅ Yes | ✅ Yes |
| Distributes Points Naturally? | ❌ No (line) | ❌ No (overclusters) | ✅ Yes (balanced) |
| Fast for Large Data? | ✅ Yes | ❌ No (slow) | ✅ Yes (faster) |

Have you used any of these algorithms? How was your experience?
