
Famous Swiss Roll example

  • Writer: Gamze Bulut
  • Mar 17
  • 3 min read

Understanding Dimension Reduction: PCA vs. t-SNE vs. UMAP with the Swiss Roll


When dealing with high-dimensional data, it is often challenging to visualize and interpret patterns. Dimensionality reduction projects complex datasets into fewer dimensions while preserving meaningful structure. In this post, we will explore two popular dimensionality reduction techniques, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), and then see how UMAP compares, using a classic example: the Swiss Roll dataset.


The Swiss Roll Dataset


The Swiss Roll is a synthetic dataset where points are arranged on a curved 3D surface, forming a spiral-like shape. The challenge in dimensionality reduction is to flatten this 3D spiral while preserving important relationships between points.
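The dataset can be generated with scikit-learn's `make_swiss_roll`; a minimal sketch (the sample size and noise level here are arbitrary choices, not values from the original post):

```python
from sklearn.datasets import make_swiss_roll

# X is (n_samples, 3); `color` encodes each point's position along the
# spiral, which is what the color-coding in the figures uses.
X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=42)
print(X.shape)  # (1500, 3)
```

The `color` array is exactly the quantity a good embedding should keep ordered: nearby colors should stay nearby after projection.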


In this figure, each point is color-coded based on its position along the spiral. Ideally, after dimensionality reduction, points with similar colors should remain close together.


PCA: Preserving Global Structure


Principal Component Analysis (PCA) is a linear dimensionality reduction method. It works by identifying the directions (principal components) that capture the most variance in the data and projecting the dataset onto these components.


When PCA is applied to the Swiss Roll, we obtain the following projection:
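The projection can be reproduced with scikit-learn's `PCA`; a minimal sketch (plotting omitted):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=42)

# Project the 3D roll onto the two directions of highest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(X_pca.shape)                    # (1500, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```

Because the projection is linear, it can only squash the roll flat; it cannot unroll it.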


Key Observations:


✅ PCA successfully collapses the Swiss Roll into two dimensions.

✅ It preserves the global structure: the overall spiral shape is still visible.

🚨 However, PCA distorts local relationships: some points that were far apart in 3D may appear close in 2D.

📌 PCA is useful when the data follows a linear structure and we need a fast, interpretable projection.


t-SNE: Preserving Local Structure


t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that focuses on preserving local relationships in the data. Instead of a linear projection, t-SNE maps points into 2D space while trying to maintain the pairwise similarities between points.


Applying t-SNE to the Swiss Roll results in:

[Figure: t-SNE projection of the Swiss Roll]
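A minimal sketch of the t-SNE run with scikit-learn (a smaller sample keeps it quick, since t-SNE is comparatively slow; the parameter values are common defaults, not tuned settings):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE

# Smaller sample: t-SNE's cost grows quickly with the number of points.
X, color = make_swiss_roll(n_samples=500, noise=0.05, random_state=42)

# perplexity balances local vs. broader neighborhoods; 30 is a common default.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=42)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)  # (500, 2)
```

Changing `perplexity` noticeably changes how the roll breaks apart, which is worth experimenting with.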

Key Observations:


✅ t-SNE successfully unrolls the Swiss Roll by grouping similar points together.

✅ It preserves local neighborhoods: points that were close in 3D remain close in 2D.

🚨 However, t-SNE does not preserve the global spiral shape; instead, it breaks the roll into clusters.


📌 t-SNE is great for discovering clusters in high-dimensional data but does not maintain global structure.


Comparison: PCA vs. t-SNE

| Feature | PCA (Linear) | t-SNE (Non-Linear) |
| --- | --- | --- |
| Preserves Global Structure? | ✅ Yes | ❌ No |
| Preserves Local Structure? | ❌ No | ✅ Yes |
| Works Well for Curved Data? | ❌ No | ✅ Yes |
| Good for Cluster Discovery? | ❌ No | ✅ Yes |
| Computationally Efficient? | ✅ Fast | ❌ Slow |

  • If you need a quick, structured projection, PCA is a great choice.

  • If you need to find clusters and preserve neighborhood relationships, t-SNE is better.


Both PCA and t-SNE are powerful tools for reducing dimensionality, but they serve different purposes. PCA is best when global structure matters, while t-SNE is useful for local clustering. Understanding these differences is key when visualizing and interpreting high-dimensional datasets.


I am still debating whether I understood the whole concept, but this is essentially it. :)


What about UMAP?


At this point, I wondered: don't we need an algorithm that preserves the spiral shape better? The answer is UMAP (Uniform Manifold Approximation and Projection).


We can argue this is still not a perfect spiral shape, but it is better than t-SNE's broken-apart pieces.


UMAP Prioritizes Local Structure

  • UMAP is designed to preserve local distances more than global distances.

  • It maintained the order of the Swiss Roll but didn’t force it into a perfectly smooth spiral.


Why UMAP Might Be the Best Choice for You

| Feature | PCA | t-SNE | UMAP |
| --- | --- | --- | --- |
| Preserves Global Structure? | ✅ Yes | ❌ No | ✅ Yes |
| Preserves Local Structure? | ❌ No | ✅ Yes | ✅ Yes |
| Distributes Points Naturally? | ❌ No (line) | ❌ No (overclusters) | ✅ Yes (balanced) |
| Fast for Large Data? | ✅ Yes | ❌ No (slow) | ✅ Yes (faster) |

Have you used any of these algorithms? How was your experience?
