
How does tSNE work?

  • Writer: Gamze Bulut
  • Mar 17
  • 2 min read

A Deep Dive into Stochastic Neighbor Embedding


Introduction


In high-dimensional data visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) has become one of the most popular techniques. It allows us to map complex datasets into two or three dimensions while preserving local structures. But how does it work? In this post, we will break down t-SNE step by step, drawing insights from its mathematical foundations and practical applications.


1. What is t-SNE Trying to Solve?


When we work with high-dimensional data (e.g., images, gene expression data, or word embeddings), understanding its structure becomes difficult. We need a way to:

✅ Reduce the data to 2D or 3D for visualization.

✅ Preserve clusters and local relationships.

✅ Capture non-linear structure that simple linear projections (like PCA) can miss.


Unlike PCA, which is a linear method, t-SNE is non-linear and focuses on preserving local neighborhoods. This makes it ideal for detecting clusters and substructures in data.
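To see the difference in practice, here is a minimal sketch comparing the two methods with scikit-learn. The digits dataset is just a stand-in for any high-dimensional data:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 64-dimensional image vectors

# PCA: a linear projection onto the two directions of highest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: a non-linear embedding that preserves local neighborhoods.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)
```

Plotting X_pca and X_tsne side by side typically shows t-SNE separating the digit classes into much tighter, better-isolated clusters.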


2. What Does "Stochastic" Mean in t-SNE?


"Stochastic" means random but with structure. In t-SNE:

  • We start by randomly positioning the data points in low dimensions.

  • We then iteratively adjust their positions, using probabilities to match their high-dimensional relationships.

  • This randomness helps avoid bad solutions and ensures better local clustering.


Because of this, t-SNE’s output can vary slightly across runs—unlike PCA, which always gives the same result:

[Figure: t-SNE with seed 42]
[Figure: t-SNE with seed 100]
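The variation between these runs is easy to reproduce. Below is a minimal sketch, again assuming scikit-learn and the digits dataset as a stand-in for your own data: the same data is embedded twice, changing only the random seed.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# Embed the same data twice with different seeds; cluster shapes tend
# to persist, but rotation and placement of clusters can differ.
emb_42 = TSNE(n_components=2, random_state=42).fit_transform(X)
emb_100 = TSNE(n_components=2, random_state=100).fit_transform(X)
# Scatter-plot emb_42 and emb_100 side by side to compare the layouts.
```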

3. The Key Idea Behind t-SNE


At its core, t-SNE works by modeling similarities between points in both high-dimensional and low-dimensional spaces:


Step 1: Compute Pairwise Similarities in High-Dimensional Space

  • Each data point gets a probability score for how similar it is to other points.

  • This is done using a Gaussian distribution centered at each point.

  • Similar points get high probability, while distant points get low probability (see the sketch below).
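To make this concrete, here is a minimal NumPy sketch of Step 1. The function name high_dim_similarities is my own, and sigma is fixed for simplicity; the real algorithm tunes a separate sigma for each point so that its neighborhood matches a target perplexity.

```python
import numpy as np

def high_dim_similarities(X, sigma=1.0):
    # Squared Euclidean distances between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel centered at each point: close points score high.
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)           # a point is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)  # each row becomes p(j|i)
    return P
```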


Step 2: Compute Pairwise Similarities in Low-Dimensional Space

  • Instead of a Gaussian, we use a Student t-distribution (its heavier tails give moderately distant points more room, which avoids overcrowding in the center of the map).

  • The goal is to make the low-dimensional probabilities match the high-dimensional ones as closely as possible (see the sketch below).
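Here is the matching sketch for Step 2, again with a hypothetical function name. t-SNE uses a t-distribution with one degree of freedom, so the kernel is simply 1 / (1 + distance²):

```python
import numpy as np

def low_dim_similarities(Y):
    # Squared Euclidean distances in the low-dimensional map.
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    # Heavy-tailed Student-t kernel (1 degree of freedom).
    Q = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(Q, 0.0)
    Q /= Q.sum()  # normalize over all pairs into a joint distribution
    return Q
```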


Step 3: Minimize the Difference (KL Divergence)

  • The "error" function t-SNE minimizes is called Kullback-Leibler (KL) Divergence.

  • This function tells us how different two probability distributions are.

  • Using gradient descent, t-SNE moves points in 2D space until the low-dimensional structure resembles the high-dimensional one.
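Putting the pieces together, here is a minimal sketch of the loss itself. Note that the full algorithm first symmetrizes the conditional p(j|i) values into a joint distribution P before comparing it to Q; that detail is noted in the comments rather than implemented here.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    # KL(P || Q) = sum over pairs of P * log(P / Q).
    # In full t-SNE, P is the symmetrized joint distribution
    # P_ij = (p(j|i) + p(i|j)) / (2n), and Q comes from the t-kernel.
    mask = P > 0                      # 0 * log(0) is treated as 0
    return np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps)))
```

Gradient descent then repeatedly nudges each map point in the direction that lowers this divergence.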


4. Why Does t-SNE Sometimes Look Different Every Time?


Since t-SNE is stochastic, each run may produce a slightly different layout. This happens because:

  1. The algorithm starts with a random initialization of points.

  2. The optimization process can get stuck in different local minima.

  3. Different perplexity values (a hyperparameter that roughly sets the effective number of neighbors per point) change how local and global structure are balanced (see the sketch below).
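To see the effect of reason 3 directly, you can re-embed the same data at a few perplexity values. A minimal sketch, once more assuming scikit-learn and the digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# Same data, same seed, three perplexity settings.
embeddings = {
    p: TSNE(n_components=2, perplexity=p, random_state=42).fit_transform(X)
    for p in (5, 30, 50)
}
# Plot each embedding: small perplexity emphasizes tight local clusters,
# larger perplexity pulls in more global structure.
```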


🔹 Solution: Run t-SNE multiple times and look for patterns that are stable across runs. You can also set a seed (e.g., random_state in scikit-learn) for reproducibility.
