Nonlinear Dimension Reduction

1. UMAP

Assumption: Data is uniformly distributed on the manifold.
- Define a Riemannian metric on the manifold so that the assumption holds.
Assumption: The manifold is locally connected.
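Build a weighted k-nearest-neighbour graph; the two directed weights a and b between a pair of points are combined into a single edge weight a + b - ab (fuzzy union)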
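A minimal sketch of how the edge weights can be combined, assuming the directed weights are already computed and stored in a dict keyed by (i, j). The function name and the toy weights are made up for illustration and are not part of any library.

```python
def fuzzy_union(directed):
    """Symmetrise directed edge weights with a + b - a*b.

    `directed` maps (i, j) -> weight in [0, 1]; a missing reverse edge
    is treated as weight 0, so the existing weight is kept unchanged.
    """
    combined = {}
    for (i, j), w_ij in directed.items():
        w_ji = directed.get((j, i), 0.0)
        key = (min(i, j), max(i, j))  # one undirected key per pair
        combined[key] = w_ij + w_ji - w_ij * w_ji
    return combined


# Hypothetical directed weights for three points.
directed = {(0, 1): 1.0, (1, 0): 0.5, (0, 2): 0.4, (2, 1): 0.25}
undirected = fuzzy_union(directed)
```

The combined weight is never smaller than either directed weight, so an edge that is strong in one direction stays strong.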
t-SNE mainly preserves local structure (KL divergence), while UMAP tends to preserve more of the global structure (cross entropy, which also penalises placing dissimilar points close together)
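The two objectives can be written side by side to see where the difference comes from. The symbols are notation chosen for this note: p_ij and v_ij are the high-dimensional similarities, q_ij and w_ij the low-dimensional ones.

```latex
% t-SNE: KL divergence between high-d similarities p_{ij} and low-d similarities q_{ij}
C_{\text{t-SNE}} = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

% UMAP: fuzzy-set cross entropy between high-d weights v_{ij} and low-d weights w_{ij}
C_{\text{UMAP}} = \sum_{i \neq j} \left[ v_{ij} \log \frac{v_{ij}}{w_{ij}}
    + (1 - v_{ij}) \log \frac{1 - v_{ij}}{1 - w_{ij}} \right]
```

The first term of the UMAP cost plays the same attractive role as the KL term. The second term is large when v_ij is small but w_ij is close to 1, i.e. when points that are far apart in the data are placed close together in the embedding.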
UMAP finds approximate nearest neighbours very efficiently (even in high-d space)
- Random-projection tree + nearest neighbour descent
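The embedding is optimised with SGD + negative sampling (only a few random non-neighbours are repelled per edge, so the cost does not blow up with the embedding dimension and UMAP can reduce to more than 3 dimensions, unlike Barnes-Hut t-SNE)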
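A rough sketch of SGD with negative sampling, under simplifying assumptions that differ from the real implementation: the low-dimensional similarity is taken as 1 / (1 + d^2), edge weights are ignored, only one endpoint is moved per update, and all names and parameter values are illustrative.

```python
import random


def clip(value, bound=4.0):
    """Clamp a gradient component so one update cannot explode."""
    return max(-bound, min(bound, value))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attract(emb, i, j, lr):
    """Pull point i towards its graph neighbour j."""
    d2 = sq_dist(emb[i], emb[j])
    coeff = -2.0 / (1.0 + d2)
    for k in range(len(emb[i])):
        emb[i][k] += lr * clip(coeff * (emb[i][k] - emb[j][k]))


def repel(emb, i, j, lr, eps=1e-3):
    """Push point i away from a randomly sampled point j."""
    d2 = sq_dist(emb[i], emb[j])
    coeff = 2.0 / ((eps + d2) * (1.0 + d2))
    for k in range(len(emb[i])):
        emb[i][k] += lr * clip(coeff * (emb[i][k] - emb[j][k]))


def optimise(edges, n_points, dim=5, epochs=50, n_neg=3, lr=0.1, seed=0):
    """One attractive step per edge, then n_neg repulsive steps."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_points)]
    for _epoch in range(epochs):
        for i, j in edges:
            attract(emb, i, j, lr)
            for _sample in range(n_neg):
                neg = rng.randrange(n_points)
                if neg != i:
                    repel(emb, i, neg, lr)
    return emb


# Hypothetical neighbour graph on 5 points, embedded into 5 dimensions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
embedding = optimise(edges, n_points=5)
```

Each edge costs one attractive update plus n_neg repulsive updates, and nothing in the loop depends on a spatial tree over the embedding, so the output dimension (dim) can be chosen freely.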
Implemented in Python + Numba (custom distance metrics are supported as long as they can be JIT-compiled by Numba)
Can make use of labels for supervised dimension reduction
Can combine different distance metrics (categorical data, discrete data, etc.)
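One way to picture how labels (a categorical distance) can be combined with the data graph is to down-weight edges whose endpoints disagree on the label. This is a simplified sketch: the function name, the use of None for a missing label, and the far_dist / unknown_dist values are illustrative assumptions, not the library's exact behaviour.

```python
import math


def apply_labels(weights, labels, far_dist=5.0, unknown_dist=1.0):
    """Scale edge weights by label agreement.

    weights: dict mapping (i, j) -> edge weight from the data graph.
    labels:  list of class labels; None marks an unlabelled point.
    """
    adjusted = {}
    for (i, j), w in weights.items():
        if labels[i] is None or labels[j] is None:
            # Unknown label: mild down-weighting.
            adjusted[(i, j)] = w * math.exp(-unknown_dist)
        elif labels[i] != labels[j]:
            # Different classes: strong down-weighting.
            adjusted[(i, j)] = w * math.exp(-far_dist)
        else:
            # Same class: keep the data-driven weight.
            adjusted[(i, j)] = w
    return adjusted


# Hypothetical graph and labels.
weights = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7}
labels = ["a", "a", "b", None]
supervised = apply_labels(weights, labels)
```

In the umap-learn package the labels are passed as the y argument of fit or fit_transform, and the combination with the data graph happens internally.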