Nonlinear Dimension Reduction
1. UMAP
- McInnes, Leland, John Healy, and James Melville. "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction." arXiv preprint arXiv:1802.03426 (2018).
- UMAP Python library documentation
- Leland McInnes - UMAP Talk
- How Exactly UMAP Works and why exactly it is better than t-SNE
2. Details
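- Assumption: The data is uniformly distributed on a Riemannian manifold.
- Real data is rarely uniform in the ambient space, so UMAP defines a local Riemannian metric around each point (scaled by the distance to its k-th nearest neighbour) under which the assumption holds.
- Assumption: The manifold is locally connected, so every point is joined to at least its nearest neighbour.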
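The two assumptions above can be made concrete with a small sketch of the per-point distance normalisation. This is a simplified illustration, not the library's code: the function name `smooth_knn` and its parameters are hypothetical. It assumes the form described in the UMAP paper, where `rho` is the distance to the nearest neighbour (local connectivity) and `sigma` is found by binary search so that the neighbour weights sum to log2(k) (local rescaling of the metric).

```python
import math

def smooth_knn(knn_dists, n_iter=64, tol=1e-5):
    """Return (rho, sigma, weights) for the kNN distances of ONE point.

    Hypothetical helper: rho enforces local connectivity, sigma rescales
    distances so that every point "sees" the same effective neighbourhood.
    """
    target = math.log2(len(knn_dists))
    rho = min(knn_dists)  # nearest neighbour always gets weight 1

    def weights_for(sigma):
        return [math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists]

    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(weights_for(sigma))
        if abs(total - target) < tol:
            break
        if total > target:      # weights too large: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                   # weights too small: grow sigma
            lo = sigma
            sigma = sigma * 2.0 if hi == math.inf else (lo + hi) / 2.0
    return rho, sigma, weights_for(sigma)

# A point in a dense region and a point in a region ten times sparser
rho_d, sigma_d, w_dense = smooth_knn([0.1, 0.2, 0.3, 0.4])
rho_s, sigma_s, w_sparse = smooth_knn([1.0, 2.0, 3.0, 4.0])
```

In this sketch the sparse point gets a larger `sigma`, and both points end up with nearly identical neighbour weights, which is the sense in which the local metric makes the data look uniform.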
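Graph with combined edge weights: each point's neighbourhood gives directed edge weights, and the two directed weights a and b between a pair of points are merged into one symmetric weight a + b - ab (a fuzzy union).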
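A minimal sketch of how the edge weights might be combined, assuming the probabilistic fuzzy union a + b - ab from the UMAP paper. The dictionary representation and the name `fuzzy_union` are illustrative assumptions; the actual library works on sparse matrices.

```python
def fuzzy_union(directed):
    """Symmetrise directed edge weights with the rule a + b - a*b.

    directed: dict mapping (i, j) -> weight in [0, 1]; a missing reverse
    edge is treated as weight 0.
    """
    symmetric = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        w = a + b - a * b
        symmetric[(i, j)] = w
        symmetric[(j, i)] = w
    return symmetric

directed = {(0, 1): 1.0, (1, 0): 0.5, (0, 2): 0.3}
sym = fuzzy_union(directed)
print(sym[(0, 1)], sym[(1, 0)])  # 1.0 1.0
print(sym[(2, 0)])               # 0.3
```

The combined weight is never smaller than either directed weight, so an edge that is strong from one side stays strong in the symmetric graph.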
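t-SNE minimises a KL divergence, which mainly rewards keeping near neighbours close and so preserves mostly local structure; UMAP minimises a fuzzy-set cross entropy, whose extra term also penalises placing distant points close together, so it tends to preserve more of the global structure.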
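The difference between the two objectives can be illustrated per pair of points. This is only a sketch: `p` stands for the high-dimensional similarity and `q` for the low-dimensional one, the function names are hypothetical, and in real t-SNE the similarities are normalised over all pairs, so the per-pair comparison is illustrative rather than exact.

```python
import math

EPS = 1e-12  # keeps the logarithms finite at p or q equal to 0 or 1

def _clip(x):
    return min(1.0 - EPS, max(EPS, x))

def kl_term(p, q):
    """Per-pair KL contribution: p * log(p / q)."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q)

def cross_entropy_term(p, q):
    """Per-pair fuzzy cross entropy: the KL term plus a repulsive term."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

# Two points that are far apart in the data (p small) but placed
# close together in the embedding (q large)
far_but_close_kl = kl_term(0.01, 0.9)
far_but_close_ce = cross_entropy_term(0.01, 0.9)
```

For this pair the KL term is close to zero, so the mistake is almost free, while the cross entropy term is large because of the (1 - p) log((1 - p) / (1 - q)) part.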
- UMAP finds approximate nearest neighbours very efficiently, even in high-dimensional space
- Random-projection trees (initialisation) + nearest-neighbour descent (refinement)
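- SGD + negative sampling for the layout: no spatial tree over the embedding is needed, so the cost does not blow up with the output dimension and UMAP can embed into more than 3 dimensions, whereas Barnes-Hut t-SNE is practical only for 2-d or 3-d output
- Implemented in Python + Numba; custom distance metrics are supported as long as Numba's JIT can compile them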
- Can make use of labels for supervised dimension reduction
- Can combine different distance metrics for mixed data types (categorical, discrete, etc.)