Nonlinear Dimension Reduction

1. UMAP

2. Details

  • Assumption: Data is uniformly distributed on the manifold.
    • Define Riemannian metric on manifold to make the assumption true.
  • Assumption: The manifold is locally connected.
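  • Build a weighted k-nearest-neighbour graph; the two directed edge weights between each pair of points are combined into a single undirected weight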
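The edge-weight construction above can be sketched as follows. This is a minimal illustration, not the library's code: the function names are made up for this note, and `sigma` is passed in directly (UMAP picks a per-point value by search, which is omitted here). `rho` is the distance from a point to its nearest neighbour, so the nearest neighbour always gets weight 1, which reflects the local-connectivity assumption.

```python
import math

def directed_weight(d, rho, sigma):
    """Weight of the edge i -> j given distance d, where rho is the distance
    from i to its nearest neighbour and sigma is a per-point scale.
    (sigma is supplied by the caller here; that is a simplification.)"""
    return math.exp(-max(0.0, d - rho) / sigma)

def fuzzy_union(directed):
    """Combine directed weights into symmetric ones.

    directed maps (i, j) -> weight in [0, 1]; a missing edge counts as 0.
    The combined weight is a + b - a*b (probabilistic "either edge exists").
    """
    combined = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        w = a + b - a * b
        combined[(i, j)] = w
        combined[(j, i)] = w
    return combined
```

An edge present in only one direction keeps its weight, while two weak edges in opposite directions reinforce each other.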

  • t-SNE mainly preserves local structure (it minimises KL divergence), while UMAP can also preserve more of the global structure (it minimises cross entropy)
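A rough way to see the difference is to compare the per-edge loss terms. In the sketch below, `p` is the edge weight in the high-dimensional graph and `q` the corresponding weight in the embedding; both are assumed to lie strictly between 0 and 1. Treating t-SNE's loss edge by edge is a simplification (its probabilities are normalised over all pairs), and the function names are only for illustration.

```python
import math

def kl_term(p, q):
    """Per-edge KL contribution: only penalises p being larger than q."""
    return p * math.log(p / q)

def cross_entropy_terms(p, q):
    """Per-edge cross-entropy contribution split into two parts.

    attract: same as the KL term, pulls true neighbours together.
    repel:   penalises placing points close (large q) when they are
             far apart in the original space (small p).
    """
    attract = p * math.log(p / q)
    repel = (1 - p) * math.log((1 - p) / (1 - q))
    return attract, repel

# Two points that are far apart in the data but close in the embedding
attract, repel = cross_entropy_terms(0.01, 0.9)
print(attract, repel)
```

For a pair that is far apart in the data but close in the embedding, the KL term is almost zero while the repulsive term is large, which is one intuition for why the cross-entropy loss keeps distant groups apart.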

  • UMAP finds approximate nearest neighbours very efficiently (even in high-d space)
    • Random-projection tree + nearest neighbour descent
    • SGD + negative sampling to optimise the embedding (avoids the curse of dimensionality and can reduce to more than 3 dimensions, unlike t-SNE, which is usually limited to 2-d or 3-d)
  • Implemented in Python + Numba (custom distance metrics are supported as long as they can be JIT-compiled)
  • Can make use of labels for supervised dimension reduction
  • Can combine different distance metrics (categorical data, discrete data, etc.)
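The nearest neighbour descent idea mentioned above ("a neighbour of my neighbour is probably my neighbour") can be sketched in plain Python. This is a simplified illustration under stated assumptions: it starts from random neighbour lists instead of a random-projection tree, ignores reverse neighbours and sampling, and uses squared Euclidean distance. All function names are hypothetical and not taken from any library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_knn(points, k):
    """Exact k nearest neighbours, O(n^2); used only as a reference."""
    n = len(points)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(points[i], points[j]))
        result.append(set(others[:k]))
    return result

def nn_descent(points, k, max_iters=10, seed=0):
    """Approximate k nearest neighbours by iterative local refinement."""
    rng = random.Random(seed)
    n = len(points)
    # Start from k random neighbours per point (never the point itself).
    neighbours = [set(rng.sample([j for j in range(n) if j != i], k))
                  for i in range(n)]
    for _ in range(max_iters):
        updates = 0
        for i in range(n):
            # Candidates are the neighbours of the current neighbours.
            candidates = set()
            for j in neighbours[i]:
                candidates |= neighbours[j]
            candidates -= neighbours[i]
            candidates.discard(i)
            for c in candidates:
                worst = max(neighbours[i],
                            key=lambda j: sq_dist(points[i], points[j]))
                # Swap in the candidate if it beats the current worst neighbour.
                if sq_dist(points[i], points[c]) < sq_dist(points[i], points[worst]):
                    neighbours[i].remove(worst)
                    neighbours[i].add(c)
                    updates += 1
        if updates == 0:
            break  # no list changed, so further passes would do nothing
    return neighbours
```

Each pass only ever replaces a neighbour with a closer one, so the lists improve monotonically; comparing against `brute_force_knn` on a small sample gives a quick check of how close the approximation gets.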

3. t-SNE
