Dynamic Sparsity in Machine Learning
Routing Information through Neural Pathways
NeurIPS 2024 Tutorial

Program

  • Introduction (15 min)
  • Part 1: Sparse and Structured Transformations (45 min)
    • Sparse transformations: ReLU, sparsemax, Ω-argmax
    • Sparse attention and adaptively sparse transformers
    • Structured and sparse differentiable layers
    • Sparse associative memories and ∞-former
    • Mixed discrete/continuous latent models: Hard Concrete, Gaussian-Sparsemax
    • Learning sparse policies in RL and future directions
  • Part 2: Sparse Architectures and Representations in Foundation Models (45 min)
    • Sparse Mixtures of Experts: from LSTM MoEs to Mixtral
    • Conditional computation and early exit: Mixture of Depths
    • Sparse memories: Cache Eviction Policies, Dynamic Memory Compression
    • Modular deep learning and sparse mixtures of adapters
  • Q&A (15 min)
  • Panel with Sara Hooker and Emtiyaz Khan (30 min)
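
As a small taste of the sparse transformations covered in Part 1, here is a minimal NumPy sketch of sparsemax (the Euclidean projection of a score vector onto the probability simplex); this is an illustrative implementation, not code from the tutorial materials:

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex (Martins & Astudillo, 2016).

    Unlike softmax, the output can contain exact zeros, yielding a
    sparse probability distribution.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # Support set: indices where 1 + k * z_(k) > sum of top-k scores
    support = 1 + k * z_sorted > cumsum
    k_star = k[support][-1]                # size of the support
    tau = (cumsum[k_star - 1] - 1) / k_star  # threshold
    return np.maximum(z - tau, 0.0)
```

For example, `sparsemax([1.0, 0.0, -1.0])` assigns all mass to the first coordinate, whereas softmax would spread probability over all three.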