Dynamic Sparsity in Machine Learning
Routing Information through Neural Pathways
NeurIPS 2024 Tutorial

Program

  • Introduction (15 min)
  • Part 1: Sparse and Structured Transformations (45 min)
    • Sparse transformations: ReLU, sparsemax, Ω-argmax
    • Sparse attention and adaptively sparse transformers
    • Structured and sparse differentiable layers
    • Sparse associative memories and ∞-former
    • Mixed discrete/continuous latent models: Hard Concrete, Gaussian-Sparsemax
    • Learning sparse policies in RL and future directions
  • Part 2: Sparse Architectures and Representations in Foundation Models (45 min)
    • Sparse Mixtures of Experts: from LSTM MoEs to Mixtral
    • Conditional computation and early exit: Mixture of Depths
    • Sparse memories: Cache Eviction Policies, Dynamic Memory Compression
    • Modular deep learning and sparse mixtures of adapters
  • Q&A (15 min)
  • Panel with Sara Hooker and Alessandro Sordoni (30 min)

Panellists

  • Sara Hooker

    Sara Hooker is VP of Research at Cohere and leads Cohere For AI, a research lab that seeks to solve complex machine learning problems and supports fundamental research that explores the unknown. She leads a team of researchers and engineers working on making large language models more efficient, safe and grounded.

  • Alessandro Sordoni

    Alessandro Sordoni is a principal researcher at Microsoft Research Montréal, an adjunct professor at Université de Montréal, and a core industry member at Mila. Recently, his research has focused on the efficiency of learning and systematic generalization in current large deep learning models.