Program
- Introduction (15 min)
- Part 1: Sparse and Structured Transformations (45 min)
- Sparse transformations: ReLU, sparsemax, Ω-argmax (a sparsemax sketch follows the program)
- Sparse attention and adaptively sparse transformers
- Structured and sparse differentiable layers
- Sparse associative memories and ∞-former
- Mixed discrete/continuous latent models: Hard Concrete, Gaussian-Sparsemax
- Learning sparse policies in RL and future directions
- Part 2: Sparse Architectures and Representations in Foundation Models (45 min)
- Sparse mixtures of experts: from LSTM MoEs to Mixtral
- Conditional computation and early exit: Mixture of Depths
- Sparse memories: cache eviction policies and Dynamic Memory Compression
- Modular deep learning and sparse mixtures of adapters
- Q&A (15 min)
- Panel with Sara Hooker and Emtiyaz Khan (30 min)
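Since sparsemax anchors Part 1, a minimal sketch may help fix ideas before the tutorial. This is an illustrative NumPy implementation of the sparsemax projection of Martins and Astudillo (2016), not the presenters' reference code; the function name and the example scores are our own.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability simplex.

    Unlike softmax, the result can put exactly zero probability on
    low-scoring entries (Martins & Astudillo, 2016).
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)             # running sums of sorted scores
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # prefix of entries kept in the support
    k_z = k[support][-1]                     # support size
    tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

# Example (hypothetical scores): only the two strongest entries survive.
print(sparsemax([0.5, 0.4, -1.0]))  # ≈ [0.55 0.45 0.  ]
```

The single boolean mask works because the optimal support is always a prefix of the sorted scores; a batched version would sort along the last axis instead, as done in production implementations such as the entmax library.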