Scale ML

▸ We are a cross-lab MIT AI graduate student collective focusing on Algorithms That Learn and Scale.
▸ The group is open to all with an academic email — if you don't have one but are still interested, shoot us an email or message us on Twitter. We currently host bi-weekly seminars and will add hands-on sessions and research socials in the future.
▸ Our snacks 🍰 are currently funded by generous donations from Pulkit Agrawal and Yoon Kim.
▸ Please contact the organizers for inquiries.

▸ Join our next seminar on Zoom or in-person:
Click here to join the mailing list

Discussion Schedule

  • 10/30 u-μP: The Unit-Scaled Maximal Update Parametrization Charlie Blake (Graphcore)
  • 10/16 Transformers and Turing Machines Eran Malach (Harvard)
  • 09/04 A New Perspective on Shampoo's Preconditioner Nikhil Vyas (Harvard)
  • 08/22 1B-parameter model training (hands-on session) Aniruddha Nrusimha (MIT)
  • 08/12 How to scale models with Modula in NumPy (hands-on session) Jeremy Bernstein (MIT)
  • 07/24 FineWeb: Creating a large dataset for pretraining LLMs Guilherme Penedo (Hugging Face)
  • 07/17 Hardware-aware Algorithms for Language Modeling Tri Dao (Princeton)
  • 07/10 LLM360: Towards Fully Transparent Open-Source LLMs Hongyi Wang (CMU)
  • 07/03 DeciMamba: Exploring the Length Extrapolation Potential of Mamba Assaf Ben-Kish (Tel-Aviv)
  • 04/17 Adapting LLMs with Reinforcement Learning Idan Shenfeld
  • 04/03 The Quest to build an (O)pen (L)anguage (Mo)del Luca Soldaini (AI2)
  • 03/20 Efficient Deep Learning with Sparsity: Algorithms, Systems, and Applications Zhijian Liu
  • 03/12 Building and Deploying Large Language Model Applications Efficiently and Verifiably Ying Sheng (Stanford)
  • 03/06 In-Context Language Learning and N-gram Heads Ekin Akyürek
  • 02/21 Neurons, norms and number systems Jeremy Bernstein
  • 11/28 Sparsity in Transformers Shobhita Sundaram
  • 11/01 Critical batch-size in deep learning Minyoung Huh (Jacob)
  • 10/18 Large-Scale RNNs in the era of Transformers Bailin Wang
  • 10/18 Tensor Program Synthesis Han Guo
  • 10/04 Mixture of Experts (MOEs) Jyo Pari
  • 09/13 Speculative Decoding Aniruddha Nrusimha