Scale ML

▸ We are a cross-lab MIT AI graduate student collective focusing on Algorithms That Learn and Scale.
▸ The group is open to all with an academic email — if you don't have one but are still interested, shoot us an email or message us on Twitter. We currently host bi-weekly seminars and will add hands-on sessions and research socials in the future.
▸ Our snacks 🍰 are currently funded by generous donations from Pulkit Agrawal and Yoon Kim.
▸ Please contact the organizers for inquiries.

▸ Join our next seminar on Zoom or in-person:
Click here to join the mailing list

Discussion Schedule

  • 10/30 u-μP: The Unit-Scaled Maximal Update Parametrization Charlie Blake (Graphcore)
  • 10/16 Transformers and Turing Machines Eran Malach (Harvard)
  • 09/04 A New Perspective on Shampoo's Preconditioner Nikhil Vyas (Harvard)
  • 08/22 1B-parameter model training (hands-on session) Aniruddha Nrusimha (MIT)
  • 08/12 How to scale models with Modula in NumPy (hands-on session) Jeremy Bernstein (MIT)
  • 07/24 FineWeb: Creating a large dataset for pretraining LLMs Guilherme Penedo (Hugging Face)
  • 07/17 Hardware-aware Algorithms for Language Modeling Tri Dao (Princeton)
  • 07/10 LLM360: Towards Fully Transparent Open-Source LLMs Hongyi Wang (CMU)
  • 07/03 DeciMamba: Exploring the Length Extrapolation Potential of Mamba Assaf Ben-Kish (Tel-Aviv)
  • 04/17 Adapting LLMs with Reinforcement Learning Idan Shenfeld
  • 04/03 The Quest to build an (O)pen (L)anguage (Mo)del Luca Soldaini (AI2)
  • 03/20 Efficient Deep Learning with Sparsity: Algorithms, Systems, and Applications Zhijian Liu
  • 03/12 Building and Deploying Large Language Model Applications Efficiently and Verifiably Ying Sheng (Stanford)
  • 03/06 In-Context Language Learning and N-gram Heads Ekin Akyürek
  • 02/21 Neurons, norms and number systems Jeremy Bernstein
  • 11/28 Sparsity in Transformers Shobhita Sundaram
  • 11/01 Critical batch-size in deep learning Minyoung Huh (Jacob)
  • 10/18 Large-Scale RNNs in the era of Transformers Bailin Wang
  • 10/18 Tensor Program Synthesis Han Guo
  • 10/04 Mixture of Experts (MOEs) Jyo Pari
  • 09/13 Speculative Decoding Aniruddha Nrusimha