Frontiers in Language Model Systems & Algorithms

The GPU MODE × Scale ML speaker series is a five-day online event where top researchers in AI will discuss the architectural and system-level advances that underpin OpenAI's frontier open-source model, GPT-OSS. In addition to covering these core components, the series will explore the frontiers of these methods and how they are evolving beyond GPT-OSS.

We will livestream and record these talks on the GPU MODE YouTube Channel, where viewers can ask the speakers questions live.

Each day will consist of roughly 1.5 hours of talks and discussion (starting mid-to-late morning Pacific time; start times vary slightly by day, so please check the schedule below), covering a different component of the evolving transformer stack, from quirks in the attention mechanism and positional encodings to quantization, MoEs, and custom GPU kernels. All talks will also be posted to the YouTube channel.

Speakers

Sewon Min, UC Berkeley
Monday, Aug 25, 1:15 ET / 10:15 PT
Topic: Overview of Talks, Mixture of Experts

Guangxuan Xiao, MIT
Tuesday, Aug 26, 2:00 ET / 11:00 PT
Topic: Attention Sinks

Chris De Sa, Cornell
Wednesday, Aug 27, 3:00 ET / 12:00 PT
Topic: Quantization

Songlin Yang, MIT
Thursday, Aug 28, 2:00 ET / 11:00 PT
Topic: Positional Encodings, PaTH Attention

Simran Arora, Stanford
Friday, Aug 29, 2:00 ET / 11:00 PT
Topic: ThunderKittens

William Brandon, Anthropic
Friday, Aug 29, 1:00 ET / 10:00 PT
Topic: GPU Programming Fundamentals

Organizers