writing

Date           Title
Jan 1, 2024    Our Mission
Jan 5, 2024    Linear Transformers Are Faster
May 16, 2024   Compute-Optimal Context Size
Aug 14, 2024   LongCrawl64: A Long-Context Dataset
Aug 15, 2024   Symmetric Power Transformers
Sep 23, 2024   Why Gradient Descent Minimizes Training Loss
Jul 6, 2025    Release: Power Attention

code

Power Attention

research

Scaling Context Requires Rethinking Attention

about us

Carles Gelada, Jacob Buckman, Sean Zhang. We are deep learning researchers with 8+ years of experience each, formerly at labs including OpenAI, Google Brain, and Meta. Our research has been published at NeurIPS, ICLR, ICML, and other venues, and has been cited over 2000 times. We are backed by a small group of highly technical long-term investors, including Decibel and True.

join us

We are hiring core technical team members.

The role has elements of both software engineering and research. Responsibilities include implementing deep learning architectures, deriving algorithms, developing research infrastructure, running large-scale experiments, and interpreting and communicating results.

We will work well together if you are independent-minded, capable of self-teaching, and value thinking from first principles. Skills we are looking for include comfort with mathematics, strong communication, deep knowledge in areas such as CUDA, XLA/MLIR, JAX, or distributed systems/HPC, and experience training large-scale deep learning models.

We do not care about formal credentials. If you share our vision and would like to get involved, please send an example of some technical work that you are proud of to