Release: Power Retention
Today, we are open-sourcing power retention, which is far more efficient than transformers at long contexts, without sacrificing scalability or hardware efficiency. Use `pip install retention` to try it out, or get the code here.
Power retention is a drop-in replacement for the attention layer of any transformer: `flash_attention(q, k, v)` becomes `power_retention(q, k, v)`. After this substitution, you can expect speedups of >10x during training and >100x during inference at context lengths of 64k tokens, with gains increasing further at still-longer contexts. This is possible thanks to our hardware-aware implementation, which achieves GPU utilization comparable to that of FlashAttention.
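As a rough sketch of what the substitution looks like at the call site (the import path and the exact argument layout of `power_retention` are assumptions here; check the package documentation for the real signature):

```python
# Minimal sketch, not the package's documented API: assumes power_retention
# accepts (q, k, v) tensors with the same layout flash_attention expects.
import torch
from retention import power_retention  # assumed import path

def attention_block(q, k, v):
    # Before: out = flash_attention(q, k, v)
    out = power_retention(q, k, v)  # after: same call site, new kernel
    return out

# Example tensors in a typical (batch, seq_len, heads, head_dim) layout.
q = k = v = torch.randn(1, 64_000, 8, 64, device="cuda", dtype=torch.bfloat16)
out = attention_block(q, k, v)
```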
For an overview of power retention, read this article. For a deeper dive, read our research paper on ArXiv.
Pre-trained transformers can easily be metamorphosed into power retention models with a small amount of retraining. As a demonstration, we have retrained StarCoder2-3B into PowerCoder-3B, which is available open-source and open-weights on Hugging Face.
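For instance, the checkpoint can be loaded with the standard Hugging Face transformers API; the repository id below is illustrative, so substitute the actual id from the model page:

```python
# Illustrative only: "manifestai/PowerCoder-3B" is an assumed repository id.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("manifestai/PowerCoder-3B")
model = AutoModelForCausalLM.from_pretrained(
    "manifestai/PowerCoder-3B",
    trust_remote_code=True,  # may be required if the architecture is custom
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```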
Finally, we are also open-sourcing Vidrial: our framework for writing clean, efficient CUDA kernels. Learn more about Vidrial here. Vidrial kernels are available via the retention package as `retention.experimental`. For example, our Vidrial implementation of FlashAttention2 is up to 20% faster than existing implementations, and can be used with `from retention.experimental import flash`.
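A minimal usage sketch, assuming the experimental flash kernel takes query/key/value tensors in a standard (batch, seq_len, heads, head_dim) layout; the actual signature may differ or take additional options:

```python
# Sketch only: the call signature of `flash` is assumed to mirror common
# FlashAttention-style APIs.
import torch
from retention.experimental import flash

q = torch.randn(2, 4096, 16, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 4096, 16, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 4096, 16, 64, device="cuda", dtype=torch.bfloat16)

out = flash(q, k, v)  # Vidrial implementation of FlashAttention2
```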
Human-level artificial intelligence requires the ability to synthesize a lifetime of experiences, and that starts with the fundamental design of the architecture. Power retention unlocks a glorious long-context future. If you have an exciting long-context task or dataset, please reach out to contact@manifestai.com; we would love to collaborate. And if you are interested in staying ahead of the curve on the next generation of foundation AI models, join our community Discord and subscribe to our mailing list below.
Acknowledgments
We would like to thank SF Compute for supporting this research.