Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Foundation models, which today are predominantly built on the Transformer architecture, suffer from attention's computational inefficiency on long sequences. Mamba addresses this with a new architecture that replaces attention with selective state space models (SSMs): by making the SSM parameters functions of the input, it supports content-based reasoning while scaling linearly with sequence length. Across several modalities, including language, audio, and genomics, Mamba matches or exceeds the performance of Transformers of comparable size.
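
To make the "selective" idea concrete, below is a minimal NumPy sketch of the recurrence at the heart of a selective SSM: a per-channel state-space update whose parameters B_t, C_t, and step size Δ_t are computed from the current input, so the model can choose what to keep or ignore while the cost stays linear in sequence length. The projection names (W_B, W_C, W_dt), shapes, and the simplified discretization are illustrative assumptions for this sketch, not the paper's hardware-aware parallel-scan implementation.

```python
import numpy as np

def selective_scan(x, A, W_B, W_C, W_dt):
    """Sequential selective-SSM recurrence (illustrative sketch).

    x    : (L, D)  input sequence, L steps of D channels
    A    : (D, N)  diagonal state matrix per channel (input-independent)
    W_B  : (D, N)  projects x_t -> B_t        (input-dependent)
    W_C  : (D, N)  projects x_t -> C_t        (input-dependent)
    W_dt : (D, D)  projects x_t -> step Δ_t   (input-dependent)
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))          # one N-dimensional state per channel
    y = np.zeros((L, D))
    for t in range(L):            # one pass over the sequence: O(L) time
        xt = x[t]                                  # (D,)
        dt = np.log1p(np.exp(xt @ W_dt))           # softplus step size, (D,)
        Bt = xt @ W_B                              # input-dependent B_t, (N,)
        Ct = xt @ W_C                              # input-dependent C_t, (N,)
        A_bar = np.exp(dt[:, None] * A)            # discretized transition, (D, N)
        B_bar = dt[:, None] * Bt[None, :]          # discretized input map, (D, N)
        h = A_bar * h + B_bar * xt[:, None]        # state update, constant memory
        y[t] = h @ Ct                              # readout, (D,)
    return y
```

Because B_t, C_t, and Δ_t change with the input, the recurrence is no longer a fixed linear time-invariant system and cannot be evaluated as a global convolution; the paper's contribution includes a hardware-aware scan that keeps this selective recurrence efficient on GPUs, which the sequential loop above only approximates conceptually.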