Example predictions of SimpleFold on protein targets, with ground truth in light aqua and predictions in deep teal, plus performance scaling from 100M to 3B parameters and inference times on consumer hardware.

The SimpleFold Method: Flow-Matching And Protein Structure

Flow-matching provides the foundation for SimpleFold’s approach. This generative technique defines a time-dependent process that transforms noise into data by integrating an ordinary differential equation over time. The process traces a path of probability distributions that continuously carries tractable Gaussian noise to complex protein structures.

SimpleFold casts protein folding as a flow-matching generative model that produces protein structures from noise, conditioned on amino acid sequences. Given a protein with N_a heavy atoms, the model builds a linear interpolant between a noise sample and the all-atom positions, both of which live in ℝ^(N_a×3).

Unlike earlier work that modeled only backbone atoms, SimpleFold generates full-atom conformations, including both backbones and side chains. This comprehensive approach mirrors advances in sequence-augmented flow matching for proteins.

Training combines two objectives: a standard flow-matching loss that measures how accurately the model predicts the velocity field, and an additional Local Distance Difference Test (LDDT) loss that enforces structural quality. The LDDT loss measures errors in pairwise atomic distances between generated and ground-truth structures, helping the model learn refined atomic positions.

A key innovation is the timestep resampling strategy. Instead of sampling timesteps uniformly, SimpleFold draws them from a logistic-normal distribution that samples more densely near clean data (t=1). This focuses training on fine structural details, which is particularly important for side chain positioning.

A General-Purpose Transformer Architecture for Proteins

SimpleFold’s architecture represents a complete departure from domain-specific designs.
The model uses only standard transformer blocks with adaptive layers, eliminating the expensive pair representations and triangular updates that define AlphaFold2.
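
The training recipe described in the method section above (Gaussian noise, a linear interpolant to the all-atom coordinates, a velocity target, and logistic-normal timestep sampling) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SimpleFold's actual code: the function names, the `mean` and `std` values of the logistic-normal, and the plain mean-squared-error form of the loss are placeholders chosen here, and the LDDT term is omitted.

```python
import math
import random


def sample_timestep(rng, mean=1.0, std=1.0):
    """Draw t in (0, 1) from a logistic-normal: sigmoid of a Gaussian sample.

    A positive mean shifts probability mass toward t=1 (clean data).
    The mean/std defaults are illustrative placeholders.
    """
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))


def interpolate(rng, x1, t):
    """Return the noisy coordinates x_t and the target velocity.

    x1 is a list of [x, y, z] heavy-atom positions (shape N_a x 3).
    t=0 corresponds to pure noise, t=1 to the clean structure.
    """
    # Gaussian noise with the same N_a x 3 shape as the structure.
    x0 = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in x1]
    # Linear interpolant: x_t = (1 - t) * x0 + t * x1.
    xt = [[(1.0 - t) * n + t * c for n, c in zip(a0, a1)]
          for a0, a1 in zip(x0, x1)]
    # Time derivative of the linear path: x1 - x0.
    v_target = [[c - n for n, c in zip(a0, a1)] for a0, a1 in zip(x0, x1)]
    return xt, v_target


def velocity_mse(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    sq = [(p - q) ** 2
          for ap, aq in zip(v_pred, v_target)
          for p, q in zip(ap, aq)]
    return sum(sq) / len(sq)


rng = random.Random(0)
coords = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]  # toy 2-atom "structure"
t = sample_timestep(rng)
xt, v_target = interpolate(rng, coords, t)
```

In a real training loop a network would take `xt`, `t`, and the amino acid sequence and return a predicted velocity to pass to `velocity_mse`. At inference, the same velocity field would be integrated from noise at t=0 toward t=1 with an ODE solver.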

Overview of SimpleFold’s architecture built on general-purpose transformer blocks with adaptive layers, eliminating the need for pair representations or triangular updates.

The architecture contains three main components: lightweight atom encoder and decoder modules (symmetric in design) and a heavy residue trunk. All modules use standard transformer blocks conditioned on timestep through adaptive layers.

Source: https://aimodels.substack.com/p/do-prot ... truly-need
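
The text does not spell out what "adaptive layers" look like. One common pattern in timestep-conditioned transformers is adaptive layer normalization, where a conditioning vector produces a per-feature scale and shift that modulate the normalized activations. The sketch below assumes that pattern purely for illustration; the function names, the weight layout, and the `(1 + scale)` form are assumptions made here, not details confirmed by the source.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, bias, vec):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def adaptive_layer_norm(token, cond, w_scale, b_scale, w_shift, b_shift):
    """Modulate a normalized token with a scale and shift computed from `cond`.

    `cond` stands in for a timestep embedding (hypothetical here).
    With all-zero weights and biases this reduces to plain layer norm.
    """
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    normed = layer_norm(token)
    return [n * (1.0 + s) + h for n, s, h in zip(normed, scale, shift)]


token = [1.0, 2.0, 3.0, 4.0]           # one token with 4 features
cond = [0.5, -0.5]                     # toy 2-dim timestep embedding
zeros_w = [[0.0, 0.0] for _ in token]  # 4 x 2 weight matrix of zeros
zeros_b = [0.0] * len(token)
out = adaptive_layer_norm(token, cond, zeros_w, zeros_b, zeros_w, zeros_b)
```

The appeal of this kind of conditioning is that the transformer block itself stays standard: the timestep only enters through the normalization parameters, so no pair representation or geometry-specific module is needed to inject it.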