Abstract: Most of computational biology is predicated on the sequence → structure → function → phenotype paradigm. Thanks to artificial intelligence and the availability of data at multiple scales, researchers have been trying to bridge the gaps between the tiers of this paradigm. These efforts range from age-old genotype–phenotype modeling, through sequence-to-structure prediction in CASP and AlphaFold, up to recent attempts to go from sequence to ensemble. However, traditional bioinformatic models often lack physical causality, which has so far confined AI-driven advances to predictions in the forward direction.
The lecture will introduce physical ideas for conceiving generative models that backmap phenotypes to an ensemble of structures and sequences. For example, leveraging our work on modeling the diffusion of charge carriers in bioenergetic membranes (Cell 2019, JACS 2024, Nature Metabolism 2024, Nature Comm 2025), we computed the mechanism of chemokine binding to the Oxford COVID-19 vaccine (Science Adv 2021, iScience 2025). With AstraZeneca, we computationally redesigned the adenovirus vector to prevent potential clotting disorders. Using Google's Inception network architecture, we inverted this immune-recognition function into a generalizable strategy for learning electrostatic structures across proteins. We are now using this electrostatic network to study disease associations in patients (Cell Systems 2024), to design peptide therapeutics, and to search for hidden toxins across the entire human proteome, generalizing the molecular function-to-ensemble paradigm.