The Road to AlphaFold: From Folding Dreams to Predictive Power (1990s–2000s)
Series: The Molecular Code – Part 6 (CASP, Rosetta, and the Optimization Revolution)
By the early 1990s, the field of molecular biology had reached a quiet crisis: simulations alone couldn't solve the protein folding problem. Folding, as physical motion over time, was simply too complex and too computationally expensive. But what if the goal wasn't to watch folding, but to predict its outcome? This was the great liberation I covered in the last post: the formulation of protein structure prediction (PSP) as a problem separate from protein folding.
The 1990s took full advantage of this liberation. PSP matured into its own field, decoupled from folding simulations and driven by competition, clever algorithms, and open data.
This post explores four key developments from the 1990s that brought structure prediction into the computational spotlight.
1. CASP: Let the Community Compete
In 1994, the field changed forever with the launch of CASP—the Critical Assessment of protein Structure Prediction.
CASP was more than a workshop. It was a global experiment. It went something like this:
1. Let’s share sequences of proteins with unknown (but soon-to-be-solved) structures with the community. Let’s call them targets.
2. Now let’s have scientists work for a few months to predict the native tertiary structures of these targets. They can use computational algorithms, human intuition, or both; CASP accepted both machine and human-expert categories.
3. After these blind predictions were submitted, the organizers compared them to the true structures (behind the scenes, experimentalists had been solving those structures in their wet laboratories). Various assessment metrics were introduced over the years, with GDT-TS becoming the headline one (a simplified sketch of the idea follows below).
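Since GDT-TS will keep coming up in this series, here is a deliberately simplified Python sketch of what it measures. It assumes the predicted and native C-alpha coordinates are already superposed (the real metric searches over many superpositions to maximize each percentage), and the function and variable names are mine, just for illustration.

```python
import numpy as np

def gdt_ts(model_ca: np.ndarray, native_ca: np.ndarray) -> float:
    """Simplified GDT-TS: average, over cutoffs of 1/2/4/8 Angstroms,
    of the percentage of C-alpha atoms within that cutoff of the native.

    Assumes model_ca and native_ca are (N, 3) coordinate arrays that have
    ALREADY been superposed; the real GDT-TS optimizes the superposition
    for each cutoff.
    """
    distances = np.linalg.norm(model_ca - native_ca, axis=1)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [(distances <= c).mean() for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy usage: a perfect prediction scores 100.
native = np.random.rand(100, 3) * 50.0
print(gdt_ts(native, native))  # -> 100.0
```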
Here is a summary of what CASP did for the community and PSP:
Why this was so important:
CASP redefined the research culture. It gave clarity, competition, and community to PSP. It was the first dashboard. It drove innovation through healthy competition. And as it matured, it also created various tracks—template-based, threading, and ab initio (from sequence only) —that mirrored the growing field’s complexity and algorithmic capabilities. Crucially, it divorced PSP from folding. There was now the CASP community. Whether your approach was physics, statistics, or machine learning, all were welcome—as long as they worked.
This was PSP’s coming of age.
2. Rosetta: The Power of Fragments and Scoring
What else happened in the 90s? Oh, a new tool and approach from David Baker’s lab: Rosetta. Yeap, that David Baker, the 2024 Nobel Laureate. As an aside, David Baker was the one who pushed PSP forward in a huge way, but that is NOT why he got the Nobel. He pivoted to protein design soon after his work in PSP, and that is what the Nobel award recognizes. Ok, back to Rosetta for PSP.
Rosetta was a protein modeling package. It was revolutionary in that it was open source. It was built on the idea of molecular fragment replacement. The key insight was that you could put together a 3D structure like solving a puzzle with Lego pieces. It recognized that the PSP problem could be simplified into recognizing the Lego pieces, the fragments that protein structures use over and over. The idea of hierarchical assembly, building up from smaller pieces and packing them in 3D, did not originate with Baker. Many folks worked on the idea of foldons, including my dear collaborator, Ruth Nussinov. But these folks were thinking folding. Baker saw PSP: just putting together a 3D model from smaller pieces.
Baker/Rosetta discretized protein structure space using short fragments from known proteins. Yes, data-driven. Structures of proteins in the PDB (recall the PDB, the big breakthrough of the previous decade? See related post here.) were disassembled into smaller pieces, fragments. Ok, how were these pieces put together? Through sampling-based, stochastic optimization. Folks started simple, with Monte Carlo algorithms. A move in the algorithm consisted of picking a fragment from the PDB-excised library and “trying it out” in a growing structure. Was the move good or bad? How was it evaluated? Through an energy function. Which function? The ones that Levitt and others had started thinking about. David Baker devised his own suite of scoring functions, blending physics-based terms with statistical scoring functions. Another example of data-driven approaches coming together to drive optimization for PSP. A minimal sketch of this move-and-score loop follows below.
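To make the move-and-score loop concrete, here is a bare-bones sketch of Metropolis Monte Carlo with fragment insertion. This is not Rosetta’s actual code; the fragment library and scoring function are hypothetical placeholders, and a real implementation works with full backbone geometry rather than a flat list of torsion angles.

```python
import math
import random

def fragment_monte_carlo(torsions, fragment_library, score,
                         n_steps=10000, temperature=1.0):
    """Bare-bones Metropolis fragment-insertion loop (illustrative only).

    torsions:          list of (phi, psi) backbone angles for the chain
    fragment_library:  dict mapping a start position to candidate fragments,
                       each a short list of (phi, psi) tuples
    score:             callable mapping a torsion list to a pseudo-energy;
                       lower is better
    """
    current = list(torsions)
    current_score = score(current)
    positions = list(fragment_library)
    for _ in range(n_steps):
        # Move: pick a position and splice in a random fragment from the library.
        pos = random.choice(positions)
        frag = random.choice(fragment_library[pos])
        trial = current[:pos] + list(frag) + current[pos + len(frag):]
        trial_score = score(trial)
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        delta = trial_score - current_score
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
    return current, current_score
```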
Here is a slide I have used a LOT in talks to summarize Rosetta visually:
And here is a breakdown of what I considered to be the breakthroughs in Rosetta, well before the bots decided this word characterized EVERYTHING humans ever suggested.
Rosetta was pretty amazing. I loved it as a graduate student. It was more than a package. It was almost a philosophy:
Less physics, more heuristics.
Less simulation, more search.
Less folding, more building.
Why all this mattered:
Rosetta made ab initio PSP (we started calling it de novo later) tractable. It wasn’t perfect, no, but it was modular, tunable, and open source. That accessibility seeded innovation in the years to come and made it possible for researchers like me, and hundreds of others, to enter the field and expand it. When I came in, about a decade later, I appreciated the core role of search. And I ran with it. I built on the Monte Carlo approach to design stochastic optimization algorithms that remembered where they had been and tried to explore new structures within a fixed budget. Yeah, those were the days, when you were acutely aware of a limited computational budget.
3. The Optimization Revolution: Complexity Meets Creativity
With folding decoupled from PSP, the field started speaking a new language: optimization. Instead of asking how proteins fold in nature, scientists asked:
How do we find the lowest-energy conformation in a vast, high-dimensional space? That is the question I asked over and over, and answered a little better each time, in my research papers, first as a graduate student and then as an Assistant Professor.
Please realize how profound this shift was. Energy functions (like Scheraga’s ECEPP) now drove search algorithms, not simulations. Monte Carlo methods, simulated annealing, and other optimization techniques became the engines of discovery. The moment you formulate PSP as an optimization problem (though it may limit what you can accomplish), you open an avenue for computer scientists to advance the field with novel, foundational contributions to optimization. It is the best of both worlds, something we now refer to as the virtuous cycle. The cycle goes pretty much like this: here is a domain problem that seems so hard. Ok, let’s abstract some details away and turn it into a computational problem. Oh, we realize we do not have any algorithms powerful enough. So, go back to the “board” and devise new algorithms. Cool, new algorithms mean foundational advances in computer science. Oh, and now look at that: these algorithms are also allowing us to advance the domain problem. Great, can we do better? Let’s enter the cycle again!
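To see what “optimization as the engine” looks like in practice, here is a minimal, generic simulated-annealing sketch, assuming a hypothetical energy function over a vector of torsion-like coordinates. It is not any specific published method; it just shows the accept-worse-moves-early, exploit-later logic that made these techniques the workhorses of the decade.

```python
import math
import random

def simulated_annealing(energy, x0, step=0.1, t_start=5.0, t_end=0.01, n_steps=50000):
    """Generic simulated annealing: minimize `energy` over a real-valued
    conformation vector x (e.g., backbone torsion angles).

    `energy` is any callable returning a scalar; lower is better.
    The temperature decays geometrically from t_start to t_end.
    """
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    for _ in range(n_steps):
        # Perturb one coordinate at a time.
        trial = list(x)
        i = random.randrange(len(trial))
        trial[i] += random.uniform(-step, step)
        e_trial = energy(trial)
        # Accept improvements always; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if e_trial <= e or random.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling  # exploration early, exploitation late
    return best_x, best_e
```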
Ok, enough fun with cycles and sketches. But one more thing. Between 1992 and 1998, several researchers formally proved that PSP is NP-hard, even in simplified lattice models. This underscored what many suspected: prediction was not just biologically difficult; it was computationally intractable in the worst-case sense. Here is a screenshot of an important paper that provided theoretical results.
And here is a more detailed list of all the important theoretical work for those interested:
💡 This theoretical framing lowered barriers for computer scientists to join the field. Optimization became not just a tool—but a worldview. It gave rise to research that was rigorous, reproducible, and scalable.
🌐 The 1990s in Retrospect
The 1990s were a decade of great liberation. Folding and prediction parted ways. Simulations gave way to modeling. And the community gained a shared stage (CASP), a powerful toolkit (Rosetta), and a new scientific language (optimization). This was the decade computer scientists and algorithms joined en masse.
4. NSF-funded Research Gets Interesting: An Explosion of Ideas
The 1990s also witnessed something quieter but equally transformative: an explosion in NSF-funded computational research.
For the first time, structure prediction became a compelling challenge across disciplines. NSF recognized this and began funding:
Novel sampling algorithms
New energy functions tailored for fragment-based or hybrid models
Interdisciplinary projects connecting computer science, chemistry, and biology
Open-source infrastructure that seeded the next generation of tools
Look at some of the funded projects to appreciate the diversity of ideas.
Yeap, the famous DIMACS was founded and funded in this decade.
The first NSF CAREER award for PSP research went to Richard Lathrop!
Why this mattered:
This was when the field opened up. No longer limited to a few structural biology labs with MD expertise, structure prediction became a platform for algorithmic creativity—and NSF again came in and made that possible.
To wrap up:
Rosetta was not yet AlphaFold. But the ideas were starting to line up. So, what was going to happen next? Sampling challenges, open platforms, and an explosion of NSF-funded diversity: this is when I entered the field. I will explore all of it in the next post.
→ Next up: The 2000s—Data, Sampling, Platforms, and the Deep Learning Prelude
🧬 Catch up on previous posts in The Road to AlphaFold series:
Part 1: The Road to AlphaFold: A 70-Year Odyssey in Protein Science
Part 2: The Road to AlphaFold: When Biology Found its Code (1950-1960)
Part 3: The Road to AlphaFold, Under the Surface (1960s–1970s)
Part 4: The Road to AlphaFold, Molecular Dynamics Ascending (1970–1980)
Part 5: The Road to AlphaFold, Bridging the Divide (1980s–1990s)
Enjoying this series? Subscribe, share, or forward to a friend.
Read more at: