Cesar de la Fuente
Subject: Using Peptides for Medical Breakthroughs
Bio: Director of Penn’s Machine Biology Group
Transcript:
Larry Bernstein:
Welcome to What Happens Next. My name is Larry Bernstein. What Happens Next is a podcast that covers economics, politics, and history. Today’s episode is Using Peptides for Medical Breakthroughs.
This podcast was taped at a conference where I hosted several Penn Professors on various topics. The audience included my friends who will join me in asking questions.
Our speaker is Cesar de la Fuente, the Director of Penn’s Machine Biology Group. His team uses AI with biology to create new antibiotics that hopefully can save millions of lives. Cesar, can you please begin with six minutes of opening remarks?
Cesar de la Fuente:
We have biology driven by evolution that has given rise to our brain. For a long time, I thought that if we could learn how biology works from first principles and then extract that to build technologies and biotechnologies, then we would solve problems. Not only in medicine but sustainability, and other things that affect the future of humanity.
My lab has been applying it to the problem of bacterial infections that are becoming increasingly resistant. Bacterial infections are associated with five million deaths per year in the world. If we don’t come up with new therapies to treat these infections, by the year 2050, that number is projected to double to 10 million deaths per year, becoming the number one cause of death.
I see it as this huge existential threat to humanity. A lot of the work that we’ve been doing over the past decade plus has been incorporating concepts from computational biology to change how we discover new antimicrobial molecules that we can use to confront this crisis.
We’ve had antibiotics for less than 100 years. And yet, if we combine antibiotics, vaccines, and clean water, those three pillars have essentially doubled lifespan. If you go to hospitals all around the globe, we have patients with multi-drug-resistant infections that are completely untreatable, even if we combine the most potent cocktails of antibiotics.
Going back to our research, we’ve decided to treat it like an information theory problem. If you think of biology, you have DNA and proteins. In its simplest terms, it’s a bunch of code. DNA is a four-letter code; proteins are a 20-letter code of amino acids. It’s not that different from the code that we use to communicate with each other through the alphabet. If you reduce all this complexity to a bunch of code, then you can devise algorithms that can explore this code and identify potential drugs.
This is very different from the traditional paradigm of antibiotic discovery, which is this physical process where scientists go around nature in these expeditions and dig into soil to find antimicrobial drugs. That’s painstaking work that relies on trial and error. Oftentimes it can take longer than it takes to finish a PhD to find new molecules that are preclinically relevant. So, it’s not conducive to an academic setting, and it costs over $2 billion to discover and develop antimicrobial drugs.
We thought that with AI, we could completely change this landscape. We started by developing machine learning models that can explore entire genomes. We found thousands of new molecules encoded in our genetic code that had never been described in science by using this algorithm.
And the vast majority of those play a role in the immune system that was previously unanticipated. We then expanded that and looked at our genetic code, the genetic code of bacteria, and different microbes. We’ve looked at Archaea as well, which are esoteric organisms. And we’ve looked at ancestral biology as a source of functional antimicrobial molecules.
If you think about history, most of the life that has ever existed on our planet is now extinct. So how can we understand anything about biology or about evolution if we don’t have access to that information that existed previously? And so, we’ve developed AI systems that can mine biology at a large scale, and we’ve done a project where we’ve explored all ancient biological data with AI systems.
We’ve done this journey through evolutionary history and identified new preclinical candidate leads in the genetic data of ancient penguins that were extinct as recently as the 50s, Magnolia trees that disappeared throughout evolution, ancient Zebras, Woolly Mammoths, ancient Elephants, giant Sloths, and many other creatures that used to roam around our planet.
Larry Bernstein:
What I think is interesting is the relationship between basic science and its later application. When my grade school and high school friend Joe Thornton was first extracting mammalian DNA as a Professor at the University of Chicago, AI hadn’t yet been developed. I can’t imagine he had considered using the ancient proteins the way that you are using them. How do you think about the codependency of basic science and its application as it relates to the work that you’re doing?
Cesar de la Fuente:
AI is portrayed as this magical thing that can do everything. And that’s not true. Our AI models have been trained using experimental data that we’ve generated in my lab through actual biochemistry experiments. One thing that is critical to highlight is that for an AI model to be reliable and accurate, you need to feed it good and accurate data.
Larry Bernstein:
So, you have this mammoth DNA chain. How did your models know that that piece was valuable, given that the chain is almost infinitely long? How did it know that’s the one?
Cesar de la Fuente:
That model is called Apex. So, it’s a deep learning model that was trained on a lot of experimental data collected in my lab. So, it learns what makes a particular chain successful at killing bacteria, even specific clinically relevant pathogens, and then it can essentially run through the whole code. And once it finds something that is promising, it identifies that and then it comes up with a ranking. So, what the model gives us is a ranking from one to sometimes a million molecules based on statistical prediction.
The model is saying, number one is the most likely to be a successful new drug to kill these infections, and then it goes down. And then what we do in my lab is take the recommendations made by the algorithms, and human scientists look at those recommendations and go over them. Sometimes the algorithms miss things, like maybe this molecule is not good when you’re trying to develop a drug. So, we rule that one out.
The ramification of this ancient biological work is that for some of these molecules that we’ve discovered, when we check sequence homology with present-day proteins or molecules that exist in the world today, we don’t see any homology, meaning some of these are extinct. They’re not present anywhere in the world around us today. And so, from a bioethical perspective, is it okay for us to synthesize some of these molecules?
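Editor’s note: the scan-and-rank workflow described above can be sketched in a few lines of Python. This is only an illustration and not the Apex model itself. The scoring function `toy_score` is a hypothetical stand-in that rewards positively charged and hydrophobic amino acids, and the function names and the fixed fragment length are assumptions made for the example.

```python
from typing import List, Tuple

HYDROPHOBIC = set("AILMFWVY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def toy_score(peptide: str) -> float:
    """Hypothetical placeholder score (NOT the trained model):
    net charge plus the fraction of hydrophobic residues."""
    charge = sum(aa in POSITIVE for aa in peptide) - sum(aa in NEGATIVE for aa in peptide)
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in peptide) / len(peptide)
    return charge + hydrophobic_fraction


def candidate_fragments(protein: str, length: int) -> List[str]:
    """Slide a window along the protein and return every fragment of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def rank_candidates(protein: str, length: int, top_n: int) -> List[Tuple[str, float]]:
    """Score every distinct fragment and return the top_n, best first."""
    fragments = set(candidate_fragments(protein, length))
    scored = [(toy_score(fragment), fragment) for fragment in fragments]
    # Highest score first; ties are broken alphabetically so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(fragment, score) for score, fragment in scored[:top_n]]


if __name__ == "__main__":
    for fragment, score in rank_candidates("DEDEKRKLLK", length=4, top_n=3):
        print(fragment, score)
```

In the workflow described in the conversation, a trained deep learning model would take the place of `toy_score`, and human scientists would then review the ranked list before anything is synthesized.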
Larry Bernstein:
I’m fine with it.
Cesar de la Fuente:
But I’d like to think about the philosophical ramifications of the work that we do. And the other interesting aspect of this work is the patentability of this. Natural molecules are not patentable. That was determined by a legal case called the Myriad case. So, everything that exists in nature belongs to humanity and therefore you cannot patent it. But what about molecules that no longer exist in nature, that used to exist hundreds of thousands of years ago, and we no longer find them in the world today, are those patentable?
The truth is that nobody knows. And this has created a new sub-area of patent law where they’re trying to figure out what to do with this research.
Larry Bernstein:
25 years ago, my biggest investment was in Monsanto, and the reason I invested in it was on the financial side, it was trading for less than the last four years of R&D. And my hope was that they had not wasted their money and that this would prove to be valuable.
When I met with the CTO at the time, I asked him what they were trying to achieve. And he said, “We’ve ranked the problems. Number one is weeds, and so we’re going to have a genetic aspect that makes the plant not die from certain pesticides. So, the pesticide will kill the weed, but not the plant.” And then they were going to add additional genetics over time to deal with certain problems.
And we can compare that with the old ways in which we used to let plants evolve. So, they say, “This is a hardy plant, and this is a big fast grower.” We’ll crossbreed the two in the hope that that would combine both traits, but this new way, we’ll just cut and paste it and put it on the genome.
I think if someone had asked me before reading your papers, “What are our prospects for antibiotic creation?” I would have said, “Oh, what we need to do is send teams into the Amazon to be clipping around, looking for stuff, and hoping that we find something out there in the wild.” That’s the old school technique, or you can go the Monsanto way, which is let’s look at the genetic code and cut and paste directly the code. Is that a good metaphor for genetics in how agriculture has recently developed and how we’re going to apply that to antibiotics?
Cesar de la Fuente:
The acceleration of the process is certainly true. With traditional methods, it can take seven years to find preclinical candidates. Now, with these AI systems that we’ve developed, in a few hours we can discover hundreds of thousands. And so, we’ve entered this digital age of scientific discovery that is hugely exciting. For all the negative ramifications of AI in our society, this is a good one: its application to accelerate the process up to the preclinical stage. Things still need to go through clinical trials and that takes time. But I know there are people working on how to design better clinical trials that are faster, more efficient, and so on.
Moira McDermott:
We were having a discussion at dinner last night about AI use in math research, and recently they’ve been able to solve some of Hilbert’s problems. AI was able to solve these because it could go through thousands of problems and do lots of calculations and solve the low hanging fruit. To get at the harder problems, you’re still going to be able to have AI assist, but it’s not going to knock them off in the same way. It can make all these calculations and find some technique that showed up in a paper in the 50s that you’ve forgotten about, but it’s still going to take the human. It helps humans be more creative because it’s this very efficient assistant.
I’m curious if you’re using AI, is it predominantly that it’s able to do so much quickly, or is it helping you come up with new things? How much is it interacting that way?
Cesar de la Fuente:
Primarily it’s helping us process large amounts of information that would be impossible for the human brain to process and identify patterns within those data that would be impossible for the human brain to do.
Broadly, I would agree that AI systems can learn how to play chess at superhuman levels, but they cannot invent chess. They have a hard time going out of the distribution, meaning they have a hard time creating things that are not within the training set that we’ve taught them.
The only example of artificial general intelligence that we have in the universe is the human brain, and we don’t understand the human brain. So, then I think the whole argument collapses there.
Are we going to get to general intelligence just through ChatGPT like chatbots and things like that? No, I don’t think so.
Randy Kamien:
Is this work something that can be patented? Who’s going to own it?
Cesar de la Fuente:
The university. What patent lawyers tell me is that ... So, where they’re leaning is that it’s a non-obvious discovery because we had to train and develop an AI model to find molecules in ancient biological data. And then we had to synthesize them through chemistry and then we had to do tests to validate that they indeed had antimicrobial properties. So, if you combine all these different steps, it’s considered a non-obvious discovery. And so, the current thinking is that we might be able to patent some things. My dream is that something that we do in the lab can help humanity.
Larry Bernstein:
When I worked at Salomon Brothers, I created a derivatives products entity, and it had formulas that were not obvious. And so, I applied for a patent on behalf of Salomon Brothers for it. The issue was that you can’t patent a formula, but you can patent a process. And so, we put it in computer models and said we were going to calculate these formulas, and it turned out the patent office rejected that. Going back to the genetically modified food, Monsanto takes some genetic code and staples it to some preexisting genetic code and says, look, this combination is unique and novel and that can be patented. I think that’s more akin to what you’re talking about.
Cesar de la Fuente:
We can take something from nature; you can modify one single letter on the molecule and then that’s synthetic and then it’s patentable. So, there are a lot of tricks that you can do.
Randy Kamien:
I’ve written patents and there’s always the list of all the people who contributed to it. I’m trying to understand who contributes to something that an AI discovered.
What about the people who got the data about Woolly Mammoth DNA?
Cesar de la Fuente:
No, because that’s in the public domain. It oftentimes happens in science. We stand on the shoulders of giants. There are a lot of people that develop sequencing methods for reading and amplifying ancient DNA, which is much degraded. The pinnacle of that field was a couple of years ago, Svante Pääbo was awarded the Nobel Prize for sequencing, which is hard because typically you take DNA from a Neanderthal bone and then you’re trying to read it and amplify it in a clean room. Otherwise, you’re contaminating it with your own skin, bacteria, microbes that are in the environment. And a lot of that genetic data is in different repositories and databases that we can access.
Shani Raviv:
You mentioned that there was a spin-off of a company already based on the findings. So, is this the goal now to spin off medicine? Because it sounds like with the pace that this is happening, you can generate a lot of successful medicine companies. So, is this the goal?
Cesar de la Fuente:
We spun out a company. We’re in the process of raising a seed round now. We work with peptides, which are small proteins, and the goal will be to take peptide design into the new era. And to be able to program and design peptides for different applications, not only in infectious diseases, but also in immunology, oncology, and neuroscience. So, the whole goal of the company will be to serve as a translational vehicle for the findings in my lab and to take them to the world.
In the lab, we do more creative research. We publish our papers, open access, so everybody has access to what we do. The company will make medicines to cure people.
Ron Bernstein (my brother):
In the AI world today, in these large language models, everything is shared, but in biotech and pharma, it’s not because Sanofi doesn’t want to share all their peptides and all their super-secret work with Takeda, right? How is that going to affect the use of AI to solve all these medical problems? And is it going to be a lot slower because people don’t want to share?
Cesar de la Fuente:
I think it would be a lot better for the advancement of science if everybody just shared everything. Of course, in companies, a lot of that data is worth a lot of money, and some companies are not willing to share it. There are some ways of sharing data without giving away all the details. So, through federated learning, for example. So, some companies have set up consortia where they can share data without knowing exactly what the molecule is, and you can train models that way. I know other companies have AI models that they’ve trained based on their in-house data, and they are willing to share the models already trained, so that way they don’t have to share those drugs and the IP associated with those drugs.
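Editor’s note: the federated learning idea mentioned above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not a description of any company’s actual consortium. The model is a single-weight linear fit, the data sets are made up, and the function names are hypothetical. The point it shows is that each site trains on its own private data and only the resulting model weight is sent out to be averaged.

```python
from typing import List, Tuple

# Each record is a (feature, measured activity) pair held privately by one site.
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, steps: int = 20) -> float:
    """Run a few gradient-descent steps on one site's private data.
    Only the updated weight is returned; the data never leaves the site."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(weights: List[float], sizes: List[int]) -> float:
    """Combine the sites' weights, weighting each by its number of records."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total


def federated_training(sites: List[Dataset], rounds: int = 50) -> float:
    """Alternate local training and central averaging for a number of rounds."""
    global_weight = 0.0
    for _ in range(rounds):
        local_weights = [local_update(global_weight, data) for data in sites]
        global_weight = federated_average(local_weights, [len(data) for data in sites])
    return global_weight


if __name__ == "__main__":
    company_a = [(1.0, 2.0), (2.0, 4.0)]  # made-up private data
    company_b = [(3.0, 6.0)]              # made-up private data
    print(federated_training([company_a, company_b]))
```

Real deployments typically add protections such as secure aggregation on top of this basic averaging scheme, which the sketch leaves out.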
Ron Bernstein:
Is AI going to remove the need for wet lab space, so that you won’t need to go into the lab and do chemistry, or do you think you will always need the labs?
Larry Bernstein:
My brother is on the board of directors of a company that owns pharmaceutical labs. So, he’s desperate to make sure that you continue to use this space.
Cesar de la Fuente:
In academic research, you see people spending a lot more time analyzing data than 20 years ago. The other thing I’ll say is that in my lab we do a lot of chemical synthesis and a lot of experiments because we generate data sets to train our AI models. And so, we need humans to do the work. Now, if you can automate all that data generation process, maybe we won’t need humans ... I don’t tell this to my team.
But I don’t fear that future. It will be a future where human scientists will be left to do a lot of the thinking and the creative aspect of research, coming up with new hypotheses, combining concepts from different fields to create new fields. A lot of fun stuff.
Larry Bernstein:
Can you please end on a note of optimism.
Cesar de la Fuente:
We live in an incredible era. Today in my lab we can discover new things in a few minutes. This digital era of discovery was unimaginable even five years ago. Coming in in the morning, I know that by lunchtime I’m going to have a lot more molecules to play around with alongside my team. Pushing the boundaries of knowledge to come up with molecules that help the world.
Larry Bernstein:
Thanks to Cesar for joining us.
If you missed our previous podcast, it was Why We Crave Stories.
Fritz Breithaupt was the speaker, and he is a humanities scholar as well as a cognitive scientist at UPenn where he is studying the relationship between narratives and empathy. Fritz is the author of a new book entitled The Narrative Brain: The Stories Our Neurons Tell.
Fritz spoke about how we experience fictional stories in our daydreams to achieve personal growth.
You can find our previous episodes and transcripts on our website. Please follow us on Apple Podcasts or Spotify.
I am Larry Bernstein with the podcast What Happens Next.