Pear Biotech Bench to Business: insights on generative AI in healthcare and biotech with Dr. James Zou

December 8, 2023

Perspectives

by Sarah Jones

Here at Pear, we specialize in backing companies at the pre-seed and seed stages, and we work closely with our founders to bring their breakthrough ideas, technologies, and businesses from 0 to 1. Because we are passionate about the journey from bench to business, we created this series to share stories from leaders in biotech and academia and to highlight the real-world impact of emerging life sciences research and technologies. This post was written by Pear Partner Eddie and Pear PhD Fellow Sarah Jones.

Today, we’re excited to share insights from our discussion with Dr. James Zou, Assistant Professor of Biomedical Data Science at Stanford University, who utilizes Artificial Intelligence (AI) and Machine Learning (ML) to improve clinical trial design, drug discovery, and large-scale data analysis. We’re so fortunate at Pear to have James serve as a Biotech Industry Advisor for us and for our portfolio companies.

James received his Ph.D. from Harvard in 2014 and was a Simons Research Fellow at UC Berkeley. Prior to accepting a position at Stanford, James worked at Microsoft and focused on statistical machine learning and computational genomics. At Stanford, his lab focuses on making new algorithms that are reliable and fair for a diverse range of applications. As the faculty director of the Stanford AI for Health program, James works across disciplines and actively collaborates with both academic labs and biotech and pharma.

If you prefer listening, here’s a link to the recording!

Key Takeaways:

1. James and his team employed generative AI not only to predict novel antibiotic compounds, but also to produce a facile ‘recipe’ for chemical synthesis, representing a new paradigm of drug discovery.

Antibiotic discovery is challenging for at least a couple of reasons: reimbursement strategies do not incentivize companies to pour resources, time and money into R&D, and antibiotic resistance has made it difficult to create lasting, efficacious products. To accelerate discovery and meet the critical need for new antibiotics, James and his team created a generative AI algorithm that could generate small molecules that were predicted to have high activity. Not only could the model generate chemical structures, but it could also produce the instructions for chemists to make these compounds. By streamlining the process from structure generation to synthesis, James and his team were able to identify a potent antibiotic for pathogens that have developed resistance to existing antibiotics.
The ‘recipes’ that the algorithm generated laid out step-by-step instructions for over 70 lead compounds. After synthesizing and testing the molecules, the team reported a hit rate above 80% and successfully synthesized 58 novel compounds, six of which were validated as promising drug candidates. In addition, the model prioritized hits that had robust synthetic protocols and were predicted to have low toxicity. In this way, the model could prioritize certain small molecule features and generate both novel structures and complete recipes.
The generative model in this case used a Monte Carlo tree search to come up with recipes for new small molecules. The same logic and reasoning can readily be applied to other settings. For example, James and his team are working with Stanford spin-outs to apply the algorithm to other diseases, such as fibrosis, and to applications requiring new fluorescent molecules.
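To make the idea of a tree search over ‘recipes’ more concrete, here is a minimal Python sketch of Monte Carlo tree search. It is not the model James's team built: the building blocks, their scores, and the `score` function are all hypothetical stand-ins, and a real system would presumably search over purchasable fragments and known reactions using a trained activity predictor.

```python
import math
import random

# Hypothetical building blocks with made-up "activity contributions".
# These values are illustrative assumptions, not real chemistry.
BLOCKS = {"A": 0.1, "B": 0.4, "C": 0.7, "D": 0.2}
DEPTH = 2  # each toy "recipe" combines exactly two building blocks


def score(recipe):
    """Toy stand-in for a learned activity predictor (an assumption)."""
    return sum(BLOCKS[b] for b in recipe) / len(recipe)


class Node:
    def __init__(self, recipe, parent=None):
        self.recipe = recipe      # tuple of building blocks chosen so far
        self.parent = parent
        self.children = {}        # block name -> child Node
        self.visits = 0
        self.total = 0.0          # sum of rewards seen through this node

    def is_terminal(self):
        return len(self.recipe) == DEPTH

    def untried(self):
        return [b for b in BLOCKS if b not in self.children]


def ucb1(parent, child, c=1.4):
    """Trade off exploiting good branches against exploring rare ones."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_recipe, best_score = None, -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every child has been tried.
        while not node.is_terminal() and not node.untried():
            node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
        # 2. Expansion: add one untried building block.
        if not node.is_terminal():
            block = rng.choice(node.untried())
            child = Node(node.recipe + (block,), parent=node)
            node.children[block] = child
            node = child
        # 3. Rollout: complete the recipe with random blocks and score it.
        recipe = node.recipe
        while len(recipe) < DEPTH:
            recipe += (rng.choice(list(BLOCKS)),)
        reward = score(recipe)
        if reward > best_score:
            best_recipe, best_score = recipe, reward
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_recipe, best_score


best_recipe, best_score = search()
```

Because each path from the root is an ordered list of building blocks, the best-scoring leaf doubles as a step-by-step recipe, which is the property the section describes. The toy search space here is tiny; the point is only the select, expand, roll out, and backpropagate loop.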

“I think it’s a good, reasonable model for AI and biotech in general to have this close feedback loop, where we start off with some experimental data. In our case, we have some experimental screening data, the actual data used to train the models, and then the model will produce some candidates or some hypotheses. Then we tried to have a more rapid turnaround to do the additional experiments … to further validate the AI’s reasoning and thinking.”

2. James’ group has also used generative AI to design clinical trials that aim to be faster, cheaper, more diverse, and more representative.

Clinical trials are a critical bottleneck in the pipeline of therapeutic development. Patient enrollment is slow and labor intensive, and trial design can be biased against underrepresented groups or may exclude patients based on criteria that don’t necessarily relate to trial outcome or patient response. To outline trial criteria, a ‘monstrosity’ of a document must be generated to explicitly lay out the rules that will be followed in patient recruitment.
James noted that it’s very hard to balance all of the complicated factors required for a successful trial design. With collaborators at Genentech, James and his team have worked to develop a generative AI algorithm called Trial Pathfinder that uses historical clinical trials and outcomes to create an optimized trial design, often including a more diverse patient population. Groups that see more representation and inclusion in AI-generated clinical trials include women, elderly patients, patients from underrepresented groups, and patients who might be a bit sicker. These patients often end up responding just as well without experiencing the predicted adverse events.
When James first started partnering with Genentech, his first goal was to understand the pain points in clinical trial design. He learned that many trials are actually very narrow and essentially recruit for the so-called “Olympic athletes” of patients. He noted one study actually did try to recruit Kobe Bryant and several Olympians. To make clinical trial outcomes more inclusive and representative of a diverse range of patients, one strategy is the use of algorithms that account for such biases.
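As a rough illustration of how eligibility criteria could be evaluated against historical data, the Python sketch below compares a strict rule set with a relaxed one on a small cohort. This is not Trial Pathfinder itself: the `Patient` fields, the thresholds, and every record are synthetic, hypothetical examples chosen only to show the shape of the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    age: int
    lab_value: float   # hypothetical lab measurement used as a criterion
    treated: bool      # True if the patient received the study drug
    responded: bool    # True if the patient responded


# Entirely synthetic records, invented for illustration only.
COHORT = [
    Patient(45, 1.0, True, True),
    Patient(52, 1.1, False, False),
    Patient(71, 1.0, True, True),
    Patient(78, 1.6, True, True),
    Patient(66, 1.7, False, False),
    Patient(80, 1.2, False, True),
    Patient(58, 1.8, True, False),
    Patient(49, 0.9, False, False),
]


def eligible(patient, max_age, max_lab):
    """Apply two hypothetical threshold-style eligibility criteria."""
    return patient.age <= max_age and patient.lab_value <= max_lab


def summarize(cohort, max_age, max_lab):
    """Count eligible patients and compute the response rate in each arm."""
    selected = [p for p in cohort if eligible(p, max_age, max_lab)]

    def rate(arm):
        group = [p for p in selected if p.treated == arm]
        if not group:
            return None
        return sum(p.responded for p in group) / len(group)

    return {
        "n": len(selected),
        "treated_rate": rate(True),
        "control_rate": rate(False),
    }


# Narrow criteria versus relaxed criteria on the same cohort.
strict = summarize(COHORT, max_age=65, max_lab=1.2)
relaxed = summarize(COHORT, max_age=85, max_lab=2.0)
```

Comparing `strict` and `relaxed` shows the trade-off the section describes: loosening a criterion enlarges the eligible pool, and the historical outcomes indicate whether the treatment effect still holds for the newly included patients. A realistic analysis would need far more data and proper survival statistics.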

If we look more closely at how people design the protocols for clinical trials, often it is based on domain knowledge. But it’s also often quite heuristic and anecdotal, which is why a lot of different teams from different pharma companies–even if they’re looking at drugs of similar mechanisms–often end up with quite different trials and trial designs.

3. AI/ML in biology is only possible because biology is becoming increasingly data driven.

The data that we can gather from biological systems is becoming richer and more diverse. For example, high-resolution spatial transcriptomics or single-molecule experiments can now be captured alongside more traditional measurements such as RNA sequencing. Perturbation of biological systems with CRISPR/Cas9 technology also enables a whole new suite of data that represents an area ripe for AI.
James noted that even in the past year, he has seen tremendous advances in large language models and foundation models on the AI side. It’s also been interesting to see how these advances have enabled new insights in the biotech space. For example, high-throughput perturbation data collected at single-cell resolution can be extremely compatible with large language models. So far, collaborative efforts between AI and biotech have been extremely fruitful.
James and his lab are always pursuing new and exciting research directions. In particular, one project that he’s excited about is fueled by recent progress in spatial biology. As interest in single-cell transcriptomics and genomics has skyrocketed, researchers have generated huge amounts of data that have led to a variety of different findings and results. However, single cells also reside in different neighborhoods, or local microenvironments. Understanding disease and healthy states in the context of groups of cells may unlock even more insight into how patients may respond to therapies.
Another promising direction James highlighted was the use of large language models to allow researchers to synthesize and analyze information across biological databases. Currently, data is often siloed, and the level of expertise required to utilize individual data sets makes it challenging to work across fields and specialties.
“But this is where we think that language models can really be a unifying framework that can then help us to access data and integrate data from all these different modalities on different knowledge bases. So that’s another thing we’re working on with my students.”

This is why we’re excited about techniques and ideas like large language models that harness data from different modalities, databases, and knowledge bases to help biomedical researchers make faster innovations.

4. Landmark results at the intersection of biology and engineering, such as the Human Genome Project and the discovery of Yamanaka factors for reprogramming cells into stem cells, motivated James to pursue a career in academia and to start his lab at Stanford.

James always gravitated towards math and science and started his academic path with an undergraduate major in math. Although he didn’t start out with a focus in biology, he began spending a good deal of time at the Broad Institute during his time at Harvard. At that time, many great biotechnologists were working on projects like sequencing the human genome. James began to learn more about interesting problems at the intersection of AI and biotech.
As he learned about the discovery of Yamanaka factors for reprogramming adult cells into induced pluripotent stem cells (iPSCs), James was fascinated by the idea that you could take these ‘biological computer programs’ and, with only a few instructions, change the state of a cell. Essentially, the biological system became not just something you could study, but something you could engineer.

5. Translation of academic projects needs to happen thoughtfully in collaboration with the right partners.

Translation of academic projects can take a few different forms, and not every project is right for translation. James gave a few examples of projects that he knew could move beyond the Stanford campus. One project involved the development of an AI system for assessing heart disease based on cardiac ultrasound videos, or echocardiograms. Millions of these videos are collected each year in the US alone, so it’s one of the most routine and accessible ways to assess cardiovascular disease.
James and his students developed an AI system to help look at these videos and determine outcomes to help clinicians make more accurate diagnoses. Not only did the work result in two publications in the prestigious scientific journal Nature, but it also progressed to clinical trials.
James explained that he clearly sees the larger impact of his work; he wants not only to publish really good papers, but also to work closely with biotech and pharma companies or tech companies to ensure his findings and algorithms can actually impact human health. His focus on translation has led to at least four companies that have been spun out of his group.
“At least one of the things we are doing is developing these algorithms or coming up with some of these potential drug candidates, and to really take it to the next level, either as a viable drug, a platform, or a device that we can take to patients, is something often beyond the scope of an individual PhD.”
James enjoys leveraging the community and resources of the Chan Zuckerberg Biohub, a non-profit initiative aimed at bringing together interdisciplinary leaders to advance our ability to observe and analyze biological systems. Opportunities and communities such as these played a large role in drawing James to the Bay Area. He has actively sought out projects through which he can collaborate with biotech and pharma, investors, and the start-up community.

“At that point, it’s really important work. [Getting FDA approval] also requires more resources. That’s where it makes sense for us to have a company that is co-founded by my students. So, for us, in this case, it’s a perfect synergy between doing the early-stage research and development, developing the algorithms to the initial validations and then having the company take over and do the submissions and then do the scaling.”

Get to know Dr. James Zou: the person behind the science

James and his wife love to take advantage of all the great outdoors in the Bay Area. Every week, they go on a hike and spend a lot of time swimming or biking. One thing people may be surprised to learn about James is that he used to moonlight as a theater and restaurant reviewer. When he was living in Europe, he would write reviews of movies or restaurants for local English-language newspapers.

For someone wanting to pursue a similar career to his, James says that it’s a very exciting time to be at the intersection of biology and AI. He encourages students in the space to develop a core technical strength in either field and then begin to explore the synergy between disciplines.