Perspectives in AI with Kamil Rocki, Head of Performance Engineering at Stability AI

At Pear, we recently hosted a Perspectives in AI fireside chat with Kamil Rocki, Head of Performance Engineering at Stability AI. We discussed breakthroughs at the hardware-software interface that are powering generative AI. Kamil has extensive experience with GPU hardware and software programming from his PhD research and his work at IBM, Nvidia, Cerebras, Neuralink, and of course now Stability AI. Read a recap of that conversation below:

Aparna: Kamil, thank you for joining us. You’ve accomplished many amazing things in your career, and we’re excited to hear your story. How did you choose your career path and what led you to work on the projects you’ve been involved with?

Kamil: My journey into the world of technology began in my 20s. After a few years of rigorous mathematical studies, I found myself in a robotics lab. I was tasked with enabling a robot to solve a Rubik’s cube. The challenge was to detect the cube’s location in an image captured by a camera, and this had to be done at a rate of 100 frames per second. 

I was intrigued by the work my peers were doing in computer graphics using Graphics Processing Units (GPUs). They were generating landscapes and waves, manipulating lighting, and everything was happening in real-time. This inspired me to use GPUs to process the images for my project.

The process was quite challenging. I had to learn OpenGL from my friends, write images to the GPU, apply a pixel shader, and then read data back from the GPU. Despite the complexity, I was able to exceed the initial goal and run the process at 200 frames per second. I even developed a primitive version of a neural network that could detect the cube’s location in the image.

In 2008, around the time I graduated, CUDA came out and there was a lot of excitement around GPUs. I wanted to continue exploring this field. I heard about a GPU-based supercomputer being built in Japan and ended up doing a PhD in supercomputing. During this time, I worked on an algorithm called Monte Carlo Tree Search, deploying it on a cluster of 256 GPUs. At that time, not many people were familiar with GPU programming, which eventually led me to the Bay Area and IBM Research in Almaden.

I spent five years at IBM Research, then moved to the startup world. I had learned how to build chips, design computer architecture, and build computers from scratch. I was able to go from understanding the physics of transistors to building a software stack on top of that, including an assembler, compiler, and programming what I had built. One of my goals at IBM was to develop a wafer scale system. This led me to Cerebras Systems, where I co-designed the hardware. Later I joined Neuralink and then Nvidia, where I worked on the Hopper architecture. I joined Stability, as we are currently in a transition to Hopper GPUs. There is a significant amount of performance work required, and with my extensive experience with this architecture, I am well-equipped to contribute to this transition.

Aparna: GPUs have become one of the most profitable segments of the AI value chain, just looking at Nvidia’s growth and valuation. GPUs are also currently a capacity bottleneck. How did we arrive at this point? What did Nvidia, and others, do right or wrong to get us here?

Kamil: Nvidia’s journey to becoming a key player in the field of artificial intelligence is quite interesting. Initially, Nvidia was primarily known for its Graphics Processing Units (GPUs), which were used in the field of graphics. A basic primitive in graphics involves small matrix multiplication, used for rotating objects and performing various view projection transformations. People soon realized that these GPUs, efficient at matrix multiplications, could be applied to other domains where such operations were required.
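As a rough illustration of the "small matrix multiplication" primitive Kamil mentions, here is a minimal Python sketch that rotates a few points about the z-axis. The function names and sample points are illustrative assumptions, not anything from the conversation; a GPU would apply the same matrix to many vertices in parallel.

```python
import math

def rotation_z(theta):
    """Return the 3x3 matrix that rotates by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotate_points(points, theta):
    """Apply the same rotation matrix to every point (the data-parallel part)."""
    m = rotation_z(theta)
    return [mat_vec(m, p) for p in points]

# Rotate the x and y unit vectors by 90 degrees (hypothetical sample input).
rotated = rotate_points([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], math.pi / 2)
```

Every point goes through the identical multiply-and-add pattern, which is the kind of workload that maps well onto GPU hardware.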

In my early days at the Robotics Lab, I remember working with GPUs like the GeForce 6800 series. These were primarily designed for graphics, but I saw potential for other uses. I spent a considerable amount of time writing OpenGL code to set up the entire pipeline for simple image processing. This involved rasterization, vertex shader, pixel shader, frame buffer, and other complex processes. It was a challenging task to explore the potential of these GPUs beyond their conventional use.

Nvidia noticed that people were trying to use GPUs for general-purpose computing, not just for rendering images. In response, they developed CUDA, a parallel computing platform and application programming interface model. This platform significantly simplified the programming process. Tasks that previously required 500 lines of code could now be achieved with a program that resembled a simple C program. This opened up the world of GPU programming to a wider audience, making it more accessible and flexible.

Around 2011-12, the ImageNet moment occurred, and people realized the potential of scaling up with GPUs. Before this, CPUs were the primary choice for most computing tasks. However, the realization that GPUs could perform the same operations on different data sets significantly faster than CPUs led to a shift in preference. This was particularly impactful in the field of machine learning, where large amounts of data are processed using the same operations. GPUs proved to be highly efficient at performing these repetitive tasks.

This realization sparked a self-perpetuating cycle. As GPUs became more powerful, they were used more extensively in machine learning, leading to the development of more powerful models. Nvidia continued to innovate, introducing tensor cores that further enhanced machine learning capabilities. They were smart in making their products flexible, catering to multiple markets including graphics, machine learning, and high-performance computing (HPC). They supported FP64 computation, graphics, and tensor cores, which could be used for ray tracing and FP64. This adaptability and flexibility, combined with an accessible programming model, is what sets Nvidia apart in the field.

In the span of the last 15 years, from 2008 to the present, we have seen a multitude of different architectures emerge in the field of machine learning. Each of these architectures was designed to be flexible and adaptable, capable of being executed on a GPU. This flexibility is crucial as it allows for a wide range of operations, without being limited to any specific ones.

This approach also empowers users by not restricting them to pre-built libraries that can only run a single model. Instead, it provides them with the freedom to program as they see fit. For instance, if a user is proficient in C, they can utilize CUDA to write any machine learning model they desire.

However, some companies have lagged behind in this regard. Their mistake was in not providing users with the flexibility to do as they please. Instead, they pre-programmed their devices and assumed that certain architectures would remain relevant indefinitely. This is a flawed assumption. Machine learning architectures are continuously evolving, and this is a trend that I foresee continuing into the future.

Aparna: Could you elaborate more on the topic of special purpose chips for AI? Several companies, such as SambaNova Systems and Cerebras, have attempted to develop these. What, in your opinion, would be a successful architecture for such a chip? What would it take to build a competitive product in this field? Could you also shed some light on strategies that have not worked well, and those that could potentially succeed?

Kamil: Reflecting on my experience at Cerebras Systems, I believe one of the major missteps was the company’s focus on building specialized kernels for specific architectures. For instance, when ResNet was introduced, the team rushed to develop an architecture for it. The same happened with WaveNet and later, the Transformer model. At one point, out of 500 employees, 400 were kernel engineers, all working on specialized kernels for these architectures. The assumption was that these models were fixed and optimized, and users were simply expected to utilize our library without making any changes.

However, I believe this approach was flawed. It did not take into account the fact that architectures change frequently. Every day, new research papers are published, introducing new models and requiring changes to existing ones. Many companies, including Cerebras, failed to anticipate this. They were so focused on specific architectures that they did not consider the need for flexibility.

In contrast, I admire Nvidia’s approach. They provide users with tools and allow them to program as they wish. This approach is more successful because it allows for adaptability. Despite the progress made by companies like Cerebras, Graphcore, and others, I believe too much time and effort is spent on developing prototypes of networks, rather than on creating tools that would allow users to do this work themselves.

Even now, I see companies building accelerators for the Transformer architecture. I would advise these companies to rethink their approach. They should aim for flexibility, ensuring that their architecture can accommodate changes. For instance, if we were to revert to recurrent nets in two years, their architecture should still be programmable.

Aparna: Thank you for your insights. Shifting gears, I’d like to talk about your work at Stability. It’s an impressive company with a thriving open-source community that consistently produces breakthroughs. We’ve observed the quality of the models and the possibilities with image generation. Many founders are creating companies using Stability’s models. So, my question is about the future of this technology. If a founder is building in this space and using your models as a foundation, where do you see this foundation heading? What’s the future of image generation technology at Stability?

Kamil: The potential of technology, particularly in the field of artificial intelligence, is immense. Currently, we’re seeing significant advancements in image generation models. The quality of these generated images is often astounding, sometimes creating visuals that are beyond reality, thereby accelerating creativity and content creation. We’re now extending this capability into 3D and video space. We’re actively working on models that can generate 3D scenes or objects and extend to video space. Imagine a scenario where you can generate a short clip of a dog running or even create an entire drama episode from a script.

We’re also developing audio models that can generate music. This can be combined with video generation to create a comprehensive multimedia experience. These applications have significant potential in the entertainment industry, from content generation for artists to the movie industry and game engine development.

However, I believe the real breakthrough will come when we move towards more industrial applications. If we can generate 3D representations and add video to that, we could potentially use this technology to simulate physical phenomena and accelerate R&D in the manufacturing space. For instance, generating an object that could be printed by a 3D printer. This could optimize and accelerate prototyping processes, potentially revolutionizing supply chains.

Recently, I was asked if a space rocket could be designed with generative AI. While it’s not currently feasible, the idea is intriguing and could potentially save a lot of money if we could solve complex problems using this technology.

In relation to hardware, I believe that generative AI and language models can be used to accelerate the discovery of new kinds of hardware and for generating code to optimize performance. With the increasing complexity and variety of models and architectures, traditional approaches to optimizing code and performance modeling are struggling. We need to develop more automated, data-driven approaches to tackle these challenges.

Aparna: You’ve broadened our understanding of the potential of generative AI. I’d like to delve deeper into the technical aspects. As the head of Performance Engineering at Stability, could you elaborate on the challenges involved in building systems that can generate video and potentially manufacture objects without error, performing exactly as intended?

Kamil: From a performance perspective, the issue of being limited by computational resources is closely related to the first question. At present, only a few companies can afford to innovate due to the high costs involved.  

This situation might actually be beneficial as it could spark creativity. The scarcity of resources, particularly GPUs, could trigger innovations on the algorithmic side. I recall a similar situation in the early days of computer science when people were predicting faster clock speeds as the solution to performance issues. It was only when they hit a physical limit that they realized the potential of parallelization, which completely changed the way people thought about performance.

Currently, the cost of building a data center for training state-of-the-art language models is approaching a billion dollars, not including the millions of dollars required for training. This is not a sustainable situation. I miss the days when I could run models and prototype things on a laptop.

One of the main problems we face is that we’ve allowed our models to become so large, assuming that compute infrastructure is infinite. These larger models are becoming slower because more time is spent on moving data around rather than on the actual computation. For instance, when I was at Nvidia, anything below 90% of the so-called ‘speed of light’ was considered bad. However, in many cases, large language models only utilize about 30-40% of the peak performance that you can achieve on a GPU. This means a lot of compute power is wasted.
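To make the "fraction of peak" idea concrete, here is a small Python sketch of the arithmetic. The workload size, timing, and peak throughput are made-up numbers chosen only to land in the 30-40% range mentioned above, and the 90% bar is taken from Kamil's description rather than from any official metric.

```python
def fraction_of_peak(flops_performed, elapsed_seconds, peak_flops_per_second):
    """Achieved throughput divided by the hardware's theoretical peak."""
    achieved = flops_performed / elapsed_seconds
    return achieved / peak_flops_per_second

def meets_bar(utilization, bar=0.9):
    """True if utilization reaches the bar (90% per the interview)."""
    return utilization >= bar

# Hypothetical numbers: 3.5e14 floating-point operations completed in one
# second on a device whose assumed peak is 1e15 operations per second.
util = fraction_of_peak(flops_performed=3.5e14,
                        elapsed_seconds=1.0,
                        peak_flops_per_second=1.0e15)
print(f"utilization: {util:.0%}, meets 90% bar: {meets_bar(util)}")
```

Measuring this ratio on a single GPU before scaling out is one way to see how much time goes to moving data rather than computing.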

People often overlook this issue. When I suggest optimizing the code on a single GPU and running it on a small model before scaling up, many prefer to simply run it on multiple GPUs to make it faster. This lack of attention to optimization is a significant concern.

Aparna: As we wrap up, I’d like to pose a final question related to your experience at Neuralink, a company focused on brain-to-robot interaction. This technology has potential applications in assisting differently-abled individuals. Could you share your perspective on this technology? When do you anticipate it will be ready, and what applications do you foresee?

Kamil: My experience at Neuralink was truly an exciting adventure. I had the opportunity to work with a diverse team of neuroscientists, physicists, and biologists, all of whom were well-versed in computing and programming. Despite the initial intimidation, I found my place in this team and contributed to some groundbreaking work.

One of the primary challenges we aimed to address at Neuralink was the communication barrier faced by individuals whose cognitive abilities were intact, but who were physically unable to express themselves. This issue is exemplified by renowned physicist Stephen Hawking, who could only communicate by typing messages very slowly using his eyes.

Our initial project involved training macaque monkeys to play a Pong game while we simultaneously recorded data from their motor cortex. This allowed us to decode brain signals and enable the monkeys to control something on the screen. Although it may not seem directly related to human communication, this technology could potentially be used to control a cursor and type messages, thus bypassing physical limitations.

We managed to measure the information transfer rate from the brain to the machine in bits per second, achieving a rate comparable to that of people typing on their cell phones. This was a significant milestone and one of the first practical applications of our technology. It could potentially benefit individuals who are paralyzed due to spinal injuries, enabling them to communicate despite their physical limitations.

However, our work at Neuralink wasn’t limited to decoding brain signals and reading data. We also explored the possibility of stimulating brain tissue to induce physical movements or visual experiences. This bidirectional communication could potentially allow individuals to interact with computers more efficiently, bypassing the need for physical input devices. It could even pave the way for a future where VR goggles are obsolete, as we could stimulate the visual cortex directly. However, the safety of these techniques is still under investigation, and it’s crucial that we continue to prioritize this aspect as we push the boundaries of what’s possible.

There’s a significant spectrum of disorders that this technology could address, particularly for individuals who struggle with mobility or communication. We were also considering mental health issues such as depression, insomnia, and ADHD. One of the concepts we were exploring is the ability to read data from the brain, identify its state, and stimulate it. This could potentially serve as a substitute for medication or other forms of treatment.

However, it’s important to note that the technology, while progressing, is not entirely clear-cut. The safety aspect is crucial and cannot be ignored. At Neuralink, we’ve done a remarkable job ensuring that everything we develop is safe, especially considering these devices are implanted in someone’s head.

When we consider brain stimulation, we must also consider potential negative scenarios. For instance, if we stimulate a certain region of the brain to alleviate depression, we could inadvertently create a dependency, similar to injecting dopamine. This could potentially lead to a loop where the individual becomes addicted to the stimulation. It’s a complex issue that requires careful consideration and handling.

In addressing these challenges, we’ve engaged in extensive conversations with physicians, neuroscientists, and other experts. While some companies may have taken easier paths, potentially compromising safety, we’ve chosen a more cautious approach. Despite the slower progress, I can assure you that whatever we produce will be safe. This commitment to safety is something I find particularly impressive.

Kamil: For those interested, it’s worth noting that Neuralink is currently hiring. They’ve recently secured another round of funding and are actively seeking new talent. This is indeed a glimpse into the future of technology.

Aparna: Earlier, you mentioned an intriguing story about monkeys and reading their brainwaves. This story is related to the AI that’s been implanted in their brains and how it communicates. Could you elaborate on what happens with the models in this context?

Kamil: In our initial approach to decoding brain signals, we utilized a simple model. We had a vector of 1024 electrodes and our goal was to infer whether the monkey was attempting to move the cursor up, down, or click on something. We used static data from what we termed a pre-training session, which was essentially data recorded from the implant. The model was a two-layer perceptron, quite small, and could be trained in about 10 seconds. However, the brain’s signal distribution changes rapidly, so the model was only effective for about 10 to 15 minutes before we observed a degradation in performance. This necessitated the collection of new data and retraining of the model.
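For readers who want a picture of what such a decoder looks like, below is a minimal Python sketch of a two-layer perceptron mapping a 1024-element electrode vector to one of three intents. Only the input size and the up/down/click outputs come from the interview; the hidden width, ReLU activation, softmax output, random untrained weights, and synthetic input are all assumptions for illustration, not Neuralink's actual model.

```python
import math
import random

N_ELECTRODES = 1024                  # input size mentioned in the interview
N_HIDDEN = 32                        # assumed hidden width
CLASSES = ["up", "down", "click"]    # intents mentioned in the interview

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(weights, biases, x):
    """Compute W @ x + b for one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    """Turn raw scores into probabilities (shifted by max for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(params, x):
    """Forward pass: dense -> ReLU -> dense -> softmax."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in dense(w1, b1, x)]
    probs = softmax(dense(w2, b2, hidden))
    return CLASSES[probs.index(max(probs))], probs

rng = random.Random(0)
params = (init_layer(N_ELECTRODES, N_HIDDEN, rng),
          init_layer(N_HIDDEN, len(CLASSES), rng))
signal = [rng.random() for _ in range(N_ELECTRODES)]  # synthetic stand-in data
label, probs = predict(params, signal)
```

A model this small is cheap to retrain, which fits the workflow described above of collecting fresh data and retraining whenever the signal distribution drifts.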

Recently, Neuralink has started exploring reinforcement learning-based approaches, which allow for on-the-fly identification and retraining of the model on the implant. During my time at Neuralink, my focus was primarily on the inference side. We trained the model outside the implant, and my role was to make the inference parts work on the implant. This was a significant achievement for us, as we were previously sending data out and back in. Given our battery limitations, performing tasks on the implant was more cost-effective. The ultimate goal was to move the entire training process to the implant.

Every day, our brains produce varying signals due to changes in our moods and environments. These factors could range from being in a noisy place, feeling tired, or engaging in different activities. This results in a constantly shifting distribution of brain signals, which presents a significant challenge. This phenomenon is not only applicable to the brain but also extends to other applications in the medical field.

Aparna: We’ve discussed a wide range of topics, from hardware design to image and video generation, and even brainwaves and implant technology. Thank you so much for these perspectives Kamil!

Thank you to Kamil for his perspectives on these exciting AI topics. To read more about Pear’s AI focus and previous Perspectives in AI talks, visit this page.

Looking back at PearX S19 alum Gradio’s journey, now part of Hugging Face


PearX S19 alum Gradio, which is now part of Hugging Face, has had a momentous few years. They recently launched Version 4.0 of their app and they have quickly become a leading workflow tool in the generative AI infrastructure space. We’re so proud of their success, and wanted to take a look back at the earliest days of the company, why we were excited to partner with Gradio’s founders from day 0, and some of their biggest milestones along the way.

How we met the team:

We first met Gradio’s founding team, Abubakar “Abu” Abid, Ali Abdalla, Ali Abid, and Dawood Khan, through Pear’s Fellows program. They were housemates at Stanford at the time, and they came to us with an idea to speed up the process of collecting and labeling data for use with AI and ML. Put simply, they wanted to make it really simple for ML engineers to build and share computer vision models and ultimately to make more reliable models.

Why we invested:

After meeting the team, we were excited to invest in Gradio from day 0 for a few key reasons:

  • The team: We knew this was the right team to tackle the space. Abu worked on these problems during his PhD in ML at Stanford. The founding team built deep technical products at companies like Tesla and Google. They also had an amazing advisor in Stanford Professor James Zou, who pioneered data valuation methods.
  • The big market opportunity: Gradio was founded in 2019 when every company was on the precipice of becoming a data-driven or AI company. In 2019 alone, companies were spending $32 billion on data acquisition and labeling, and that number was slated to rise 50% year over year. This easily made it a multi-billion-dollar market.
  • The right product vision: We felt that Gradio could solve the biggest problems that data companies face. At that time, to build AI products, companies had to collect and manually label lots of data and then feed that data into machine learning algorithms, all amid a crisis of poor data quality. It was a long and broken process that was ripe for innovation. Gradio’s product leveraged ML research to integrate with a company’s existing data pipeline to maximize the value of the data for ML. Essentially, they created the missing data valuation layer to maximize the potential of data for machine learning.

How they evolved and what’s next:

They joined our PearX S19 cohort, and over the 14-week program they made leaps and bounds with their product. They ran four pilots with different kinds of natural language processing (speech or text) companies, ranging from legal contracts to financial records. Over the course of PearX, these smaller pilots led to landing bigger clients like Wells Fargo and TD Bank. They used the learnings from these pilots to steadily expand to cover more areas of machine learning like video.

Gradio founders meeting investors at Demo Day

At the end of PearX, Abu presented at our Demo Day to a room of tier one investors, and after Demo Day, Gradio successfully raised a seed round. Following the fundraise, our team continued working closely with Gradio’s team to find true product-market fit. This included exploring enterprise solutions across various verticals, which eventually led the team to pursue an open source approach to expedite product adoption. Gradio was open sourced, and it became the de facto tool for presenting AI / ML projects to a wide range of audiences. In the end, more than 300,000 demos were built using Gradio.

In late 2021, they were acquired by Hugging Face. The Pear team partnered closely with Gradio’s leadership throughout the entire acquisition process. They are now a key pillar of Hugging Face, where they provide Hugging Face’s users, developers, and data scientists the tools needed to get high level results and create better models and tools. It’s a machine learning match made in heaven and together, they are building the future of ML. 

Gradio x Hugging Face

We’re excited for even more success from Gradio and will be cheering them on!

Pear Biotech Bench to Business: insights on generative AI in healthcare and biotech with Dr. James Zou

Here at Pear, we specialize in backing companies at the pre-seed and seed stages, and we work closely with our founders to bring their breakthrough ideas, technologies, and businesses from 0 to 1. Because we are passionate about the journey from bench to business, we created this series to share stories from leaders in biotech and academia and to highlight the real-world impact of emerging life sciences research and technologies. This post was written by Pear Partner Eddie and Pear PhD Fellow Sarah Jones.

Today, we’re excited to share insights from our discussion with Dr. James Zou, Assistant Professor of Biomedical Data Science at Stanford University, who utilizes Artificial Intelligence (AI) and Machine Learning (ML) to improve clinical trial design, drug discovery, and large-scale data analysis. We’re so fortunate at Pear to have James serve as a Biotech Industry Advisor for us and for our portfolio companies.

James received his Ph.D. from Harvard in 2014 and was a Simons Research Fellow at UC Berkeley. Prior to accepting a position at Stanford, James worked at Microsoft and focused on statistical machine learning and computational genomics. At Stanford, his lab focuses on making new algorithms that are reliable and fair for a diverse range of applications. As the faculty director of the Stanford AI for Health program, James works across disciplines and actively collaborates with both academic labs and biotech and pharma. 

If you prefer listening, here’s a link to the recording!

Key Takeaways:

1. James and his team employed generative AI not only to predict novel antibiotic compounds, but also to produce a facile ‘recipe’ for chemical synthesis, representing a new paradigm of drug discovery.

  • Antibiotic discovery is challenging for at least a couple of reasons: reimbursement strategies do not incentivize companies to pour resources, time and money into R&D, and antibiotic resistance has made it difficult to create lasting, efficacious products. To accelerate discovery and meet the critical need for new antibiotics, James and his team created a generative AI algorithm that could generate small molecules that were predicted to have high activity. Not only could the model generate chemical structures, but it could also produce the instructions for chemists to make these compounds. By streamlining the process from structure generation to synthesis, James and his team were able to identify a potent antibiotic for pathogens that have developed resistance to existing antibiotics. 
  • These ‘recipes’ that the algorithm generated laid out step-by-step instructions for over 70 lead compounds. After synthesizing and testing the molecules, they found that they achieved a hit rate above 80% and could synthesize 58 novel compounds. Of these 58, they found that six were validated as promising drug candidates. In addition, the model prioritized hits that had robust synthetic protocols and were predicted to have low toxicity. In this way, the model could prioritize certain small molecule features and generate both novel structures and complete recipes. 
  • The generative model in this case used Monte Carlo Tree Search to come up with recipes for new small molecules. The same logic flow and reasoning can easily be applied to other settings. For example, James and his team are working with Stanford spin-outs to apply the algorithm to other diseases such as fibrosis or for applications requiring new fluorescent molecules.
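As background on Monte Carlo Tree Search, the sketch below shows one commonly used rule for its selection step, UCB1, which balances trying promising branches against exploring rarely visited ones. The recap does not say which variant James' team used, so the rule, the exploration constant, and the candidate names and statistics here are illustrative assumptions only.

```python
import math

def ucb1(total_reward, visits, parent_visits, exploration=math.sqrt(2)):
    """UCB1 score: average reward plus a bonus for rarely visited nodes."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    bonus = exploration * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select_child(children, parent_visits):
    """Pick the child with the highest UCB1 score."""
    return max(children,
               key=lambda c: ucb1(c["reward"], c["visits"], parent_visits))

# Hypothetical candidate next steps with made-up statistics.
children = [
    {"name": "step_a", "reward": 6.0, "visits": 10},
    {"name": "step_b", "reward": 1.0, "visits": 1},
    {"name": "step_c", "reward": 0.0, "visits": 0},
]
chosen = select_child(children, parent_visits=11)
```

In a full search this selection step repeats down the tree, followed by expansion, a simulated rollout, and propagating the result back up to update each node's statistics.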

I think it’s a good, reasonable model for AI and biotech in general to have this close feedback loop, where we start off with some experimental data. In our case, we have some experimental screening data, the actual data used to train the models, and then the model will produce some candidates or some hypotheses. Then we tried to have a more rapid turnaround to do the additional experiments … to further validate the AI’s reasoning and thinking.

2. James’ group has also used generative AI to design clinical trials that aim to be faster, cheaper, more diverse, and more representative.

  • Clinical trials are a critical bottleneck in the pipeline of therapeutic development. Patient enrollment is slow and labor intensive, and trial design can be biased against underrepresented groups or may exclude patients based on criteria that don’t necessarily relate to trial outcome or patient response. To outline trial criteria, a ‘monstrosity’ of a document must be generated to explicitly lay out the rules that will be followed in patient recruitment.
  • James noted that it’s very hard to balance all of the complicated factors required for a successful trial design. With collaborators at Genentech, James and his team have worked to develop a generative AI algorithm called Trial Pathfinder that uses historical clinical trials and outcomes to create an optimized trial design, often including a more diverse patient population. The groups that see more representation and inclusion in AI-generated clinical trials include women, elderly patients, patients from underrepresented groups, and patients who might be a bit sicker. These patients often end up responding just as well without experiencing the predicted adverse events.
  • When James first started partnering with Genentech, his first goal was to understand the pain points in clinical trial design. He learned that many trials are actually very narrow and essentially recruit for the so-called “Olympic athletes” of patients. He noted one study actually did try to recruit Kobe Bryant and several Olympians. To make clinical trial outcomes more inclusive and representative of a diverse range of patients, one strategy is the use of algorithms that account for such biases. 

If we look more closely at how people design the protocols for clinical trials, often it is based on domain knowledge. But it’s also often quite heuristic and anecdotal, which is why a lot of different teams from different pharma companies–even if they’re looking at drugs of similar mechanisms–often end up with quite different trials and trial designs.

3. AI/ML in biology is only possible because biology is becoming increasingly data driven.

  • The data that we can gather from biological systems is becoming more and more rich and diverse. For example, high resolution spatial transcriptomics or single molecule experiments can now be captured alongside more traditional measurements such as RNA sequencing. Perturbation of biological systems with CRISPR/Cas9 technology also enables a whole new suite of data that represents an area ripe for AI.
  • Even in the past year, James noted that he has seen tremendous advances in large language models and foundation models on the AI side. It’s also been interesting to see how these advances have enabled new insights in the biotech space. For example, high-throughput perturbation data collected at the single-cell resolution is something that can be extremely compatible with large language models. So far, collaborative efforts between AI and biotech have been extremely fruitful. 
  • As a professor, James is always pursuing new and exciting research directions with his lab. One project that he’s particularly excited about is fueled by recent progress in spatial biology. As interest in single-cell transcriptomics and genomics has skyrocketed, researchers have generated huge amounts of data that have led to a variety of different findings and results. However, single cells also reside in different neighborhoods, or local microenvironments. Understanding disease and healthy states in the context of groups of cells may unlock even more insight into how patients may respond to therapies.
  • Another promising direction James highlighted was the use of large language models to allow researchers to synthesize and analyze information across biological databases. Currently, data is often siloed, and the level of expertise required to utilize individual data sets makes it challenging to work across fields and specialties. 
  • “But this is where we think that language models can really be a unifying framework that can then help us to access data and integrate data from all these different modalities on different knowledge bases. So that’s another thing we’re working on with my students.”

This is why we’re excited about techniques and ideas like large language models that harness data from different modalities, databases, and knowledge bases to help biomedical researchers make faster innovations.

4. Landmark results at the intersection of biology and engineering, such as the Human Genome Project and the discovery of Yamanaka factors for reprogramming stem cells, motivated James to pursue a career in academia and to start his lab at Stanford.

  • James always gravitated towards math and science and started his academic path with an undergraduate major in math. Although he didn’t start out with a focus in biology, he began spending a good deal of time at the Broad Institute while he was at Harvard. At that time, many great biotechnologists were working on projects like sequencing the human genome. There, James began to learn more about interesting problems at the intersection of AI and biotech.
  • As he learned about the discovery of Yamanaka factors for reprogramming induced pluripotent stem cells (iPSCs), James was fascinated by the idea that you could take these ‘biological computer programs’ and with only a few instructions, change the state of a cell. Essentially, the biological system became not just something you could study, but something you could engineer.

5. Translation of academic projects needs to happen thoughtfully in collaboration with the right partners.

  • Translation of academic projects can take a few different forms, and not every project is right for translation. James gave a few examples of projects that he knew could move beyond the Stanford campus. One project involved the development of an AI system for assessing heart disease based on cardiac ultrasound videos, or echocardiograms. Millions of these videos are collected each year in the US alone, so it’s one of the most routine and accessible ways to assess cardiovascular disease. 
  • James and his students developed an AI system to help look at these videos and determine outcomes to help clinicians make more accurate diagnoses. Not only did the work result in two publications in the prestigious scientific journal Nature, but it also progressed to clinical trials.
  • James explained that he clearly sees the larger impact of his work; he wants not only to publish really good papers, but also to work closely with biotech and pharma companies or tech companies to ensure his findings and algorithms can actually impact human health. His focus on translation has led to at least four companies that have been spun out of his group.  
  • “At least one of the things we are doing is developing these algorithms or coming up with some of these potential drug candidates, and to really take it to the next level, either as a viable drug, a platform, or a device that we can take to patients, is something often beyond the scope of an individual PhD.”
  • James enjoys leveraging the community and resources of the Chan Zuckerberg Biohub, a non-profit initiative aimed at bringing together interdisciplinary leaders to advance our ability to observe and analyze biological systems. Opportunities and communities such as these played a large role in drawing James to the Bay Area. He has actively sought out projects through which he can collaborate with biotech and pharma, investors, and the start-up community.

At that point, it’s really important work. [Getting FDA approval] also requires more resources. That’s where it makes sense for us to have a company that is co-founded by my students. So, for us, in this case, it’s a perfect synergy between doing the early-stage research and development, developing the algorithms to the initial validations and then having the company take over and do the submissions and then do the scaling.

Get to know Dr. James Zou: the person behind the science

James and his wife love to take advantage of the great outdoors in the Bay Area. Every week, they go on a hike and spend a lot of time swimming or biking. One thing people may be surprised to learn about James is that he used to moonlight as a theater and restaurant reviewer. When he was living in Europe, he would write reviews of movies or restaurants for local English-language newspapers.

For someone wanting to pursue a similar career to his, James says that it’s a very exciting time to be at the intersection of biology and AI. He encourages students in the space to develop a core technical strength in either field and then begin to explore the synergy between disciplines.