This summer I worked with scientists on one of Octant’s drug-discovery programs. We set out to develop new computational approaches to predict how small molecules bind to a protein target. Understanding this is a crucial step in the drug-discovery process, and being able to predict that interaction can speed things up considerably. Without an accurate model of the drug-target interactions, the process is a bit like searching for your keys in a pitch-black room. Obtaining a model of how the drug binds to its protein target is like turning the lights on: suddenly you know how to walk through the room to find what you are looking for. In this case, the team at Octant had recently uncovered a number of really promising molecules using the company’s awesome high-throughput chemistry platform. The goal, therefore, was to build a model of how these molecules bind to the protein target to inform their further optimization.
A popular computational technique for predicting how a molecule binds to its protein target is molecular docking. Docking programs search the conformational space of the molecule to find the orientation (the “pose”) that best “fits” the protein target, by minimizing the energy of the pose according to an energy function, also called a scoring function. Despite decades of development, these methods still suffer from substantial accuracy issues. As a result, one has to thoroughly scrutinize the results, make tweaks to the scoring function, and leverage experimental data in order to construct the best hypothesis about which of the top-ranked (lowest-energy) docking poses is most accurate.
I started out by building a docking workflow that leveraged open-source tools. I tested out numerous programs, including recently developed methods that use machine learning-based scoring functions, but found that the SMINA docking tool [1] performed the best on our target. By evaluating poses of docked molecules with our medicinal chemistry team, we saw numerous examples where the docking scoring functions produced highly strained conformations of the ligand. These are physically unrealistic and would lead to false positives. We were able to modify the workflow to constrain our series of compounds to more realistic low-energy conformations, which substantially improved the predictions.
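One simple way to express the strain check described above is to compare the ligand-only energy of each docked conformation with the lowest-energy conformer found for that ligand in isolation, and to discard poses whose strain exceeds a cutoff. The sketch below is illustrative only: the `Pose` fields, the 6 kcal/mol cutoff, and all energy values are assumptions made for the example, not the workflow or numbers used in the project.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    ligand_id: str
    docking_score: float    # lower is better (hypothetical kcal/mol values)
    internal_energy: float  # ligand-only energy of the docked conformation


def filter_strained(poses, reference_energy, max_strain=6.0):
    """Keep poses whose strain is at most max_strain.

    Strain is the docked conformation's internal energy minus the energy
    of the ligand's lowest-energy free conformer (reference_energy).
    The cutoff is an assumed value, chosen only for illustration.
    """
    kept = []
    for pose in poses:
        strain = pose.internal_energy - reference_energy[pose.ligand_id]
        if strain <= max_strain:
            kept.append(pose)
    # Rank the surviving poses by docking score, best (lowest) first.
    return sorted(kept, key=lambda p: p.docking_score)


# Hypothetical data: the best-scoring pose is also the most strained.
poses = [
    Pose("cmpd_1", -9.8, 41.0),  # strain 16.0, rejected
    Pose("cmpd_1", -8.9, 27.5),  # strain 2.5, kept
    Pose("cmpd_1", -8.1, 30.0),  # strain 5.0, kept
]
reference_energy = {"cmpd_1": 25.0}

survivors = filter_strained(poses, reference_energy)
best = survivors[0]
print(best.docking_score)
```

Filtering before ranking means a pose cannot win on docking score alone if its ligand geometry is implausible, which is the failure mode the medicinal chemistry review surfaced.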
After developing this workflow, we wanted to understand whether some of the popular commercial options could further improve the predictive accuracy. To compare various docking programs, I built an evaluation benchmark leveraging the PDBbind dataset, a curated subset of the Protein Data Bank that contains protein-ligand complexes along with binding affinity information. I spent significant effort refining this dataset, ultimately filtering out about half of the molecules for not having drug-like properties.
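The post does not say which criteria defined "drug-like" here. As a rough sketch of what such a filter can look like, the example below applies Lipinski's rule of five to precomputed descriptors and tolerates at most one violation. The `Ligand` record, the entry names, the descriptor values, and the choice of rule are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    entry_id: str      # hypothetical benchmark entry name
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def rule_of_five_violations(lig):
    """Count how many of Lipinski's four criteria the ligand breaks."""
    return sum([
        lig.mol_weight > 500,
        lig.logp > 5,
        lig.h_donors > 5,
        lig.h_acceptors > 10,
    ])


def is_drug_like(lig, max_violations=1):
    """Treat a ligand as drug-like if it breaks at most max_violations rules."""
    return rule_of_five_violations(lig) <= max_violations


# Made-up descriptor values, not taken from PDBbind.
ligands = [
    Ligand("entry_a", 350.4, 2.1, 2, 5),   # 0 violations
    Ligand("entry_b", 812.9, 6.3, 7, 14),  # 4 violations
    Ligand("entry_c", 540.0, 3.0, 3, 8),   # 1 violation (weight only)
]

kept = [lig.entry_id for lig in ligands if is_drug_like(lig)]
print(kept)
```

In practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand, and a real curation pass would likely add further checks beyond these four.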
With the benchmark in hand, we evaluated SMINA, Schrodinger’s Glide program (both SP and XP), and OpenEye’s FRED program. This evaluation tested the ability of a docking program to correctly re-dock a ligand into a binding pocket taken from an experimentally determined structure of that same ligand bound to the protein. As such, the protein is already “fixed” in its correct conformation and the docking program only needs to find the correct ligand orientation. Intriguingly, FRED performed best, followed by Glide XP.
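A common way to score a re-docking experiment like this is the root-mean-square deviation (RMSD) between the docked and crystallographic ligand atoms, counting a pose as a success when it falls under a cutoff such as 2 Å. The post does not state which metric or cutoff was used, so the sketch below is an assumption. It also assumes both poses list the same atoms in the same order and share the protein's coordinate frame (so no alignment is needed), and it ignores ligand symmetry.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(pairs, cutoff=2.0):
    """Fraction of (docked, crystal) pairs whose RMSD is within the cutoff."""
    hits = sum(1 for docked, crystal in pairs if rmsd(docked, crystal) <= cutoff)
    return hits / len(pairs)


# Hypothetical three-atom ligand, coordinates in angstroms.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_pose = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.5, 0.0)]  # shifted 0.5 in x
far_pose = [(0.0, 0.0, 4.0), (1.5, 0.0, 4.0), (1.5, 1.5, 4.0)]   # shifted 4.0 in z

rate = success_rate([(near_pose, crystal), (far_pose, crystal)])
print(rate)
```

Computing one success rate per docking program over the same set of complexes gives a like-for-like comparison, which is the shape of the ranking reported above.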
Most use cases, however, do not already have a fixed structure of the ligand-bound protein and therefore cannot keep the protein rigid during the docking process. When evaluating docking programs on the more realistic scenario of predicting both the correct ligand and protein orientation, we found Schrodinger’s Induced-Fit Docking workflow to perform best.
With the evaluation complete, we began testing these docking programs on our in-house program. One of the exciting aspects of utilizing Octant’s high-throughput chemistry platform is that a single experiment provides an incredibly rich dataset. We attach thousands of fragments to a molecular core, which helps uncover how changes in the structure of the molecule affect its activity. Understanding these structure-activity relationships (SAR) is a crucial step in the discovery process because it allows medicinal chemists to better predict how modifications to a molecule will affect its activity. Our hypothesis was that we could additionally leverage this rich SAR data to help nail down the binding pose of the ligands. Intriguingly, none of the docking approaches that keep the protein fixed could recapitulate the results of our internal SAR studies; only Schrodinger’s Induced-Fit Docking program could. In this case, two protein side chains had to reorient from the crystal-structure pose that we were starting with in order to accommodate our series of ligands. Upon modeling this conformational change, we were able to use the experimental data to determine the correct binding poses from Schrodinger’s docking program. We then used the predictions from this docking study to design new modified molecules, which are currently being synthesized and tested!
One of my favorite aspects of this project was the incredible mentorship I received, both in computational skills and software engineering and in chemistry and structural biology. Beyond this project, Octant is doing really incredible and exciting things in so many different areas. When I joined Octant I knew about its deep expertise in synthetic biology, and I was so grateful to get the opportunity to learn more about how it can be leveraged in the drug-discovery process. What I did not expect was to find such exciting boundary-pushing science happening at the level of chemistry and computation as well. The density of knowledge across these scientific disciplines is really amazing, and I was able to grow and explore in biology, chemistry, data analysis, and software engineering during the summer. Beyond just the science, the operations team and leadership at Octant have done a tremendous job of establishing a unique culture that really is the foundation on which all of the great science can flourish.
My favorite part of Octant is the incredible combination of synthetic biology, high-throughput chemistry, and computational modeling that is working together to produce a really impressive platform. This alone is truly special, and when you combine this with the amazingly kind, smart and creative people behind it all, it is hard to imagine a better place to work and grow.
1. Koes, David Ryan, Matthew P. Baumgartner, and Carlos J. Camacho. "Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise." Journal of chemical information and modeling 53.8 (2013): 1893-1904.
About the author:
Hunter is entering the fourth year of his Ph.D. in Computational Biology at the University of California, Berkeley. He works in the Berkeley Artificial Intelligence Research Lab under the supervision of Dr. Jennifer Listgarten where he researches applications of machine learning to protein engineering and small-molecule drug discovery.