August 25, 2021

My Dream Internship on Machine Learning and Data Science

From the opportunity to tackle difficult and engaging problems to the dynamic, collaborative community of individuals, working at Octant has in many ways been a dream internship come true. I couldn’t have asked for a better company to spend my summer with.

As my internship at Octant comes to a close, I wanted to reflect on my brief but incredible time here.

At its core, Octant is a drug discovery company. Using its multiplexed screening technology, Octant is advancing a new paradigm of therapeutics that aims to treat complex diseases by acting on multiple targets and pathways. Given my interest in AI and my prior experience with cheminformatics, I knew interning at Octant would be a great fit and would let me tackle challenging and interesting problems. I was right.

Within my first few days, I was already working with raw data. As part of the Compute Team, I was broadly tasked with improving the accuracy of our reaction yield estimates, which are key to data quality control. When screening for drugs, knowing how much product formed in a reaction matters because it tells us which compounds are driving the signal.

Example of LC-MS data

Product yield can be measured with an analytical technique called liquid chromatography-mass spectrometry (LC-MS). The chromatography step separates components based on physical properties such as hydrophobicity or size, and the mass spectrometer then records signal intensities across a range of mass-to-charge ratios. The time it takes a molecule to elute through the chromatography column is known as its retention time. When we inspect the LC-MS signal at a specific analyte's mass-to-charge ratio, we typically see a single intensity peak across the range of retention times, and the area under that peak is integrated to approximate the quantity of the compound.
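To make the integration step concrete, here is a minimal sketch of approximating a peak area with the trapezoidal rule. It assumes the chromatogram is available as parallel lists of retention times and intensities; the function name and the integration window are illustrative and not taken from Octant's actual pipeline.

```python
from typing import Sequence


def integrate_peak(
    times: Sequence[float],
    intensities: Sequence[float],
    start: float,
    end: float,
) -> float:
    """Approximate the area under an intensity trace between `start` and `end`
    (inclusive) using the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        t0, t1 = times[i - 1], times[i]
        # Only sum trapezoids that lie entirely inside the integration window.
        if t0 >= start and t1 <= end:
            area += 0.5 * (intensities[i - 1] + intensities[i]) * (t1 - t0)
    return area


# A toy, triangle-shaped peak sampled once per minute.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [0.0, 2.0, 4.0, 2.0, 0.0]
print(integrate_peak(times, intensities, 0.0, 4.0))  # 8.0
```

Real chromatograms are noisy and often need baseline correction before integration; this sketch skips that step.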

Example scenario where a retention time prediction could help with peak-picking

However, the data isn't always as straightforward as a single peak per analyte. In some cases, we see multiple peaks with varying areas, which can be caused by unwanted substances of the same mass appearing in the sample. In such cases, knowing the expected retention time of a molecule helps us select the correct peak to integrate and ultimately provides a more accurate estimate of the reaction yield. My project therefore focused specifically on predicting the retention times of molecules.
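The peak-picking idea can be sketched roughly as follows: given candidate peaks that have already been detected and integrated, pick the one whose apex lies closest to the predicted retention time, and pick nothing if no peak falls within a tolerance. The `Peak` type, the `pick_peak` function, and the 0.5-minute default tolerance are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Peak:
    apex_rt: float  # retention time at the peak maximum (minutes, assumed)
    area: float     # integrated intensity of the peak


def pick_peak(
    peaks: Sequence[Peak], predicted_rt: float, tolerance: float = 0.5
) -> Optional[Peak]:
    """Return the peak whose apex is nearest the predicted retention time,
    or None if no peak lies within `tolerance` of the prediction."""
    candidates = [p for p in peaks if abs(p.apex_rt - predicted_rt) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p.apex_rt - predicted_rt))


# Two same-mass peaks: the larger one is not necessarily the right one.
peaks = [Peak(apex_rt=1.2, area=500.0), Peak(apex_rt=2.8, area=1200.0)]
print(pick_peak(peaks, predicted_rt=1.3))  # Peak(apex_rt=1.2, area=500.0)
```

Returning None instead of falling back to the largest peak flags ambiguous cases for review rather than silently integrating a possibly wrong peak; that policy is an assumption, not something described in this post.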

Over the course of two months, I fully immersed myself in the problem. To understand its nuances and learn about existing approaches, I started by reading recent literature. In the weeks that followed, I aggregated and preprocessed retention time data from 36 in-house screens, covering over 4,000 compounds, in preparation for model training. This involved analyzing, cleaning, and normalizing the data so that modeling efforts would be effective and produce meaningful results.
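The exact cleaning and normalization steps aren't spelled out here, so the following is only a sketch of one plausible approach. It assumes each record is a `(screen_id, smiles, retention_time)` tuple, that each screen's total chromatography run time is known (the `run_lengths` mapping is hypothetical), and that compounds are keyed by canonical SMILES. Retention times are scaled to a fraction of the run so that screens with different run lengths become comparable, and replicate measurements are collapsed to their median.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, Optional[float]]  # (screen_id, smiles, retention_time)


def preprocess(
    records: Iterable[Record], run_lengths: Dict[str, float]
) -> Dict[str, float]:
    """Clean and normalize raw retention times into one target value per compound."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for screen_id, smiles, rt in records:
        run = run_lengths.get(screen_id)
        # Drop rows with missing values or retention times outside the run window.
        if rt is None or run is None or not 0.0 < rt <= run:
            continue
        # Normalize to a fraction of the run so different screens are comparable.
        grouped[smiles].append(rt / run)
    # Collapse replicate measurements of the same compound into their median.
    return {smiles: median(values) for smiles, values in grouped.items()}


records = [
    ("screen_1", "CCO", 2.0),
    ("screen_1", "CCO", 3.0),
    ("screen_2", "c1ccccc1", 6.0),
    ("screen_2", "CCN", None),
    ("screen_1", "CCC", 20.0),  # longer than the run itself, so dropped
]
print(preprocess(records, {"screen_1": 10.0, "screen_2": 12.0}))
```

Dividing by run length is a simple choice; screens that use different gradients may need a more careful alignment than this.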

Model performance on some test data

As I honed the preprocessing pipeline to ensure high-quality training data, I also experimented with models ranging from traditional machine learning methods to state-of-the-art graph neural networks. After rigorous testing of different molecular featurizations and model architectures, we settled on a top-performing deep learning model that averaged an R² of 0.81 and a root-mean-square error (RMSE) of 0.076 on our test data. Finally, to make retention time predictions as quick and easy to obtain as possible, I wrapped the model in a simple package so that it could be used to inform peak selection in future screening runs.
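For reference, the two metrics quoted above can be computed as shown below. This is a plain-Python sketch of the standard definitions (R² = 1 - SS_res / SS_tot, and RMSE as the square root of the mean squared error); in practice, a library such as scikit-learn provides equivalent `r2_score` and `mean_squared_error` functions.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 4.0]
print(rmse(observed, predicted))      # 0.5
print(r2_score(observed, predicted))  # 0.8
```

RMSE is expressed in the units of the target, so its meaning depends on how retention times were normalized; R² is unitless, which makes it easier to compare across setups.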

Through this in-depth process, I gained hands-on experience with the full ML pipeline and picked up helpful software practices along the way. At the same time, I learned how valuable collaboration can be to overcome obstacles: more times than I can count, working with my mentors and talking to other scientists led to a deeper understanding of the data, improved model performance, and better next steps.

Rigorous “double-blind” pizza judging

As much as I enjoyed working on my project, there is much more to Octant than work. In many ways, Octant has the traits typical of most startups: it is ambitious, resilient, and goal-driven. But what defines and distinguishes Octant is its diverse yet tight-knit group of people. Together, Octonauts foster a welcoming, engaging, supportive, and fun culture that shows up everywhere, from small day-to-day interactions to company-wide events. For instance, one of my favorite memories began as a small lunchtime debate over the best fast-food pizza and spontaneously turned into a pizza tasting competition: competitors submitted their favorite chain, and interns judged the entries in a randomized blind taste test. Experiences like these are just the tip of the iceberg of what it's like to be an Octonaut.

Little Caesars wins the contest!

Posted by

Jonathan Yin

2021 Octant Summer Intern