PHILADELPHIA (July 16, 2020)—In a new article, researchers at Fox Chase Cancer Center outlined a new machine learning method that could help make virtual screening for drugs targeting cancer and other diseases more effective. Machine learning is a type of artificial intelligence that allows a system to learn new information in ways that are not specifically encoded by the programmers.
Yusuf Adeshina, PhD, who did his doctoral work at Fox Chase, was the lead author on the paper and led the development of the new machine learning method. Adeshina worked under the direction of John Karanicolas, PhD, a professor in the Molecular Therapeutics Research Program.
“Most targeted therapies for cancer are small molecules, and the way these work is by binding to specific proteins. One of the ways one can go about finding new starting points for drugs to target these vulnerabilities is to essentially use a computer to look through and find, from very large libraries, potential drugs that would fit with that protein of interest,” said Karanicolas.
He said that one of the major limitations of virtual screening, however, is false positives. Many potential drug candidates identified by computers tend to be ineffective when tested in the lab. Typically, only about 12 percent of the drug compound candidates that are predicted to bind to a specific protein end up working, he said. Additionally, many that do work bind very weakly with the targeted protein and require extensive subsequent optimization.
“Yusuf developed this machine learning classifier that we trained to distinguish between things that work and things that wouldn’t,” said Karanicolas. “We did this by showing it examples of real drugs bound to their protein targets and a series of very compelling decoys that we had generated.”
The classifier was then tested on models of protein-drug complexes that it had never seen before, and the team asked it to determine which complexes were real and which would not be effective. Karanicolas said it outperformed every previous approach they had used to decide which compounds should be tested in the lab.
Additionally, the team conducted a prospective experiment in which they chose a protein target, acetylcholinesterase, and asked the model to go through about 5 billion drug candidates.
“Using our machine learning approach, we chose the top 23 scoring compounds. These are compounds that have never before been synthesized by any chemist in history. We worked with a company who synthesized these compounds for us, and Yusuf tested them to find out whether they would really inhibit acetylcholinesterase,” said Karanicolas.
Not only did almost all of the compounds work, said Karanicolas, but the best of those picked were more than 10 times more potent than the average results from existing virtual screening approaches.
“For many different projects in my lab, we’re already trying to target different oncoproteins, and we’ve been using this tool because it works so well. In addition to that, it’s really a resource for the scientific community. Others can also use this so that their virtual screening approaches work better,” said Karanicolas.
He added that both Adeshina’s set of decoys (D-COID) and the virtual screening by machine learning method (vScreenML), his approach for classifying which output from the computer should be tested, are freely available to the research community.
The paper, “Machine Learning Classification Can Reduce False Positives in Structure-Based Virtual Screening,” was published in the journal PNAS.