PHILADELPHIA (February 5, 2020) – Roland L. Dunbrack Jr., PhD, director of the Molecular Modeling Facility at Fox Chase Cancer Center, published a study today in Nature Communications describing an expansion of his lab’s protein database. He believes the new data will help scientists access vast amounts of structural information that can be used to understand the properties of biological systems.
The researchers expanded their Protein Common Interface Database (ProtCID) to include data at the individual domain level, which greatly increases the number of large protein-protein clusters available in the database. Many proteins are composed of multiple compact regions called domains, each of which may perform a different function.
“Proteins do most of the work in the human body … all the enzymes, all the structural proteins, they enable muscles to move, hormone signaling, help cells divide, copy DNA,” said Dunbrack. “They can only do that by interacting with other molecules: proteins with proteins, small molecules, and other DNA. Scientists need to know the structures of those interactions.”
An analogy for this mapping would be looking at the parts of a car, Dunbrack explained. A car could be taken apart and all of the parts listed and identified, but someone still needs to know how the parts fit together in order for the car to be put back together and for it to work.
In order to understand these interactions, it is often necessary to examine hundreds of structures of proteins available in the Protein Data Bank (PDB), a collaborative online effort involving a variety of institutions. However, this process can be difficult and time-consuming, Dunbrack said.
It is difficult for researchers, especially those who are not trained in structural bioinformatics, to access information on individual structures across all of the structures available for any one extensively studied protein or protein family, he added. In addition, the available annotations of the structures of protein assemblies in the PDB are estimated to be incorrect about 20 percent of the time, Dunbrack said.
“We wanted to come up with an approach that would help us fix the ones that were wrong and potentially generate new ideas,” he said.
ProtCID contains comprehensive, PDB-wide structural information on the interactions of proteins and individual protein domains with other molecules. It covers four types of interactions: chain interfaces, protein family domain interfaces, domain-peptide interfaces, and domain-ligand/nucleic acid interactions.
It also provides complete annotations of the members of each cluster for protein-protein interactions at the chain and domain levels, and for interactions of domains with peptides, ligands, and nucleic acids.
One of the most important aspects of the ProtCID website is that researchers can download the coordinates of multiple structures at once, Dunbrack said. “On some sites that analyze protein interactions, if you want to get 50 structures related to a specific search, you would have to hit the download button 50 times,” he said. “We allow that to happen with one click.”
The paper, “ProtCID: A Data Resource for Structural Information on Protein Interactions,” appears in Nature Communications. The research was supported by National Institutes of Health grant R35 GM122517.