PHILADELPHIA (January 16, 2023) – Roland L. Dunbrack Jr., PhD, director of the Molecular Modeling Facility at Fox Chase Cancer Center, recently announced a new database that clusters homologous protein assembly structures observed in independent experiments deposited in the Protein Data Bank (PDB).
ProtCAD, the Protein Common Assembly Database, is a searchable database designed to help address one of the unsolved problems in structural biology: determining the correct biological assembly of proteins in crystal structures. This type of structural information can help researchers understand the behavior of proteins involved in cancer development, prevention, and treatment.
ProtCAD is the natural evolution of a previously released database, ProtCID (Protein Common Interface Database), which contains data on protein interactions at the level of individual domains. “ProtCAD takes that a step further. Instead of just the dimer, we are looking at the whole assembly,” Dunbrack said.
A protein dimer is a macromolecular complex formed by two protein subunits; a tetrameric protein has four. When searching ProtCAD, scientists get a straightforward view of the possible assemblies and the experimental sources of each one.
There are two purposes behind the development of ProtCAD, Dunbrack said. The first is to clean up the annotations in the PDB, about 87% of whose entries are crystal structures. The biological assemblies of these structures are usually defined by the authors who deposit them. Unfortunately, as many as 15% of these annotations are estimated to be incorrect, said Dunbrack, who is also a professor in the Cancer Signaling and Microenvironment research program at Fox Chase.
The second purpose of ProtCAD is to help generate new hypotheses about proteins. “Maybe we look at a particular protein family where we don’t know the dimer or the tetramer, or we are uncertain what the biological state or correct assembly is. Our work can show that there is only one dimer or tetramer observed over and over again,” Dunbrack said.
“This type of information can be important to know because there could be a mutation in a cancer cell that disrupts the formation of a trimer and prevents a protein from functioning properly or produces an overly active form of the protein.”

ProtCAD (http://dunbrack2.fccc.edu/protcad) is searchable by PDB entry, UniProt identifier, or Pfam domain. For each cluster of assemblies, users can download coordinate files, PyMOL scripts, and publicly available assembly annotations. ProtCAD does not, however, provide probabilities that the assemblies in any particular cluster are correct.
The database was announced in the paper “The Protein Common Assembly Database (ProtCAD)–a Comprehensive Structural Resource of Protein Complexes,” published in Nucleic Acids Research.