As part of its weekly release procedure, the PDB publishes the sequences of upcoming entries four days before their release on the following Wednesday. This pre-release is scheduled every Saturday at 3:00 UTC. CAMEO collects the pre-release and, after the sequence pre-processing and filtering steps described below, submits a selected set of targets to the registered servers. Participants have until the following Wednesday at 03:00 (CET/CEST) to return their predictions. Once the PDB has released the reference structures that Wednesday, the evaluation is performed.
The categories supported by CAMEO are protein structure modeling (3D), protein model quality assessment (QE), and structures and complexes (Beta 3D). Protein contact prediction (CP) and ligand binding site prediction (LB) have been discontinued.
CAMEO servers can be registered as public servers, with their full names and results available to everyone, or as development servers, where the name is disguised ('serverX') and all scoring is performed and visible to other method developers, but not to the public. See our complete list of registered servers.
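The weekly cycle can be sketched as a small date calculation. This is only an illustration of the schedule described above: the function names are hypothetical, and the sketch computes the release date only, not the exact 03:00 (CET/CEST) submission deadline.

```python
from datetime import date, datetime, timedelta, timezone


def next_prerelease(now: datetime) -> datetime:
    """Return the next Saturday 03:00 UTC at or after `now`.

    `now` must be timezone-aware (hypothetical helper, not part of CAMEO).
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=3, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, Saturday is 5.
    candidate += timedelta(days=(5 - candidate.weekday()) % 7)
    if candidate < now:
        # Today is Saturday but 03:00 UTC has already passed.
        candidate += timedelta(days=7)
    return candidate


def release_day(prerelease: datetime) -> date:
    """Return the Wednesday four days after the pre-release Saturday."""
    return (prerelease + timedelta(days=4)).date()
```

For example, calling `next_prerelease` on a Wednesday returns the Saturday of the same week, and `release_day` of that Saturday is the Wednesday of the week after.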
A CAMEO target is a pre-released PDB entry, which is submitted to registered servers. In CAMEO Structures & Complexes, a target consists of one or more peptide, protein, DNA or RNA sequence(s), and zero or more free ligands belonging to the same pre-released PDB entry. A target can, thus, be a monomer, a homo-oligomer or a hetero-oligomer, and contain ligand(s) or not.
CAMEO considers any pre-released sequence containing 30 or more amino acids to be a protein. Amino acid sequences strictly shorter than 30 residues are named peptides. Free ligands are small, non-polymer molecules that are pre-released as InChI codes and SMILES strings by the PDB.
CAMEO Structures & Complexes only submits complete targets to participating servers, that is, targets that contain only types of sequences that the participant can model (protein, DNA, RNA and peptides). CAMEO will never submit part of a heteromeric complex.
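The length rule has a sharp boundary at 30 residues, which the following minimal sketch makes explicit. The constant and function names are illustrative and not taken from the CAMEO code base.

```python
PROTEIN_MIN_LENGTH = 30  # residues; threshold stated in the text above


def classify_amino_acid_sequence(sequence: str) -> str:
    """Label an amino acid sequence as 'protein' or 'peptide' by length."""
    return "protein" if len(sequence) >= PROTEIN_MIN_LENGTH else "peptide"
```

A sequence of exactly 30 residues is therefore a protein, and one of 29 residues is a peptide.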
For instance, a server supporting heteromeric protein modeling will not receive RNA-protein complexes; similarly a server capable of modeling only single protein chains will not receive a heteromeric protein complex as target. The only exception to this is ligands: servers that cannot model ligands can still receive complexes containing ligands (just without the ligand information).
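The "complete targets only" rule can be sketched as follows. The `Target` and `Server` classes and the `build_request` function are hypothetical and only model the rules stated above; in particular, treating a target as heteromeric when it contains more than one distinct sequence, and sending homo-oligomers to single-chain servers, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    sequences: tuple      # (polymer_type, sequence) pairs, one per chain
    ligands: tuple = ()   # free ligands, e.g. as SMILES strings


@dataclass(frozen=True)
class Server:
    polymer_types: frozenset         # e.g. frozenset({"protein", "RNA"})
    models_heteromers: bool = False
    models_ligands: bool = False


def build_request(server: Server, target: Target):
    """Return the request for `server`, or None if the target is not sent."""
    target_types = {kind for kind, _ in target.sequences}
    if not target_types <= server.polymer_types:
        return None  # never submit only part of a complex
    if len(set(target.sequences)) > 1 and not server.models_heteromers:
        return None  # heteromeric target, server cannot model heteromers
    # Ligands are the only exception: they are dropped, not disqualifying.
    ligands = list(target.ligands) if server.models_ligands else []
    return {"sequences": list(target.sequences), "ligands": ligands}
```

The design choice here is that every rule except the ligand rule rejects the whole target, so a server never sees a partial complex.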
After downloading the pre-released sequences from the PDB on Saturday, in order to submit a limited number of high-quality targets for modeling, CAMEO Structures & Complexes performs the following actions before submitting the sequences to the participants:
CAMEO Complete Modeling only submits filtered nucleic and amino-acid sequences to the participants. The filtering step removes targets if any of their sequences:
In order to avoid "duplicate" submissions of very similar targets, CAMEO Structures & Complexes clusters the remaining targets.
First, CAMEO clusters individual polymer sequences from the targets:
Then, complexes are clustered based on the set of individual sequences they contain. Complexes that contain the exact same set of sequences are grouped together (clustered complexes).
Finally, a second level of clustering is added to the clustered complexes, taking non-polymer ligands into account. Complexes in the same clustered complex are sub-divided into clustered complexes with ligands, each of them containing the exact same set of ligands.
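The two levels of complex clustering can be sketched as a nested grouping. This sketch assumes that every polymer sequence has already been assigned a sequence cluster identifier by the first step, and that "set" means copy numbers are ignored; the function name and the input layout are illustrative only.

```python
from collections import defaultdict


def cluster_complexes(targets):
    """Group targets by sequence-cluster set, then by ligand set.

    `targets` is an iterable of (target_id, sequence_cluster_ids, ligands).
    Returns {sequence cluster set: {ligand set: [target_id, ...]}}.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for target_id, sequence_cluster_ids, ligands in targets:
        sequence_key = frozenset(sequence_cluster_ids)  # clustered complex
        ligand_key = frozenset(ligands)                 # ... with ligands
        clusters[sequence_key][ligand_key].append(target_id)
    return {key: dict(sub) for key, sub in clusters.items()}
```

Two targets end up in the same innermost list only if they share both the set of sequence clusters and the set of ligands.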
Target complexes are first classified by difficulty into easy, medium and hard targets.
First, all protein sequences of 30 amino acids or more are searched separately for templates with BLAST against the full list of protein sequences currently in the PDB. Templates are classified into one of three categories:
Second, all nucleic acid sequences and peptide sequences shorter than 30 amino acids are subjected to a template search against all the sequences currently in the PDB. Templates are identified based on exact sequence identity. If a template has the exact same sequence in the PDB it is classified as "easy".
CAMEO Structures & Complexes uses the template information obtained from individual sequences and integrates it into a classification of whole complexes.
A template is considered for the complex only if it covers all the sequences of the target and has an exhaustive (1:1) mapping between every sequence of the target and of the template. The template complex may not contain additional sequences that are not part of the target (or not included in the mapping).
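The three conditions on a complex template can be written as a short check. The function and its arguments are hypothetical: `mapping` stands for whatever target-to-template sequence assignment the template search produced.

```python
def is_complex_template(target_sequences, template_sequences, mapping):
    """Check whether a template complex qualifies for a target complex.

    `mapping` is a dict from target sequence id to template sequence id.
    """
    # Every target sequence must be mapped.
    covers_target = set(mapping) == set(target_sequences)
    # No two target sequences may share one template sequence (1:1).
    one_to_one = len(set(mapping.values())) == len(mapping)
    # The template may not contain sequences outside the mapping.
    no_extras = set(mapping.values()) == set(template_sequences)
    return covers_target and one_to_one and no_extras
```

For instance, a template with three sequences is rejected for a two-sequence target even when both target sequences are mapped, because of the extra template sequence.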
The complex difficulty is defined as follows:
We are working hard to implement many of the scores from the 3D category into CAMEO Structures & Complexes. We primarily consider superposition-free scores as CAMEO targets might consist of multiple domains or proteins that may not superpose well. So far the following scores are available:
The lDDT score (Local Distance Difference Test on All Atoms) evaluates the quality of the local atomic environment of a model. lDDT rewards the fraction of correctly predicted inter-atomic distances in a model at different threshold levels. lDDT does not depend on a global superposition of the prediction and target structure.
Specifically, interaction distances (cutoff 15 Å) between atoms in the reference protein structure are compared with distances between corresponding atoms in the predictions. If the difference between the two distances is within a defined threshold, the interaction is considered to be preserved in the prediction. The final lDDT-all score is computed by averaging the fraction of correctly modeled interactions for the following four distance difference thresholds: 0.5, 1, 2, and 4 Å (the same thresholds as GDT_HA). A filter based on the Engh and Huber bond lengths and angles removes stereochemical violations and steric clashes. CAMEO additionally offers a Cα-based lDDT score. [ref.: CASP9 TBM Assessment]
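The core of the calculation can be illustrated with a deliberately simplified sketch. It is not the OpenStructure implementation: it skips the stereochemistry filter, does not handle symmetric side chains, and assumes that pairs of atoms within the same residue are excluded and that atoms missing from the model count as not preserved. The input layout, a dict from `(residue_number, atom_name)` to coordinates, is also an assumption.

```python
from itertools import combinations
from math import dist

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance difference thresholds in Å
INCLUSION_RADIUS = 15.0            # reference distance cutoff in Å


def simple_lddt(reference, model):
    """Simplified lDDT between two coordinate dicts with matching keys."""
    # Collect reference distances between atoms of different residues.
    pairs = []
    for a, b in combinations(sorted(reference), 2):
        if a[0] == b[0]:
            continue  # same residue: skipped in this sketch
        d_ref = dist(reference[a], reference[b])
        if d_ref < INCLUSION_RADIUS:
            pairs.append((a, b, d_ref))
    if not pairs:
        return 0.0
    fractions = []
    for threshold in THRESHOLDS:
        preserved = sum(
            1
            for a, b, d_ref in pairs
            if a in model and b in model
            and abs(dist(model[a], model[b]) - d_ref) <= threshold
        )
        fractions.append(preserved / len(pairs))
    # Final score: average over the four thresholds.
    return sum(fractions) / len(fractions)
```

Because only distances within each structure are compared, no superposition of model and reference is needed at any point.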
The QS-score considers the assembly interface as a whole and is suitable for comparing homo- or hetero-oligomers with identical or different stoichiometries, alternative relative orientations of chains, and distinct amino acid sequences (i.e. homologous complexes). To unequivocally identify the residues of all protein chains in complexes, QS-score first establishes a mapping between equivalent polypeptide chains of the compared structures by exploiting complex symmetries where possible. The resulting QS-score expresses the fraction of shared interface contacts (residues on different chains with a Cβ-Cβ distance < 12 Å) between two assemblies. A QS-score close to 1 translates to very similar interfaces, matching stoichiometry and a majority of identical interfacial contacts. A QS-score close to 0 indicates a radically diverse quaternary structure, probably different stoichiometries and potentially representing alternative binding conformations. Targets which cover only part of a hetero-oligomeric complex are not evaluated.
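The notion of shared interface contacts can be illustrated with the sketch below. It is a simplification and not the QS-score itself: the published score additionally weights contacts by distance, and it first derives the chain mapping, whereas this sketch assumes that equivalent chains and residues already carry identical `(chain, residue_number)` keys in both structures. All names are illustrative.

```python
from math import dist

CONTACT_CUTOFF = 12.0  # Cβ-Cβ distance cutoff in Å


def interface_contacts(cbeta):
    """Return the inter-chain residue pairs closer than the cutoff.

    `cbeta` maps (chain, residue_number) to Cβ coordinates.
    """
    keys = sorted(cbeta)
    contacts = set()
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            different_chains = a[0] != b[0]
            if different_chains and dist(cbeta[a], cbeta[b]) < CONTACT_CUTOFF:
                contacts.add(frozenset((a, b)))
    return contacts


def shared_contact_fraction(reference, model):
    """Unweighted fraction of interface contacts found in both assemblies."""
    ref_contacts = interface_contacts(reference)
    model_contacts = interface_contacts(model)
    union = ref_contacts | model_contacts
    if not union:
        return 0.0
    return len(ref_contacts & model_contacts) / len(union)
```

Dividing by the union of contacts means that both missing and spurious interface contacts lower the value, in line with the interpretation of scores near 1 and near 0 given above.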
The RMSD score is the BiSyRMSD (symmetry-corrected RMSD after binding site superposition), calculated by OpenStructure as described in an upcoming publication.
For aggregation, we use the geometric mean, and apply a cap of 20Å. Missing ligands (either not modeled, or each additional ligand with a different stoichiometry than what was modeled) are assigned a value of 20Å. There is no penalty for extra ligands in the model at this point, however this may change in the future.
In addition, we compute success rates, where a success is defined as an RMSD lower than or equal to a cutoff of 1, 2 or 5 Å (num_rmsd_1, num_rmsd_2 and num_rmsd_5, respectively).
Finally, mean_best_rmsd is the mean of the best prediction for each ligand entity (type). This score doesn't penalize for wrong stoichiometries.
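The RMSD aggregation and the success criteria described above can be sketched as follows. The function names are hypothetical, `None` is used here to mark a missing ligand, and returning plain counts for the `num_rmsd_*` values is an assumption based on their names.

```python
from math import exp, fsum, log

RMSD_CAP = 20.0  # Å; also the value assigned to missing ligands


def aggregate_rmsd(rmsds):
    """Geometric mean of capped per-ligand RMSDs (None = missing ligand)."""
    if not rmsds:
        raise ValueError("at least one reference ligand is required")
    capped = [RMSD_CAP if r is None else min(r, RMSD_CAP) for r in rmsds]
    if min(capped) == 0.0:
        return 0.0  # a geometric mean containing zero is zero
    return exp(fsum(log(r) for r in capped) / len(capped))


def success_counts(rmsds, cutoffs=(1, 2, 5)):
    """Count ligands with an RMSD lower than or equal to each cutoff."""
    return {
        f"num_rmsd_{cutoff}": sum(
            1 for r in rmsds if r is not None and r <= cutoff
        )
        for cutoff in cutoffs
    }
```

With the cap in place, a single very poor or missing ligand contributes at most 20 Å to the mean instead of dominating it.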
The lDDT-PLI score is the lDDT score of the receptor-ligand contacts, as calculated by OpenStructure and described in an upcoming publication.
For aggregation, we use the arithmetic mean. Missing ligands (either not modeled, or each additional ligand with a different stoichiometry than what was modeled) are assigned a value of 0. There is no penalty for extra ligands in the model at this point, however this may change in the future.
Finally, mean_best_lddt_pli is the mean of the best prediction for each ligand entity (type). This score doesn't penalize for wrong stoichiometries.
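The difference between the aggregated lDDT-PLI and mean_best_lddt_pli can be seen in a minimal sketch. The function names and input layout are hypothetical, `None` marks a missing ligand, and counting an entity with no modeled copy as 0 in the "mean best" variant is an assumption of this sketch.

```python
from collections import defaultdict


def aggregate_lddt_pli(scores):
    """Arithmetic mean over reference ligands; missing ligands score 0."""
    if not scores:
        raise ValueError("at least one reference ligand is required")
    return sum(0.0 if s is None else s for s in scores) / len(scores)


def mean_best_lddt_pli(scored_ligands):
    """Mean of the best score per ligand entity.

    `scored_ligands` is an iterable of (entity, score or None) pairs,
    one pair per ligand copy in the reference.
    """
    best = defaultdict(float)  # default 0.0 for entities never modeled
    for entity, score in scored_ligands:
        best[entity] = max(best[entity], 0.0 if score is None else score)
    if not best:
        raise ValueError("at least one reference ligand is required")
    return sum(best.values()) / len(best)
```

For a reference with two ATP copies, of which only one is modeled, the missing copy lowers the aggregated score but leaves the "mean best" value unchanged, which is why the latter does not penalize wrong stoichiometries.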
The lDDT-LP score is the lDDT score of the ligand pocket (LP), as calculated by OpenStructure and described in an upcoming publication. It is somewhat similar to the lDDT-BS, but is calculated with different parameters, and only when a ligand is present in the model.
For aggregation, we use the arithmetic mean. Missing ligands (either not modeled, or each additional ligand with a different stoichiometry than what was modeled) are assigned a value of 0. There is no penalty for extra ligands in the model at this point, however this may change in the future.
Finally, mean_best_lddt_lp is the mean of the best prediction for each ligand entity (type). This score doesn't penalize for wrong stoichiometries.
This section describes the requirements for a CAMEO Structures & Complexes server, both on how to receive submissions, and how to return predictions to CAMEO.
In order to receive submissions your server should be ready to:
The exact contents of the request depend on the capabilities of your server, and can be customized to some extent. Please use our format validation form to find out the exact request that your server will receive. Please fill in the form and the request contents will be shown in the Request Preview section at the very bottom of the form.
When your server completes the prediction, it should return the model(s) by email to the address submitted in the "Results Email (variable name)" field that you registered. The following formats are supported:
[...] MODEL and POSE keywords.
MODEL and POSE should contain a 1-based integer number of the model or pose. Only models and poses 1-5 will be processed.
The LIGAND keyword is mandatory to mark the beginning of a ligand block. However, the number and identification of the ligand are ignored in CAMEO.
[...] MODEL keyword, not the ordering of the attachment.
[...] PFRMAT, TARGET, AUTHOR, METHOD and PARENT) are ignored.
Here are some recommendations regarding the format of this email:
Note: if your server is registered for the CASP experiment, it should already fulfill most of these technical requirements and you can use the same technology for CAMEO.