General Workflow

As part of its weekly release procedure, the PDB publishes the sequences of upcoming entries four days before their release on Wednesday. This pre-release is scheduled every Saturday at 03:00 UTC. CAMEO collects the pre-release and, after the sequence pre-processing and filtering steps described below, submits a selected set of targets to the registered servers. Participants have until the following Wednesday at 03:00 (CET/CEST) to return their predictions. Once the PDB has released the reference structures that Wednesday, the evaluation is performed.
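As a rough illustration of this weekly cycle, the sketch below computes the most recent pre-release time (Saturday 03:00 UTC) for a given moment and the date of the corresponding Wednesday release. The function names are hypothetical and not part of CAMEO. The submission deadline hour in CET/CEST is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

def latest_prerelease(now_utc):
    """Return the most recent Saturday 03:00 UTC at or before now_utc."""
    # datetime.weekday(): Monday is 0, Saturday is 5.
    days_since_saturday = (now_utc.weekday() - 5) % 7
    candidate = (now_utc - timedelta(days=days_since_saturday)).replace(
        hour=3, minute=0, second=0, microsecond=0
    )
    # On a Saturday before 03:00 UTC the current week's pre-release
    # has not happened yet, so fall back to the previous week.
    if candidate > now_utc:
        candidate -= timedelta(days=7)
    return candidate

def release_date(prerelease):
    """Return the Wednesday release date, four days after the pre-release."""
    return (prerelease + timedelta(days=4)).date()

now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)  # a Monday
pre = latest_prerelease(now)
print(pre.isoformat(), release_date(pre).isoformat())
```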

The categories currently supported by CAMEO are protein structure modeling (3D) and protein model quality assessment (QE). The upcoming complete modeling (CM) category supports the evaluation of heteromers and will replace the current 3D category. Other categories might follow as requested by the community and subject to available funding.

A CAMEO server can be registered either as a public server under its full name or as an anonymous server. For an anonymous server, all scoring is performed and visible to other method developers, but not to the public, and only the name is anonymized ('serverx'). See our complete list of registered servers.

A CAMEO target is a pre-released PDB entry, which is submitted to registered servers. In CAMEO CM, a target consists of one or more protein, DNA or RNA sequence(s), and zero or more free ligands belonging to the same pre-released PDB entry. A target can, thus, be a monomer, a homo-oligomer or a hetero-oligomer, and contain ligand(s) or not.

Pre-processing

After downloading the pre-released sequences from the PDB on Saturday, in order to submit a limited number of high-quality targets for modeling, CAMEO CM performs the following actions before submitting the sequences to the participants:

  1. Filtering of the sequences
  2. Clustering of similar targets
  3. Filtering of targets that are too easy

1. Filtering of the sequences

CAMEO Complete Modeling only submits filtered nucleic and amino-acid sequences to the participants. The filtering step removes targets if any of their sequences:

2. Clustering of similar targets

In order to avoid "duplicate" submissions of very similar targets, CAMEO CM clusters the remaining targets according to the following method.

  1. All protein sequences of 30 amino acids or more are clustered with CD-HIT at 99% sequence identity.
  2. Targets with several sequences belonging to the same CD-HIT cluster are removed.
  3. Targets are labeled by the list of CD-HIT clusters their sequences belong to.
  4. Targets that have the same list of clusters constitute a "hetero cluster". Note that this is also the case for monomers and homo-oligomers, where the behaviour reverts to the separate CD-HIT cluster.
  5. One "representative" target is selected per "hetero cluster" (the first PDB ID appearing in the pre-release).

At this stage of the development of the CM category, DNA and RNA sequences are not clustered, which means that submissions with duplicate DNA or RNA targets can occur.
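Steps 2 to 5 above can be sketched as follows. This is only an illustration and not CAMEO's actual implementation. It assumes that CD-HIT has already been run and that each target comes with one cluster ID per protein sequence. The function name and the PDB IDs are hypothetical. Treating the "list of clusters" as order-independent (by sorting it) is also an assumption.

```python
def select_representatives(targets):
    """Pick one representative target per "hetero cluster".

    targets: list of (pdb_id, cluster_ids) in pre-release order, where
    cluster_ids holds one CD-HIT cluster ID per protein sequence.
    Returns a dict mapping each hetero-cluster label to its representative.
    """
    hetero_clusters = {}
    for pdb_id, cluster_ids in targets:
        # Step 2: drop targets with several sequences in the same cluster.
        if len(set(cluster_ids)) != len(cluster_ids):
            continue
        # Step 3: label the target by its clusters (sorted: an assumption).
        label = tuple(sorted(cluster_ids))
        # Step 4: targets sharing a label form one hetero cluster.
        hetero_clusters.setdefault(label, []).append(pdb_id)
    # Step 5: the first PDB ID seen in the pre-release is the representative.
    return {label: members[0] for label, members in hetero_clusters.items()}

example = [
    ("1abc", [1, 2]),  # heteromer
    ("2def", [2, 1]),  # same clusters as 1abc
    ("3ghi", [3]),     # monomer or homo-oligomer
    ("4jkl", [3]),     # same single cluster as 3ghi
    ("5mno", [4, 4]),  # two sequences in one cluster: removed
]
print(select_representatives(example))
```

For a monomer or homo-oligomer the label contains a single cluster ID, so the grouping reduces to the plain CD-HIT cluster, as noted in step 4.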

3. Filtering of targets that are too easy

Targets that are believed to be too trivial to model are not submitted to the participating servers. All representative targets (after clustering) are assessed for difficulty. All sequences are searched separately for templates with BLAST against the full list of protein sequences currently in the PDB. A target too "trivial" to model would feature a template with 85% sequence identity or more, additionally:

A target is considered "too easy" if a trivial template covers all the sequences of the target and there is an exhaustive mapping between every sequence of the target and every sequence of the template.

This implies:

Only targets that pass the sequence filtering, are the representative of their cluster, and are not "too easy" are submitted to the participating servers.
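The "too easy" criterion can be illustrated with a minimal sketch. It assumes that the BLAST hits of one candidate template entry are already available as percent identities per (target sequence, template sequence) pair, and it reads "exhaustive mapping" as a one-to-one pairing that uses every sequence on both sides. Both the data layout and this reading are assumptions, the function name is hypothetical, and any further criteria are not covered.

```python
from itertools import permutations

IDENTITY_CUTOFF = 85.0  # percent sequence identity, as stated above

def has_exhaustive_mapping(target_seqs, template_seqs, identities,
                           cutoff=IDENTITY_CUTOFF):
    """Check for a one-to-one pairing of target and template sequences
    in which every pair reaches the identity cutoff.

    identities: dict {(target_seq, template_seq): percent identity};
    pairs without a BLAST hit are simply absent.
    """
    # A one-to-one pairing needs the same number of sequences on both sides.
    if len(target_seqs) != len(template_seqs):
        return False
    # Brute force over all pairings; fine for the few sequences per entry.
    for ordering in permutations(template_seqs):
        if all(identities.get((t, s), 0.0) >= cutoff
               for t, s in zip(target_seqs, ordering)):
            return True
    return False

hits = {("A", "X"): 90.0, ("B", "Y"): 95.0}
print(has_exhaustive_mapping(["A", "B"], ["X", "Y"], hits))
```

In this toy case both target sequences find their own template sequence, so the target would count as "too easy". If both only matched template sequence X, no such pairing would exist.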

Note: currently, DNA and RNA sequences are not filtered for sequence similarity.

Scores

We are working hard to implement many of the scores from the 3D category into CAMEO CM. We primarily consider superposition-free scores as CAMEO targets might consist of multiple domains or proteins that may not superpose well. So far the following scores are available:

The lDDT score (Local Distance Difference Test on All Atoms) evaluates the quality of the local atomic environment of a model. lDDT rewards the fraction of correctly predicted inter-atomic distances in a model at different threshold levels. lDDT does not depend on a global superposition of the prediction and target structure.
Specifically, interaction distances (cutoff 15 Å) between atoms in the reference protein structure are compared with the distances between the corresponding atoms in the prediction. If the difference between the two distances is within a defined threshold, the interaction is considered to be preserved in the prediction. The final lDDT-all score is computed by averaging the fraction of correctly modeled interactions over the following four distance difference thresholds: 0.5, 1, 2, and 4 Å (the same thresholds as GDT_HA). A filter based on the Engh and Huber bond lengths and angles removes stereochemical violations and steric clashes. CAMEO additionally offers a Cα-based lDDT score. [ref.: CASP9 TBM Assessment]
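The core of this calculation can be illustrated with a simplified sketch. It is not the reference lDDT implementation: it assumes one point per residue (closer in spirit to the Cα-based variant), takes the atom correspondence between reference and model as given, and omits the stereochemical filter. The function name is hypothetical.

```python
import math
from itertools import combinations

INCLUSION_RADIUS = 15.0            # Å, interaction cutoff in the reference
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # Å, distance difference thresholds

def simple_lddt(ref, model):
    """Simplified lDDT for two equally ordered lists of (x, y, z) points."""
    preserved = [0] * len(THRESHOLDS)
    total = 0
    for i, j in combinations(range(len(ref)), 2):
        d_ref = math.dist(ref[i], ref[j])
        if d_ref >= INCLUSION_RADIUS:
            continue  # not an interaction in the reference
        total += 1
        diff = abs(d_ref - math.dist(model[i], model[j]))
        for k, threshold in enumerate(THRESHOLDS):
            if diff <= threshold:
                preserved[k] += 1
    if total == 0:
        return 0.0
    # Average the preserved fraction over the four thresholds.
    return sum(p / total for p in preserved) / len(THRESHOLDS)

reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
prediction = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.75, 0.0)]
print(simple_lddt(reference, reference), simple_lddt(reference, prediction))
```

Because only pairwise distances are compared, no superposition of the two structures is needed.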

The QS-score considers the assembly interface as a whole and is suitable for comparing homo- or hetero-oligomers with identical or different stoichiometries, alternative relative orientations of chains, and distinct amino acid sequences (i.e. homologous complexes). To unequivocally identify the residues of all protein chains in complexes, QS-score first establishes a mapping between equivalent polypeptide chains of the compared structures by exploiting complex symmetries where possible. The resulting QS-score expresses the fraction of shared interface contacts (residues on different chains with a Cβ-Cβ distance < 12 Å) between two assemblies. A QS-score close to 1 translates to very similar interfaces, matching stoichiometry and a majority of identical interfacial contacts. A QS-score close to 0 indicates a radically diverse quaternary structure, probably different stoichiometries and potentially representing alternative binding conformations. Targets which cover only part of a hetero-oligomeric complex are not evaluated.
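The notion of shared interface contacts can be illustrated with a deliberately simplified sketch. It is not the published QS-score formula: it assumes the chain mapping has already been established and that both structures use the same chain names and residue numbers, and it reports an unweighted shared fraction (intersection over union of the two contact sets). All names are hypothetical.

```python
import math
from itertools import combinations

CONTACT_CUTOFF = 12.0  # Å, Cβ-Cβ distance defining an interface contact

def interface_contacts(cb_coords):
    """Return the set of inter-chain residue pairs closer than the cutoff.

    cb_coords: dict {(chain, residue_number): (x, y, z)} of Cβ positions.
    """
    contacts = set()
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(cb_coords.items(), 2):
        different_chains = res_a[0] != res_b[0]
        if different_chains and math.dist(xyz_a, xyz_b) < CONTACT_CUTOFF:
            contacts.add(frozenset((res_a, res_b)))
    return contacts

def shared_contact_fraction(ref, model):
    """Unweighted fraction of interface contacts present in both structures."""
    c_ref, c_model = interface_contacts(ref), interface_contacts(model)
    union = c_ref | c_model
    return len(c_ref & c_model) / len(union) if union else 0.0

ref = {("A", 1): (0.0, 0.0, 0.0), ("B", 1): (5.0, 0.0, 0.0), ("B", 2): (10.0, 0.0, 0.0)}
model = {("A", 1): (0.0, 0.0, 0.0), ("B", 1): (5.0, 0.0, 0.0), ("B", 2): (20.0, 0.0, 0.0)}
print(shared_contact_fraction(ref, model))
```

In this toy example the reference has two interface contacts and the model keeps only one of them, so half of the contacts are shared.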

Format Definition

This section describes the requirements for a CAMEO-CM server, covering both how to receive submissions and how to return predictions to CAMEO.

Submission

In order to receive submissions your server should be ready to:

The exact contents of the request depend on the capabilities of your server, and can be customized to some extent. Please use our format validation form to find out the exact request that your server will receive. Please fill in the form and the request contents will be shown in the Request Preview section at the very bottom of the form.

Prediction

When your server completes the prediction, it should return the model(s) by email to the address submitted in the request field whose name you registered under "Results Email (variable name)".

Here are some recommendations regarding the format of this email:

Note: if your server is registered for the CASP experiment, it should already fulfill most of these technical requirements and you can use the same technology for CAMEO.