RCSB PDB: Search API Documentation

RCSB PDB Search API

Stay current with API announcements by subscribing to the RCSB PDB API mailing list:

Introduction

The Search API accepts HTTP GET or POST requests with JSON payloads. The base URI of the search endpoint is https://search.rcsb.org/rcsbsearch/v2/query. With the GET method, the search request should be sent as a URL-encoded query string in the json parameter: https://search.rcsb.org/rcsbsearch/v2/query?json={search-request}.

The query syntax for the {search-request} is detailed in the Query Language section of this guide. See the Build Your Search section for general information on how to construct the {search-request} object.

The Search API is designed to return only the identifiers of relevant hits, together with additional metadata (see the Return Type section for the identifier types that can be requested and the Response Body section for the response format). If you need information such as release date, macromolecules, organisms, resolution, modified residues, or ligands, use the RCSB Data API: https://data.rcsb.org.
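
As a minimal sketch of the two request styles described above, the following Python snippet builds a GET URL with the request URL-encoded in the json parameter, and an unsent POST request carrying the same payload. The helper names build_get_url and build_post_request are hypothetical, and the Content-Type header is an assumption rather than a documented requirement. The sample request is the match-all query from the Build Your Search section.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_get_url(search_request: dict) -> str:
    """Return the GET URL with the request URL-encoded in the json parameter."""
    payload = json.dumps(search_request, separators=(",", ":"))
    return SEARCH_URL + "?" + urllib.parse.urlencode({"json": payload})


def build_post_request(search_request: dict) -> urllib.request.Request:
    """Return an unsent POST request with the JSON payload as the body."""
    body = json.dumps(search_request).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        # Assumption: the header is set explicitly; the guide does not state it.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Match-all query taken from the Build Your Search section.
example_request = {
    "query": {"type": "terminal", "service": "text"},
    "return_type": "entry",
}
example_url = build_get_url(example_request)
```

Nothing is sent when this code runs. Passing the object returned by build_post_request to urllib.request.urlopen would perform the actual call.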

Build Your Search

A search request is a complete specification of what should be returned in a result set. The search request is represented as a JSON object. The building blocks of the request are:

Context Description
return_type Required. Specifies the type of the returned identifiers, e.g. entry, polymer entity, assembly, etc. See Return Type section for more information.
query Optional. Specifies the search expression. Can be omitted when facets or counts should be returned instead of identifiers. In this case the request must be configured via the request_options context.
request_options Optional. Controls various aspects of the search request including pagination, sorting, scoring and faceting. If omitted, the default parameters for sorting, scoring and pagination will be applied.
request_info Optional. Specifies additional information about the query, e.g. query_id. It is used internally at RCSB PDB for logging purposes. When query_id is sent with the search request, it is included in the corresponding response object.

The query context may consist of two types of clauses: terminal nodes, which express a single search expression, and group nodes, which logically combine terminal nodes or other group nodes (see Boolean Expressions).

The simplest query requires only the return_type parameter and the query context. When the parameters property of the query object is left unspecified, the query matches all documents, returning PDB IDs if return_type is set to "entry".

{
  "query": {
    "type": "terminal",
    "service": "text"
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Search Services

The RCSB PDB Search API consolidates requests to heterogeneous search services. The list of available services is below:

Service Description
text Performs attribute searches against textual annotations associated with PDB structures. Refer to Structure Attributes Search page for a full list of annotations.
text_chem Performs attribute searches against textual annotations associated with PDB molecular definitions. Refer to Chemical Attributes Search page for a full list of annotations.
full_text Performs unstructured searches against textual annotations associated with PDB structures or molecular definitions. An unstructured search performs a full-text search against multiple text attributes.
sequence Employs the MMseqs2 software to perform fast, BLAST-like sequence matching searches based on a user-provided FASTA sequence (with E-value or % identity cutoffs). The following searches are available:
  • protein: search for protein sequences
  • dna: search for DNA sequences
  • rna: search for RNA sequences
seqmotif Performs short motif searches against nucleotide or protein sequences, using three different types of input format:
  • simple (e.g., CXCXXL)
  • prosite (e.g., C-X-C-X(2)-[LIVMYFWC])
  • regex (e.g., CXCX{2}[LIVMYFWC])
structure Performs fast searches matching a global 3D shape of assemblies or chains of a given entry (identified by PDB ID), in either strict (strict_shape_match) or relaxed (relaxed_shape_match) modes, using a BioZernike descriptor strategy.
strucmotif Performs structure motif searches on all available PDB structures.
chemical Enables queries of small-molecule constituents of PDB structures, based on chemical formula and chemical structure. Both molecular formula and formula range searches are supported. Queries for matching and similar chemical structures can be performed using SMILES and InChI descriptors as search targets. Graph and chemical fingerprint searches are implemented using tools from the OpenEye Chemical Toolkit.

Descriptor Matching Criteria:

The following graph matching searches use a fingerprint prefilter, so they are designed to find only similar molecules. These graph matching comparisons include:

  • graph-exact: atom type, formal charge, bond order, atom and bond chirality, aromatic assignment, valence degree, and atom hydrogen count are used as matching criteria for this search type. Graph matching is performed on the subset of molecules satisfying a fingerprint screening search. Results will include isomorphic and substructure matches within this screened subset.
  • graph-strict: atom type, formal charge, bond order, atom and bond chirality, aromatic assignment, ring membership, and valence degree are used as matching criteria for this search type. Graph matching is performed on the subset of molecules satisfying a fingerprint screening search. Results will include isomorphic and substructure matches within this screened subset.
  • graph-relaxed: atom type, formal charge and bond order are used as matching criteria for this search type. Graph matching is performed on the subset of molecules satisfying a fingerprint screening search. Results will include isomorphic and substructure matches within this screened subset.
  • graph-relaxed-stereo: atom type, formal charge, bond order, atom and bond chirality are used as matching criteria for this search type. Graph matching is performed on the subset of molecules satisfying a fingerprint screening search. Results will include isomorphic and substructure matches within this screened subset.
  • fingerprint-similarity: Tanimoto similarity is used as the matching criterion for molecular fingerprints. Matches include molecules with scores exceeding 0.6 for TREE type fingerprints or 0.9 for MACCS type fingerprints.
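
To illustrate the fingerprint-similarity criterion, here is a small sketch of the Tanimoto coefficient on fingerprints represented as sets of "on" bit positions. The set representation, the function names, and the treatment of two empty fingerprints are assumptions made for the example. The service itself relies on the OpenEye toolkit, which is not used here.

```python
# Thresholds quoted in the text above; a match must exceed them.
THRESHOLDS = {"TREE": 0.6, "MACCS": 0.9}


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = bits_a | bits_b
    if not union:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(bits_a & bits_b) / len(union)


def is_match(bits_a: set, bits_b: set, fingerprint_type: str) -> bool:
    """Apply the strict 'exceeds the threshold' rule for the given fingerprint type."""
    return tanimoto(bits_a, bits_b) > THRESHOLDS[fingerprint_type]
```

For example, fingerprints {1, 2, 3, 4} and {1, 2, 3} share three of four distinct bits, which passes the TREE threshold but not the MACCS one.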

The following graph matching searches perform an exhaustive substructure search with no pre-screening. These substructure graph matching comparisons include:

  • sub-struct-graph-exact (atom type, formal charge, aromaticity, bond order, atom/bond stereochemistry, valence degree, atom hydrogen count)
  • sub-struct-graph-strict (atom type, formal charge, aromaticity, bond order, atom/bond stereochemistry, ring membership, valence degree)
  • sub-struct-graph-relaxed (atom type, formal charge, bond type)
  • sub-struct-graph-relaxed-stereo (atom type, formal charge, bond type, atom/bond stereochemistry)

Return Type

The search can return one of the following result types:

Type Description
entry Returns a list of PDB IDs.
assembly Returns a list of PDB IDs appended with assembly IDs in the format of a [pdb_id]-[assembly_id], corresponding to biological assemblies.
polymer_entity Returns a list of PDB IDs appended with entity IDs in the format of a [pdb_id]_[entity_id], corresponding to polymeric molecular entities.
non_polymer_entity Returns a list of PDB IDs appended with entity IDs in the format of a [pdb_id]_[entity_id], corresponding to non-polymeric entities (or ligands).
polymer_instance Returns a list of PDB IDs appended with asym IDs in the format of a [pdb_id].[asym_id], corresponding to instances of certain polymeric molecular entities, also known as chains. Note that asym_id in the instance identifier corresponds to _label_asym_id from the mmCIF schema (assigned by the PDB). It can differ from _auth_asym_id (selected by the author at the time of deposition).
mol_definition Returns a list of molecular definition identifiers that include:
  • Chemical component entries identified by the alphanumeric code, COMP ID: e.g. ATP, ZN
  • BIRD entries identified by BIRD ID, e.g. PRD_000154
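
The identifier formats listed above can be split back into their parts with a few lines of Python. This is a sketch only: split_identifier is a hypothetical helper, the sample suffixes are illustrative, and splitting on the last separator is an assumption made so that identifiers that themselves contain the separator character still parse.

```python
# Separator between the PDB ID and the suffix for each compound identifier type.
SEPARATORS = {
    "assembly": "-",
    "polymer_entity": "_",
    "non_polymer_entity": "_",
    "polymer_instance": ".",
}
PLAIN_TYPES = ("entry", "mol_definition")


def split_identifier(identifier: str, return_type: str) -> tuple:
    """Return (parent_id, suffix); suffix is None for plain identifiers."""
    if return_type in PLAIN_TYPES:
        return identifier, None
    if return_type not in SEPARATORS:
        raise ValueError(f"unknown return_type: {return_type}")
    # Split on the last separator only (assumption, see lead-in).
    parent, found, suffix = identifier.rpartition(SEPARATORS[return_type])
    if not found or not parent or not suffix:
        raise ValueError(f"{identifier!r} is not a valid {return_type} identifier")
    return parent, suffix
```

With this helper, "2V01.A" requested as polymer_instance yields the pair ("2V01", "A"), while a mol_definition identifier such as PRD_000154 is returned unchanged.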

Query Language

The Search API provides a full query DSL (domain-specific language) based on JSON to define queries.

Basic Search

The query language supports unstructured (basic) searches. An unstructured query searches the textual annotations associated with PDB structures when the field name is unknown. Such a query searches across all fields available for search and returns a hit if a match occurs in any field.

To perform an unstructured search, you should send the parameters object without an explicit attribute property:

{
  "query": {
    "type": "terminal",
    "service": "full_text",
    "parameters": {
      "value": "thymidine kinase"
    }
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Complex boolean queries in the basic search can be built with operators inside the query string, including + for AND and | for OR.

For example, the query string interferon + response + factor is equivalent to the search interferon AND response AND factor.

{
  "query": {
    "type": "terminal",
    "service": "full_text",
    "parameters": {
      "value": "interferon + response + factor"
    }
  },
  "return_type": "entry"
}

You can use ( and ) to signify precedence. For example, searching with a query string isopeptide + ( collagen | fibrinogen ) returns structures that contain isopeptide AND either collagen OR fibrinogen.

{
  "query": {
    "type": "terminal",
    "service": "full_text",
    "parameters": {
      "value": "isopeptide + ( collagen | fibrinogen )"
    }
  },
  "return_type": "entry"
}

Attribute Search

An attribute query searches for terms in a specific attribute. To perform an attribute search, send the parameters object with the attribute property set to a field name, the value property set to a search term, and the operator property set to a search operator.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "exptl.method",
      "operator": "exact_match",
      "value": "ELECTRON MICROSCOPY"
    }
  },
  "return_type": "entry"
}

Refer to the Examples section for more examples.

When using attribute search, you must observe the following rules:

Negation

To perform negation on the operator, the negation property should be set to true in the query parameters object. The following search returns non-protein polymeric entities:

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "operator": "exact_match",
      "negation": true,
      "value": "Protein",
      "attribute": "entity_poly.rcsb_entity_polymer_type"
    }
  },
  "return_type": "polymer_entity"
}

Refer to the Examples section for more examples.

Case-Sensitive Search

By default, searches performed using the exact_match operator are case-insensitive. You can make your search case-sensitive by setting the case_sensitive property in the query parameters object to true. This option can be useful when capitalization conveys additional information. For example, gene symbols can differ in capitalization between homologs from different species: human gene symbols are conventionally upper case.

The following search returns human kinases encoded by the ABL1 gene. It excludes results where the case doesn't match, such as non-receptor tyrosine-protein kinase from mouse encoded by the Abl1 gene.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
      "operator": "exact_match",
      "value": "ABL1",
      "case_sensitive": true
    }
  },
  "return_type": "polymer_entity"
}

Refer to the Examples section for more examples.
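
The attribute, negation and case_sensitive properties described in this section can be assembled programmatically. The following sketch uses a hypothetical helper, attribute_node, to rebuild the negation and case-sensitive requests shown above. It only constructs the JSON-compatible dictionaries and does not contact the API.

```python
def attribute_node(attribute, operator, value=None,
                   negation=False, case_sensitive=False):
    """Build a terminal node for the text service (hypothetical helper)."""
    parameters = {"attribute": attribute, "operator": operator}
    if value is not None:  # some operators, such as exists, take no value
        parameters["value"] = value
    if negation:
        parameters["negation"] = True
    if case_sensitive:
        parameters["case_sensitive"] = True
    return {"type": "terminal", "service": "text", "parameters": parameters}


# Non-protein polymeric entities (the Negation example).
negated_request = {
    "query": attribute_node(
        "entity_poly.rcsb_entity_polymer_type", "exact_match", "Protein",
        negation=True,
    ),
    "return_type": "polymer_entity",
}

# Entities encoded by the human ABL1 gene (the Case-Sensitive Search example).
case_sensitive_request = {
    "query": attribute_node(
        "rcsb_entity_source_organism.rcsb_gene_name.value", "exact_match",
        "ABL1", case_sensitive=True,
    ),
    "return_type": "polymer_entity",
}
```

Leaving the optional flags out of the dictionary when they are false keeps the generated request identical to the hand-written JSON.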

Boolean Expressions

The query language supports two boolean operators: AND and OR. Boolean operators should be added to the group node as logical_operator property. The group nodes can be used to logically combine search expressions (terminal nodes) or other group nodes:

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "group",
        "logical_operator": "or",
        "nodes": [
          {
            "type": "terminal",
            "service": "text",
            "parameters": {
              "operator": "exact_match",
              "value": "Homo sapiens",
              "attribute": "rcsb_entity_source_organism.taxonomy_lineage.name"
            }
          },
          {
            "type": "terminal",
            "service": "text",
            "parameters": {
              "operator": "exact_match",
              "value": "Mus musculus",
              "attribute": "rcsb_entity_source_organism.taxonomy_lineage.name"
            }
          }
        ]
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "greater",
          "value": "2019-08-20",
          "attribute": "rcsb_accession_info.initial_release_date"
        }
      }
    ]
  },
  "return_type": "polymer_entity"
}

Refer to the Examples section for more examples.
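
Nested group nodes are easier to write with small composition helpers. The sketch below rebuilds the request above (human or mouse entities released after 2019-08-20). The helper names terminal and group are hypothetical, and the restriction to "and" and "or" follows the two boolean operators named in this section.

```python
def terminal(attribute, operator, value):
    """Build a terminal node for the text service (hypothetical helper)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator, *nodes):
    """Combine terminal or group nodes with 'and' or 'or' (hypothetical helper)."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


LINEAGE = "rcsb_entity_source_organism.taxonomy_lineage.name"

boolean_request = {
    "query": group(
        "and",
        group(
            "or",
            terminal(LINEAGE, "exact_match", "Homo sapiens"),
            terminal(LINEAGE, "exact_match", "Mus musculus"),
        ),
        terminal("rcsb_accession_info.initial_release_date", "greater", "2019-08-20"),
    ),
    "return_type": "polymer_entity",
}
```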

Scoring Strategy

You can customize how scores from different services impact the final relevancy ranking of your search results by setting a scoring_strategy in the request_options context. The following scoring strategies are available: combined (default), sequence, seqmotif, strucmotif, structure, chemical, and text. For example, to boost search results based on the relevance score produced by the sequence search service, use the sequence scoring strategy.

The final relevancy score is calculated as a weighted sum of the normalized scores produced by the different search services. All search result scores are rescaled to the interval [0, 1]; a score of 0 still means that the hit met the search criteria. When the combined strategy is used, equal weights are applied. For the other strategies, a higher weight is given to the selected service's scores, which increases their contribution to the final score and promotes a ranking influenced by that service.
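
The effect of the weights can be seen in a toy calculation. The sketch below is purely illustrative: the actual weights used by the service are not given in this guide, so the values 0.5/0.5 and 0.2/0.8, the hit names, and the per-service scores are all made up for the example.

```python
def final_score(normalized_scores: dict, weights: dict) -> float:
    """Weighted sum of per-service scores that are already rescaled to [0, 1]."""
    return sum(weights[service] * score
               for service, score in normalized_scores.items())


def rank(hits: dict, weights: dict) -> list:
    """Return hit names ordered by descending final score."""
    return sorted(hits, key=lambda name: final_score(hits[name], weights),
                  reverse=True)


# Hypothetical hits with normalized scores from two services.
hits = {
    "A": {"text": 0.9, "sequence": 0.2},
    "B": {"text": 0.3, "sequence": 0.7},
}

equal_weights = {"text": 0.5, "sequence": 0.5}      # stands in for "combined"
sequence_weights = {"text": 0.2, "sequence": 0.8}   # stands in for "sequence"

combined_order = rank(hits, equal_weights)
sequence_order = rank(hits, sequence_weights)
```

With equal weights, hit A ranks first because of its strong text score. Once the sequence score carries more weight, hit B overtakes it.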

Sorting

Sorting is determined by the sort object in the request_options context. It allows you to add one or more sorting conditions to control the order of the search result hits. The sort operation is defined on a per-field level, with the special field name score used to sort by score (the default).

Refer to the Structure Attributes Search and Chemical Attributes Search pages to find all searchable attributes. Any attribute listing exact_match or equals operators can be used for sorting.

By default, sorting is done in descending order ("desc"). The sort can be reversed by setting the direction property to "asc". This example demonstrates how to sort the search results by release date:

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "struct.title",
      "operator": "contains_phrase",
      "value": "\"hiv protease\""
    }
  },
  "request_options": {
    "sort": [
      {
        "sort_by": "rcsb_accession_info.initial_release_date",
        "direction": "desc"
      }
    ]
  },
  "return_type": "entry"
}

Refer to the Examples section for more examples.

Pagination

By default, only the first 10 hits are included in the search result list. Pagination can be configured with the start and rows parameters of the paginate object in the request_options context.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "rcsb_polymer_entity.formula_weight",
      "operator": "greater",
      "value": 500
    }
  },
  "request_options": {
    "paginate": {
      "start": 0,
      "rows": 100
    }
  },
  "return_type": "polymer_entity"
}

To retrieve all hits, use the return_all_hits parameter in the request_options context. Note that returning all hits is generally not desirable and may cause performance issues.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "rcsb_entry_info.selected_polymer_entity_types",
      "operator": "exact_match",
      "value": "Nucleic acid (only)"
    }
  },
  "request_options": {
    "return_all_hits": true
  },
  "return_type": "entry"
}

Refer to the Examples section for more examples.
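
As an alternative to return_all_hits, a client can walk through the result list page by page. The sketch below generates the successive paginate objects for a known total_count (reported in the response body) and merges one into a request. It assumes that start is the zero-based offset of the first hit on the page, consistent with the example above. The helper names are hypothetical.

```python
def page_windows(total_count: int, rows: int = 10):
    """Yield paginate objects that together cover total_count hits."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    for start in range(0, total_count, rows):
        yield {"start": start, "rows": rows}


def with_page(search_request: dict, window: dict) -> dict:
    """Return a copy of the request with its paginate object replaced."""
    options = dict(search_request.get("request_options", {}))
    options["paginate"] = window
    return {**search_request, "request_options": options}


base_request = {
    "query": {"type": "terminal", "service": "text"},
    "return_type": "entry",
}
windows = list(page_windows(total_count=25, rows=10))
first_page_request = with_page(base_request, windows[0])
```

The last window may ask for more rows than remain. This sketch relies on the server simply returning the remaining hits in that case.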

Counting Results

By default, the search results contain a list of matched identifiers and additional metadata. See Search Results for more details. The return_counts flag in the request_options context allows you to execute a search query and get back only the number of matches for that query. The following query returns the number of current structures in the PDB archive:

{
  "query": {
    "type": "terminal",
    "service": "text"
  },
  "request_options": {
    "return_counts": true
  },
  "return_type": "entry"
}

Refer to the Examples section for more examples.

Include Computed Models

RCSB PDB has integrated Computed Structure Models from AlphaFold DB and ModelArchive. To include Computed Structure Models in your search results, add the results_content_type parameter to the request_options context. This parameter specifies a content type filter that can include experimental structures, computational structures, or both.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "rcsb_uniprot_protein.name.value",
      "operator": "exact_match",
      "value": "Free fatty acid receptor 2"
    }
  },
  "return_type": "entry",
  "request_options": {
    "results_content_type": [
      "computational",
      "experimental"
    ]
  }
}

Refer to the Examples section for more examples.

Faceted Queries

Faceted queries (or facets) provide you with the ability to group and perform calculations and statistics on PDB data by using a simple search query. Facets are the arrangement of search results into categories (buckets) based on the requested field values.

If the facets property is specified in the request_options context, the search results are presented along with numerical counts of how many matching IDs were found for each term requested in the facets. If the query context is omitted in the search request with facets specified, the response will contain only the facet counts.

This example calculates the breakdown by experimental method of PDB structures, released after 2019-08-20:

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "operator": "greater",
      "value": "2019-08-20",
      "attribute": "rcsb_accession_info.initial_release_date"
    }
  },
  "request_options": {
    "facets": [
      {
        "name": "Methods",
        "aggregation_type": "terms",
        "attribute": "exptl.method"
      }
    ]
  },
  "return_type": "entry"
}

By default, searches containing a faceted query return both search hits and aggregation results. To return only aggregation results, set rows to 0 in the paginate object:

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "operator": "greater",
      "value": "2019-08-20",
      "attribute": "rcsb_accession_info.initial_release_date"
    }
  },
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Methods",
        "aggregation_type": "terms",
        "attribute": "exptl.method"
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Terms Facets

Terms faceting is a multi-bucket aggregation where buckets are dynamically built - one per unique value. For each bucket terms faceting counts the documents (entry, polymer_entity, etc.) that contain a given value in a given field. For example, you can run the terms aggregation on the field rcsb_primary_citation.rcsb_journal_abbrev which holds the abbreviated name of a journal associated with an entry. In return, we have buckets for each journal, each with their PDB entry counts.

You can control the number of returned buckets using the max_num_intervals parameter (up to a limit of 65536). Larger values of max_num_intervals use more memory and push the whole aggregation closer to the limit. If the value is too large, the request fails with a message about max_buckets.

You can also specify a minimum count that a bucket must reach in order to be returned, using the min_interval_population parameter. In this example, only journals associated with at least 1000 entries are returned:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Journals",
        "aggregation_type": "terms",
        "attribute": "rcsb_primary_citation.rcsb_journal_abbrev",
        "min_interval_population": 1000
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Histogram Facets

Histogram faceting is a multi-bucket aggregation that can be applied to numeric values. It builds fixed-size buckets (intervals) over the values. You can use the min_interval_population parameter to return only buckets whose count is greater than or equal to the given value.

For example, for the rcsb_polymer_entity.formula_weight field, which holds the formula mass (kDa) of the entity, this aggregation can be configured to build buckets with an interval of 50 kDa:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Formula Weight",
        "aggregation_type": "histogram",
        "attribute": "rcsb_polymer_entity.formula_weight",
        "interval": 50,
        "min_interval_population": 1
      }
    ]
  },
  "return_type": "polymer_entity"
}

Refer to Examples section for more examples.
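
To make the fixed-size bucketing concrete, the sketch below assigns values to histogram buckets locally. It assumes that buckets are aligned at zero and that each bucket includes its lower edge, which is the usual convention for this kind of aggregation but is not stated in this guide. The function names and sample weights are hypothetical.

```python
import math
from collections import Counter


def bucket_start(value: float, interval: float):
    """Lower edge of the fixed-size bucket that contains value."""
    return math.floor(value / interval) * interval


def histogram(values, interval, min_interval_population=0) -> dict:
    """Count values per bucket, dropping buckets below the population threshold."""
    counts = Counter(bucket_start(v, interval) for v in values)
    return {start: n for start, n in sorted(counts.items())
            if n >= min_interval_population}


# Hypothetical formula weights in kDa, bucketed with an interval of 50.
weights_kda = [12.0, 48.5, 51.0, 120.5]
all_buckets = histogram(weights_kda, 50)
populated_buckets = histogram(weights_kda, 50, min_interval_population=2)
```

Here the bucket starting at 0 holds two values, so it is the only one that survives a min_interval_population of 2.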

Date Histogram Facets

This multi-bucket aggregation is similar to the histogram aggregation, but it can only be used with date values. Calendar-aware intervals are configured with the interval parameter. For example, we can configure this aggregation to build buckets with 1 year intervals:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Release Date",
        "aggregation_type": "date_histogram",
        "attribute": "rcsb_accession_info.initial_release_date",
        "interval": "year",
        "min_interval_population": 1
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Range Facets

Range faceting is a multi-bucket aggregation that enables the user to define a set of numeric ranges, each representing a bucket. Note that this aggregation includes the from value and excludes the to value for each range. Omitting the from or to parameter creates a bucket that is open-ended on that side. Example:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Resolution Combined",
        "aggregation_type": "range",
        "attribute": "rcsb_entry_info.resolution_combined",
        "ranges": [
          {
            "to": 2
          },
          {
            "from": 2,
            "to": 2.2
          },
          {
            "from": 2.2,
            "to": 2.4
          },
          {
            "from": 4.6
          }
        ]
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.
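
The boundary rule for range facets (from included, to excluded) can be checked with a short local sketch that uses the ranges from the example above. The function bucket_index is hypothetical and only mirrors the rule as described. It does not query the API.

```python
RANGES = [
    {"to": 2},
    {"from": 2, "to": 2.2},
    {"from": 2.2, "to": 2.4},
    {"from": 4.6},
]


def bucket_index(value: float, ranges: list):
    """Index of the first range containing value, or None if no range matches."""
    for index, bounds in enumerate(ranges):
        lower_ok = "from" not in bounds or value >= bounds["from"]  # inclusive
        upper_ok = "to" not in bounds or value < bounds["to"]       # exclusive
        if lower_ok and upper_ok:
            return index
    return None
```

A resolution of exactly 2 therefore lands in the second bucket rather than the first, and a value such as 3.0 is not counted at all because the example ranges leave a gap between 2.4 and 4.6.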

Date Range Facets

This multi-bucket aggregation is similar to the range aggregation but is dedicated to date values. The main difference between this aggregation and the normal range aggregation is that the from and to values can be expressed as date math expressions. Example:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Release Date",
        "aggregation_type": "date_range",
        "attribute": "rcsb_accession_info.initial_release_date",
        "ranges": [
          {
            "to": "2020-06-01||-12M"
          },
          {
            "from": "2020-06-01",
            "to": "2020-06-01||+12M"
          },
          {
            "from": "2020-06-01||+12M"
          }
        ]
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Cardinality Facets

Cardinality faceting is a single-value metrics aggregation that calculates the count of distinct values returned for a given field. For example, you can count unique source organism name assignments in the PDB archive:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Organism Names Count",
        "aggregation_type": "cardinality",
        "attribute": "rcsb_entity_source_organism.ncbi_scientific_name"
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Filter Facets

As its name suggests, the filter aggregation lets you restrict the documents that contribute to bucket counts. In the example below, the counts are restricted to protein chains that adopt either of two beta propeller arrangements according to the CATH classification:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "filter": {
          "type": "terminal",
          "service": "text",
          "parameters": {
            "operator": "exact_match",
            "attribute": "rcsb_polymer_instance_annotation.type",
            "value": "CATH"
          }
        },
        "facets": [
          {
            "filter": {
              "type": "terminal",
              "service": "text",
              "parameters": {
                "operator": "in",
                "value": [
                  "2.140.10.30",
                  "2.120.10.80"
                ],
                "attribute": "rcsb_polymer_instance_annotation.annotation_lineage.id"
              }
            },
            "facets": [
              {
                "name": "CATH Domains",
                "min_interval_population": 1,
                "attribute": "rcsb_polymer_instance_annotation.annotation_lineage.id",
                "aggregation_type": "terms"
              }
            ]
          }
        ]
      }
    ]
  },
  "return_type": "polymer_instance"
}

Refer to Examples section for more examples.

Multi-Dimensional Facets

Complex, multi-dimensional aggregations are possible as in the example below:

{
  "request_options": {
    "paginate": {
      "rows": 0
    },
    "facets": [
      {
        "name": "Experimental Method",
        "aggregation_type": "terms",
        "attribute": "rcsb_entry_info.experimental_method",
        "facets": [
          {
            "name": "Polymer Entity Types",
            "aggregation_type": "terms",
            "attribute": "rcsb_entry_info.selected_polymer_entity_types"
          },
          {
            "name": "Release Date",
            "aggregation_type": "date_histogram",
            "attribute": "rcsb_accession_info.initial_release_date",
            "interval": "year"
          }
        ]
      }
    ]
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.

Search Operators

Search operators are commands that help you make your search more specific and focused. The following operators can be used to perform a field search:

Exact Match Operators

Exact match operators indicate that the input value should match a field value exactly, including whitespace and special characters. Case handling depends on the operator, as described for each operator below.

exact_match

You can use the exact_match operator to find exact occurrences of the input value. Comparisons with exact_match operator are case-insensitive by default. See the Case-Sensitive Search paragraph of the Attribute Search section to learn how to configure case-sensitive exact searches.

A single value input is required for this operator and must be a string.

in

The in operator allows you to specify multiple values in a single search expression. It returns results if any value in a list of input values matches. It can be used instead of multiple OR conditions. Comparisons with in operator are case-sensitive.

An input value is required for this operator and it must be a list of strings, numbers or dates.

Full-Text Operators

The full-text operators enable you to perform linguistic searches against text data by operating on words and phrases. The input text is analyzed before performing a search. The analysis includes the following transformations:

The standard grammar based tokenization is used to break input text into tokens. Refer to the Unicode Text Segmentation documentation for more information on tokenization rules.

contains_words

The contains_words operator performs a full-text search by operating on the words in the provided text. After the text is broken into tokens, a basic query is constructed for each token and the queries are combined with OR logic. For example, "actin-binding protein" will be interpreted as "actin" OR "binding" OR "protein". The search will return results if any of these tokens match. This operator can match multiple tokens in any order.

A single value input is required for this operator and it must be a string.

contains_phrase

The contains_phrase operator performs a full-text search by operating on phrases. The operator requires the following criteria to be fulfilled in order to return results:

For example, "actin-binding protein" will be interpreted as "actin" AND "binding" AND "protein" occurring in a given order.

A single value input is required for this operator and it must be a string.
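
The difference between the two full-text operators can be mimicked locally. The sketch below is a simplification, not the service's analyzer: it assumes lower-casing, splits on non-word characters with a regular expression instead of applying full Unicode text segmentation, and treats a phrase match as adjacent tokens in the given order. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list:
    """Simplified analysis: lower-case and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def contains_words(query: str, field_value: str) -> bool:
    """True if any query token occurs in the field (OR logic, any order)."""
    field_tokens = set(tokenize(field_value))
    return any(token in field_tokens for token in tokenize(query))


def contains_phrase(query: str, field_value: str) -> bool:
    """True if all query tokens occur consecutively and in the given order."""
    wanted, found = tokenize(query), tokenize(field_value)
    if not wanted:
        return False
    return any(found[i:i + len(wanted)] == wanted
               for i in range(len(found) - len(wanted) + 1))
```

Under these assumptions, the query "actin-binding protein" matches the text "a protein kinase" with contains_words, since one token is shared, but not with contains_phrase.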

Comparison Operators

The greater, less, greater_or_equal, less_or_equal, and equals operators match fields whose values are, respectively, larger than, smaller than, larger than or equal to, smaller than or equal to, or equal to the given input value.

A single value input is required for these operators and it must be a number or date.

Range Operator

The range operator can be used to match values within a provided range.

A single value input is required for this operator and it must be an object as follows:

{
  "from": "[number|date]",
  "include_lower": "[boolean]",
  "to": "[number|date]",
  "include_upper": "[boolean]"
}

By default, lower and upper bounds are excluded. They can be included by setting include_lower and include_upper to true respectively. An inclusive bound means that the boundary point itself is included in the range as well, while an exclusive bound means that the boundary point is not included in the range.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "rcsb_accession_info.initial_release_date",
      "operator": "range",
      "value": {
        "from": "2019-01-01",
        "to": "2019-06-30"
      }
    }
  },
  "return_type": "entry"
}

Refer to Examples section for more examples.
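
The inclusive and exclusive bound semantics of the range operator can be expressed as a small predicate. This sketch evaluates a range value object locally, using Python date objects in place of the date strings sent to the API. The function in_range is hypothetical and only restates the rules above.

```python
from datetime import date


def in_range(value, bounds: dict) -> bool:
    """Check value against a range object; bounds are exclusive by default."""
    lower, upper = bounds.get("from"), bounds.get("to")
    if lower is not None:
        if value < lower:
            return False
        if value == lower and not bounds.get("include_lower", False):
            return False
    if upper is not None:
        if value > upper:
            return False
        if value == upper and not bounds.get("include_upper", False):
            return False
    return True


# Same interval as the request above, first with default (exclusive) bounds.
exclusive = {"from": date(2019, 1, 1), "to": date(2019, 6, 30)}
inclusive = {**exclusive, "include_lower": True, "include_upper": True}
```

With the default settings, an entry released exactly on 2019-01-01 is not matched. It is matched once include_lower is set to true.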

Exists Operator

The exists operator is a logical operator that checks whether a given field contains any value. To be deemed non-existent, the value must be null or []. The following values will indicate the field does exist:

The operator doesn't require a value.

Date Math Expressions

Comparison and range operators support date math expressions. An expression starts with an "anchor" date, which can be: a) now or b) a date string (in the applicable format) ending with ||. The anchor can then be followed by a math expression, supporting + and -, e.g. "2020-06-01||-12M", "now-1w".
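
To show how such expressions resolve, here is a rough local evaluator. It is a sketch under several assumptions: only the units y, M, w and d are handled, with the meanings year, month, week and day (M and w appear in the examples above; the others follow common date math conventions), dates are in YYYY-MM-DD format, and month arithmetic clamps to the last day of the target month. The server's own evaluation may differ in these details.

```python
import calendar
import re
from datetime import date, timedelta

_STEP = re.compile(r"([+-])(\d+)([yMwd])")


def _add_months(day: date, months: int) -> date:
    """Shift by whole months, clamping the day to the target month's length."""
    index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def evaluate(expression: str, today: date) -> date:
    """Resolve 'now...' or 'YYYY-MM-DD||...' with +/- steps applied in order."""
    if expression.startswith("now"):
        anchor, rest = today, expression[3:]
    else:
        text, _, rest = expression.partition("||")
        anchor = date.fromisoformat(text)
    position = 0
    for step in _STEP.finditer(rest):
        if step.start() != position:
            raise ValueError(f"cannot parse {expression!r}")
        position = step.end()
        amount = int(step.group(2)) * (1 if step.group(1) == "+" else -1)
        unit = step.group(3)
        if unit == "y":
            anchor = _add_months(anchor, 12 * amount)
        elif unit == "M":
            anchor = _add_months(anchor, amount)
        elif unit == "w":
            anchor += timedelta(weeks=amount)
        else:
            anchor += timedelta(days=amount)
    if position != len(rest):
        raise ValueError(f"cannot parse {expression!r}")
    return anchor
```

The today argument is passed in explicitly so that expressions based on now can be evaluated reproducibly.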

The units supported are:

Search Attributes

The attributes available for search include the annotations described by mmCIF dictionary, annotations coming from external resources and attributes added by RCSB PDB. Both internal additions to the mmCIF dictionary and external resources annotations are prefixed with rcsb_.

Refer to the Structure Attributes Search and Chemical Attributes Search pages for a full list of the attributes that are available for text searches.

Search Results

The HTTP 200 (OK) status code indicates that the search request was processed successfully and that the server is returning search results. The response data is formatted as JSON, and its structure is determined by parameters in the query. Query parameters can be used to structure the result set in the following ways:

Response Body

The search response body provides details about the search execution itself as well as an array of the individual search hits. The following information is available in the search response body:

Name Description
query_id Required. Unique query ID assigned to the request or passed as a query parameter.
result_type Required. Specifies the granularity of the returned identifiers requested in the query. See Return Type.
total_count Required. The total number of matched identifiers.
explain_metadata Optional. Contains details on the query execution time (in milliseconds).
result_set Optional. Search results set is returned as PDB identifiers and accompanying metadata.
group_set Optional. Search results are returned as groups.
facets Optional. Facets array contains search facets for requested attributes.

An example of search response is shown below:

{
  "query_id": "ce0e1f8a-2a66-4e3f-8b8b-7ecdb1e3458d",
  "result_type": "entry",
  "total_count": 2,
  "result_set": [
    {
      "identifier": "2V01",
      "score": 0.719
    },
    {
      "identifier": "3CLN",
      "score": 0.813
    }
  ]
}
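A response of this shape can be processed with the Python standard library alone. The sketch below parses the example body and lists identifiers from the most to the least relevant hit; the helper name ranked_identifiers is hypothetical, and the explicit sort is a choice made in this sketch rather than something the response format requires.

```python
import json

RESPONSE = """
{
  "query_id": "ce0e1f8a-2a66-4e3f-8b8b-7ecdb1e3458d",
  "result_type": "entry",
  "total_count": 2,
  "result_set": [
    {"identifier": "2V01", "score": 0.719},
    {"identifier": "3CLN", "score": 0.813}
  ]
}
"""


def ranked_identifiers(body):
    """Return identifiers ordered by descending score (hypothetical helper)."""
    # result_set is optional in the response body, so fall back to an empty list.
    hits = body.get("result_set", [])
    return [hit["identifier"] for hit in sorted(hits, key=lambda h: h["score"], reverse=True)]


body = json.loads(RESPONSE)
identifiers = ranked_identifiers(body)
print(body["total_count"], identifiers)
```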

Results Set

The result set is an array of objects representing search hits. Each hit contains the matching identifier, a score, and metadata produced by the search services.

Result Identifiers

While a search query might include a large number of attributes, only the matching PDB identifiers, representing the desired level of granularity, are included in the result set. The following notation is used for PDB identifiers:

Relevancy Score

The final relevancy score is calculated as a weighted sum of the normalized scores produced by the different search services. By default, scores from all services are weighted equally. See the Scoring Strategy section for more details on how to configure scoring. The higher the score, the more relevant the hit.

Service Metadata

Different search services produce different metadata and use different scoring metrics. Set the results verbosity level to verbose to return the additional metadata and raw scores, which are reported as described below:

Name Description
node_id Required. Distinct numeric ID is assigned to results produced by each search service.
original_score Required. The original (raw) score produced by a search service chosen as relevance score for this service. For example, the bit score of the alignment is chosen as raw relevance score for a sequence search service.
norm_score Required. The original score transformed onto a scale between 0 and 1 using min-max normalization algorithm (higher means more significant).
match_context Optional. Additional metadata produced by search services. Match context is included only for select return types. For example, if a sequence search is performed and polymer_entity is specified as the return type, the results will include match_context with additional metadata such as sequence identity, E-value, bit score, and the residue boundary positions of the matching sequence. The match_context will not be included if the same search is performed with the return type set to entry or assembly.

The following snippet shows an example of search results for a query that combines 4 different search services. Here, the search results set contains one search hit at the granularity of PDB entry:

{
  "result_set": [
    {
      "identifier": "6W2A",
      "score": 1.998,
      "services": [
        {
          "service_type": "text",
          "nodes": [
            {
              "node_id": 21049,
              "original_score": 8.183,
              "norm_score": 1
            }
          ]
        },
        {
          "service_type": "sequence",
          "nodes": [
            {
              "node_id": 2819,
              "original_score": 330,
              "norm_score": 0.21
            }
          ]
        },
        {
          "service_type": "structure",
          "nodes": [
            {
              "node_id": 12876,
              "original_score": 60.825,
              "norm_score": 0.608
            }
          ]
        },
        {
          "service_type": "chemical",
          "nodes": [
            {
              "node_id": 11162,
              "original_score": 1,
              "norm_score": 1
            }
          ]
        }
      ]
    }
  ]
}
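To work with this nested structure, it can help to flatten it into one row per service node. The Python sketch below does that for a shortened copy of the hit above, and also illustrates generic min-max normalization. Both helper names are hypothetical, and min_max is only a textbook version of the technique: the minimum and maximum the API actually uses are not stated in this guide, and returning 1.0 when all scores are equal is an assumption of this sketch.

```python
def flatten_services(hit):
    """Return (service_type, node_id, original_score, norm_score) rows for one hit."""
    rows = []
    for service in hit.get("services", []):
        for node in service.get("nodes", []):
            rows.append((service["service_type"], node["node_id"],
                         node["original_score"], node["norm_score"]))
    return rows


def min_max(scores):
    """Textbook min-max normalization onto [0, 1]; illustrative only."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption of this sketch: identical scores all map to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


hit = {
    "identifier": "6W2A",
    "score": 1.998,
    "services": [
        {"service_type": "text",
         "nodes": [{"node_id": 21049, "original_score": 8.183, "norm_score": 1}]},
        {"service_type": "sequence",
         "nodes": [{"node_id": 2819, "original_score": 330, "norm_score": 0.21}]},
    ],
}

for row in flatten_services(hit):
    print(row)
```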

Results Verbosity Level

By default, search results are returned with additional metadata (see Search Results for more details). Results verbosity level can be adjusted by setting the results_verbosity parameter in the request_options context. The results' verbosity levels from the most verbose to the least are as follows:

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "operator": "equals",
      "value": 4,
      "attribute": "rcsb_entry_info.polymer_entity_count_RNA"
    }
  },
  "request_options": {
    "results_verbosity": "compact"
  },
  "return_type": "entry"
}

Empty Results

The HTTP Status 204 (No Content) status code indicates that the search API request has been processed successfully but no search hits were found.
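Because a 204 response carries no body, client code should check the status before attempting to parse JSON. A minimal Python sketch follows, assuming the status code and raw body have already been obtained from an HTTP client of your choice; the helper name hits_from_response is hypothetical.

```python
import json


def hits_from_response(status, body):
    """Return the list of hits for a 200 or 204 response (hypothetical helper)."""
    if status == 204:
        # Processed successfully, but nothing matched: there is no JSON to parse.
        return []
    if status == 200:
        return json.loads(body).get("result_set", [])
    raise RuntimeError(f"unexpected HTTP status {status}")


print(hits_from_response(204, b""))
print(hits_from_response(200, b'{"total_count": 1, "result_set": [{"identifier": "4HHB", "score": 1.0}]}'))
```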

Dealing with Redundancy

The PDB archive includes multiple structures of the same molecule, providing snapshots of its structure, interactions, and functions; for example, the same protein studied under different experimental conditions or with different ligands bound. This data redundancy may present challenges in bioinformatics analyses. Removing redundancy and grouping search results helps ensure that similar and homologous proteins that appear in high numbers in a result set do not introduce undesirable biases. Also, as the PDB continues to grow, reducing redundancy helps when one seeks smaller datasets of distinct representatives.

Redundancy occurs at many levels (such as the level of sequence or structure similarity), and different grouping methods can be applied to PDB data in order to provide a non-redundant view.

Group By Parameters

To enable results grouping, the group_by parameters must be defined in the request_options context. Different grouping methods are available for a given Return Type:

Return Type Grouping Options
entry
  • matching_deposit_group_id - grouping on the basis of common identifier for a group of entries deposited as a collection. Such entries enter the PDB archive via GroupDep system that allows for parallel deposition of 10s–100s of related structures (typically the same protein with different bound ligands).
polymer_entity
  • sequence_identity - grouping on the basis of protein sequence clusters that meet a predefined identity threshold. Six levels of sequence identity are defined: 100%, 95%, 90%, 70%, 50%, 30%. Mutual sequence identity is determined by MMseqs2 software.
  • matching_uniprot_accession - grouping on the basis of common UniProt accession. UniProtKB assigns a unique accession for each protein product encoded by one gene in a given species.
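The table above can be turned into a small client-side check before a request is sent. The Python sketch below is an assumption-laden illustration: the helper name group_by is hypothetical, and requiring similarity_cutoff for sequence_identity (while rejecting it for the other methods) is a rule chosen for this sketch, not documented API behaviour.

```python
# Grouping methods per return type, as listed in the table above.
GROUPING_OPTIONS = {
    "entry": {"matching_deposit_group_id"},
    "polymer_entity": {"sequence_identity", "matching_uniprot_accession"},
}
# The six predefined sequence identity levels (percent).
IDENTITY_LEVELS = {100, 95, 90, 70, 50, 30}


def group_by(return_type, aggregation_method, similarity_cutoff=None):
    """Build the group_by object for request_options (hypothetical helper)."""
    allowed = GROUPING_OPTIONS.get(return_type, set())
    if aggregation_method not in allowed:
        raise ValueError(f"{aggregation_method} is not available for {return_type}")
    params = {"aggregation_method": aggregation_method}
    if aggregation_method == "sequence_identity":
        if similarity_cutoff not in IDENTITY_LEVELS:
            raise ValueError("similarity_cutoff must be one of 100, 95, 90, 70, 50, 30")
        params["similarity_cutoff"] = similarity_cutoff
    elif similarity_cutoff is not None:
        raise ValueError("similarity_cutoff only applies to sequence_identity")
    return params


print(group_by("polymer_entity", "sequence_identity", 30))
print(group_by("entry", "matching_deposit_group_id"))
```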

Group By Return Type

The group_by_return_type parameter in the request_options context controls the form in which the grouped results are returned. The following options are available:

Return Grouped Results

It can be useful to study the variability among similar (redundant) search hits. You can use the group_by parameters in combination with the group_by_return_type parameter set to groups to return results as groups of similar objects. A few examples are listed below:

Group By Sequence Identity

This example groups together identical human sequences from high-resolution (1.0-2.0Å) structures determined by X-ray crystallography. Among the resulting groups, there is a cluster of human glutathione transferases in complex with different substrates.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "Homo sapiens",
          "attribute": "rcsb_entity_source_organism.taxonomy_lineage.name"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "exptl.method",
          "operator": "exact_match",
          "value": "X-RAY DIFFRACTION"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entry_info.resolution_combined",
          "operator": "range",
          "value": {
            "from": 1,
            "include_lower": true,
            "to": 2,
            "include_upper": true
          }
        }
      }
    ]
  },
  "request_options": {
    "results_verbosity": "minimal",
    "group_by": {
      "aggregation_method": "sequence_identity",
      "similarity_cutoff": 100,
      "ranking_criteria_type": {
        "sort_by": "entity_poly.rcsb_sample_sequence_length",
        "direction": "desc"
      }
    },
    "group_by_return_type": "groups"
  },
  "return_type": "polymer_entity"
}

Group By UniProt Accession

This example demonstrates how to use matching_uniprot_accession grouping to get distinct Spike protein S1 proteins released since the beginning of 2020. Here, all entities are represented by distinct groups of SARS-CoV, SARS-CoV-2 and Pangolin coronavirus spike proteins.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity.pdbx_description",
          "operator": "contains_phrase",
          "value": "Spike protein S1"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_accession_info.initial_release_date",
          "operator": "greater",
          "value": "2020-01-01"
        }
      }
    ]
  },
  "request_options": {
    "results_verbosity": "minimal",
    "group_by": {
      "aggregation_method": "matching_uniprot_accession",
      "ranking_criteria_type": {
        "sort_by": "coverage"
      }
    },
    "group_by_return_type": "groups"
  },
  "return_type": "polymer_entity"
}

Although it’s true that a search hit will only appear once within a grouped set of search hits, it’s important to note that in some cases multiple groups can contain the same search hit. For example, when results are grouped by the UniProt accession, chimeric entities will appear in multiple groups.

Remove Redundant Results

It can be useful to remove redundant search hits from your results. You can use the group_by parameters in combination with the group_by_return_type parameter set to representatives to return only a single representative from each of the resulting groups. For example, you may want to remove similar sequences with specific levels of mutual sequence identity. The non-redundant result set consists solely of representative search hits, drawn from the original redundant search results, that satisfy the given search constraints.

This example shows how to retrieve a set of polymer entities from protein-protein complexes with the following constraints:

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "greater_or_equal",
          "value": 2,
          "attribute": "rcsb_assembly_info.polymer_entity_instance_count_protein"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entry_info.selected_polymer_entity_types",
          "operator": "exact_match",
          "value": "Protein (only)"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "exptl.method",
          "operator": "in",
          "value": [
            "X-RAY DIFFRACTION",
            "ELECTRON MICROSCOPY"
          ]
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "less_or_equal",
          "value": 2,
          "attribute": "rcsb_entry_info.resolution_combined"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "less_or_equal",
          "value": 0.2,
          "attribute": "refine.ls_R_factor_obs"
        }
      }
    ]
  },
  "request_options": {
    "results_verbosity": "minimal",
    "group_by": {
      "aggregation_method": "sequence_identity",
      "similarity_cutoff": 30
    },
    "group_by_return_type": "representatives"
  },
  "return_type": "polymer_entity"
}

Group Members Ranking

Group members ranking orders the search hits within each resulting group so that the most relevant, useful hits are presented first and you can more easily find what you’re looking for.

The ranking system is made up of a series of options:

For example, you can search for rhodopsins and rhodopsin-like proteins, request all proteins related by sharing at least 50% sequence identity to be grouped and order polymer entities within each group by sequence similarity score:

{
  "query": {
    "type": "terminal",
    "service": "sequence",
    "parameters": {
      "sequence_type": "protein",
      "value": "MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA",
      "identity_cutoff": 0.3,
      "evalue_cutoff": 0.1
    }
  },
  "request_options": {
    "results_verbosity": "minimal",
    "group_by": {
      "aggregation_method": "sequence_identity",
      "similarity_cutoff": 50,
      "ranking_criteria_type": {
        "sort_by": "score",
        "direction": "asc"
      }
    },
    "group_by_return_type": "groups"
  },
  "return_type": "polymer_entity"
}

Examples of ranking options specific to aggregation method are detailed below:

Ranking Options For UniProt Groups

Faceting Upon Grouped Results

By default, facet counts are based upon the original query results, not the grouped results. This means that whether or not you turn grouping on for a query, the facet counts will be the same.

To return non-redundant facet counts the group_by_return_type parameter must be set to representatives.

Sorting Grouped Results

An important aspect is the way sorting interacts with grouping. By default, groups are sorted in descending order by the number of search hits in each group. You can reverse the order in which groups are sorted. Inside each group, the search hits are sorted based on the ranking score. The type of the ranking score is specified by the ranking_criteria_type parameter.

Another important difference is that multi-sort operations are not enabled for grouped results.

Paging Grouped Results

The Pagination section describes how the Search API uses the rows parameter to determine how many search hits to return for a search query. When grouped results are requested, this parameter limits how many groups are returned, and the start parameter controls paging through the available groups. There is no paging through the results within a group: all search hits in each group are returned.
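Paging through groups can be sketched in Python as a generator that yields one request per page. The helper name paged_requests is hypothetical, and total_groups is assumed to be known already (for example from the total_count of a previous response); the paginate object with start and rows follows the form used in the examples of this guide.

```python
import copy


def paged_requests(request, rows, total_groups):
    """Yield copies of a grouped request, one per page of groups (hypothetical helper)."""
    for start in range(0, total_groups, rows):
        page = copy.deepcopy(request)  # leave the caller's request untouched
        page.setdefault("request_options", {})["paginate"] = {"start": start, "rows": rows}
        yield page


base = {
    "request_options": {
        "group_by": {"aggregation_method": "matching_uniprot_accession"},
        "group_by_return_type": "groups",
    },
    "return_type": "polymer_entity",
}

for page in paged_requests(base, rows=25, total_groups=60):
    print(page["request_options"]["paginate"])
```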

Counting Grouped Results

The Counting Results section of this guide describes the parameter that allows returning only the total count of hits returned by the query. When using it with grouped results, it returns a total count of all resulting groups or representatives.

API Clients

Python

The rcsbsearchapi package provides a Python interface to the RCSB PDB Search API. You can use it to fetch lists of PDB IDs corresponding to advanced query searches. This package was originally developed by Spencer Bliven, and a new version is now being maintained by RCSB PDB on GitHub.
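If installing a client package is not an option, requests can also be assembled with the Python standard library alone. The sketch below does not use rcsbsearchapi; it follows the GET and POST conventions given in the Introduction. The function names are hypothetical, and the Content-Type header on the POST request is an assumption of this sketch.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def get_url(search_request):
    """Return the GET URL with the request URL-encoded in the json parameter."""
    return SEARCH_URL + "?json=" + urllib.parse.quote(json.dumps(search_request))


def post_request(search_request):
    """Return a POST request object carrying the JSON payload (not yet sent)."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(search_request).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption of this sketch
        method="POST",
    )


def run(search_request):
    """Send the request over the network; returns None when nothing matched (204)."""
    with urllib.request.urlopen(post_request(search_request)) as response:
        if response.status == 204:
            return None
        return json.load(response)


example = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {"value": "\"thymidine kinase\""},
    },
    "return_type": "entry",
}
print(get_url(example))
```

Only run performs network I/O; the other two functions just build the URL or request object, so they can be inspected or logged before anything is sent.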

Examples

This section demonstrates how to use the RCSB PDB Search API to perform complex searches.
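The examples in this section are all built from the same two node types: terminal nodes, which carry the parameters for one search service, and group nodes, which combine other nodes with a logical operator. When writing such requests by hand becomes tedious, they can be composed programmatically. The Python helpers below (terminal and group are hypothetical names) reproduce the query of the Search by UniProt Accession example.

```python
import json


def terminal(attribute, operator, value, service="text"):
    """Build a terminal node for an attribute search (hypothetical helper)."""
    return {
        "type": "terminal",
        "service": service,
        "parameters": {"operator": operator, "value": value, "attribute": attribute},
    }


def group(logical_operator, *nodes):
    """Combine nodes with "and" or "or" (hypothetical helper)."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


PREFIX = "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers."

request = {
    "query": group(
        "and",
        terminal(PREFIX + "database_accession", "exact_match", "P69905"),
        terminal(PREFIX + "database_name", "exact_match", "UniProt"),
    ),
    "return_type": "polymer_entity",
}
print(json.dumps(request, indent=2))
```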

Biological Assembly Search

This query finds symmetric dimers having a twofold rotation with the DNA-binding domain of a heat-shock transcription factor.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "C2",
          "attribute": "rcsb_struct_symmetry.symbol"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "Global Symmetry",
          "attribute": "rcsb_struct_symmetry.kind"
        }
      },
      {
        "type": "terminal",
        "service": "full_text",
        "parameters": {
          "value": "\"heat-shock transcription factor\""
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "greater_or_equal",
          "value": 1,
          "attribute": "rcsb_entry_info.polymer_entity_count_DNA"
        }
      }
    ]
  },
  "return_type": "assembly"
}

X-Ray Structures Search

This query finds PDB structures of viral thymidine kinase with bound substrates or inhibitors, determined by X-ray crystallography at a resolution of 2.5 Å or better.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "full_text",
        "parameters": {
          "value": "\"thymidine kinase\""
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "Viruses",
          "attribute": "rcsb_entity_source_organism.taxonomy_lineage.name"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "X-RAY DIFFRACTION",
          "attribute": "exptl.method"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "less_or_equal",
          "value": 2.5,
          "attribute": "rcsb_entry_info.resolution_combined"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "greater",
          "attribute": "rcsb_entry_info.nonpolymer_entity_count",
          "value": 0
        }
      }
    ]
  },
  "return_type": "entry"
}

Protein Sequence Search

This example uses sequence search to find macromolecular PDB entities that share at least 90% sequence identity with the GTPase HRas protein from Gallus gallus (Chicken).

{
  "query": {
    "type": "terminal",
    "service": "sequence",
    "parameters": {
      "evalue_cutoff": 1,
      "identity_cutoff": 0.9,
      "sequence_type": "protein",
      "value": "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLPARTVETRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMNCKCVIS"
    }
  },
  "request_options": {
    "scoring_strategy": "sequence"
  },
  "return_type": "polymer_entity"
}

3D-shape Search

This example demonstrates how structure search can be used to find PDB structures of calmodulin with conformational changes upon Ca2+ binding. Calmodulin (CaM) protein has two homologous globular domains connected by a flexible linker. Ca2+ binding to each globular domain causes a change from a “closed” to an “open” conformation. This query finds calmodulin structures in “open” conformation.

As a structure query input parameter we will use the crystal structure of Ca2+-loaded calmodulin (PDB entry 1CLL). This query is combined with the text search for CA chemical component ID. Note: if you leave out the query clause matching Ca2+ ions, you will also get calmodulin structures in complex with other metals (e.g. strontium in 4BW7).

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text_chem",
        "parameters": {
          "operator": "exact_match",
          "value": "CA",
          "attribute": "rcsb_chem_comp_container_identifiers.comp_id"
        }
      },
      {
        "type": "terminal",
        "service": "structure",
        "parameters": {
          "value": {
            "entry_id": "1CLL",
            "assembly_id": "1"
          },
          "operator": "strict_shape_match"
        }
      }
    ]
  },
  "return_type": "entry"
}

Free Ligand Search

Ligands are considered “free ligands” when they interact non-covalently with macromolecules. This example shows how to find non-polymer entities of ATP that occur as “free ligands”.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_nonpolymer_instance_annotation.comp_id",
          "operator": "exact_match",
          "value": "ATP"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_nonpolymer_instance_annotation.type",
          "operator": "exact_match",
          "value": "HAS_NO_COVALENT_LINKAGE"
        }
      }
    ]
  },
  "return_type": "non_polymer_entity",
  "request_options": {
    "results_verbosity": "compact"
  }
}

Sequence Motif Search

A sequence motif search finds macromolecular PDB entities that contain a specific sequence motif. This example retrieves occurrences of the His2/Cys2 Zinc Finger DNA-binding domain as represented by its PROSITE signature.

{
  "query": {
    "type": "terminal",
    "service": "seqmotif",
    "parameters": {
      "value": "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.",
      "pattern_type": "prosite",
      "sequence_type": "protein"
    }
  },
  "return_type": "polymer_entity"
}
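To get a feel for what a PROSITE pattern such as the one above matches, it can be translated into a regular expression and tried on a sequence locally. The Python sketch below is only a sanity-check aid: the helper name prosite_to_regex is hypothetical, it covers only the common PROSITE elements (residues, x, [...] and {...} sets, repeat counts, and the < and > anchors), and it makes no claim about how the seqmotif service evaluates patterns.

```python
import re

# One PROSITE element: optional "<", a residue / x / [set] / {excluded set},
# an optional repeat "(n)" or "(n,m)", and an optional ">".
_ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression (sketch)."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)

# A made-up sequence that contains one occurrence of the motif.
toy = "AA" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "AA"
print(re.search(regex, toy) is not None)
```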

Chemical Similarity Search

This example demonstrates how to find molecular definitions chemically similar to Tylenol, defined by its InChI string. Note that match_type="graph-strict" does not imply an exact structure match: the result set contains acetaminophen molecules (TYL) together with methoxy (T9V) and ethoxy (N4E) analogs.

{
  "query": {
    "type": "terminal",
    "service": "chemical",
    "parameters": {
      "value": "InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)",
      "type": "descriptor",
      "descriptor_type": "InChI",
      "match_type": "graph-strict"
    }
  },
  "return_type": "mol_definition"
}

Search by UniProt Accession

This example shows how to search for PDB entities using the associated UniProt accession code.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "P69905",
          "attribute": "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "UniProt",
          "attribute": "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_name"
        }
      }
    ]
  },
  "return_type": "polymer_entity"
}

Structure Motif Search

A structure motif search finds macromolecular PDB assemblies that contain a small number of residues in a specific geometric arrangement (e.g. residues that constitute a catalytic center or a binding site). This example retrieves occurrences of the enolase superfamily, a group of proteins diverse in sequence and structure that are all capable of abstracting a proton from a carboxylic acid. Position-specific exchanges are crucial to represent this superfamily accurately.

{
  "query": {
    "type": "terminal",
    "service": "strucmotif",
    "parameters": {
      "value": {
        "entry_id": "2mnr",
        "residue_ids": [
          {
            "label_asym_id": "A",
            "label_seq_id": 162
          },
          {
            "label_asym_id": "A",
            "label_seq_id": 193
          },
          {
            "label_asym_id": "A",
            "label_seq_id": 219
          },
          {
            "label_asym_id": "A",
            "label_seq_id": 245
          },
          {
            "label_asym_id": "A",
            "label_seq_id": 295
          }
        ]
      },
      "rmsd_cutoff": 2,
      "exchanges": [
        {
          "residue_id": {
            "label_asym_id": "A",
            "label_seq_id": 162
          },
          "allowed": [
            "LYS",
            "HIS"
          ]
        },
        {
          "residue_id": {
            "label_asym_id": "A",
            "label_seq_id": 245
          },
          "allowed": [
            "GLU",
            "ASP",
            "ASN"
          ]
        },
        {
          "residue_id": {
            "label_asym_id": "A",
            "label_seq_id": 295
          },
          "allowed": [
            "HIS",
            "LYS"
          ]
        }
      ]
    }
  },
  "return_type": "assembly"
}

Combining Search Services

This example shows how to compose text, sequence, structure, and chemical queries employing the Boolean operator AND. The search yields structures (entries) matching all criteria, including co-crystal structures with the desired bound inhibitor, matching the SMILES string for a small-molecule inhibitor designated 7J (QYS).

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "operator": "exact_match",
          "value": "Coronaviridae",
          "attribute": "rcsb_entity_source_organism.taxonomy_lineage.name"
        }
      },
      {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
          "evalue_cutoff": 1,
          "identity_cutoff": 0.5,
          "sequence_type": "protein",
          "value": "SLSGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQCSGVTEG"
        }
      },
      {
        "type": "terminal",
        "service": "structure",
        "parameters": {
          "value": {
            "entry_id": "6LU7",
            "assembly_id": "1"
          },
          "operator": "relaxed_shape_match"
        }
      },
      {
        "type": "terminal",
        "service": "chemical",
        "parameters": {
          "value": "CC(C)C[C@H](NC(=O)OCC1CCC(F)(F)CC1)C(=O)N[C@@H](C[C@@H]2CCNC2=O)[C@@H](O)[S](O)(=O)=O",
          "type": "descriptor",
          "descriptor_type": "SMILES",
          "match_type": "graph-relaxed-stereo"
        }
      }
    ]
  },
  "return_type": "entry"
}

Sequence Cluster Statistics

This example shows how to get the number of distinct protein sequences in the PDB archive.

{
  "request_options": {
    "facets": [
      {
        "filter": {
          "type": "group",
          "logical_operator": "and",
          "nodes": [
            {
              "type": "terminal",
              "service": "text",
              "parameters": {
                "operator": "exact_match",
                "attribute": "rcsb_polymer_entity_group_membership.aggregation_method",
                "value": "sequence_identity"
              }
            },
            {
              "type": "terminal",
              "service": "text",
              "parameters": {
                "operator": "equals",
                "attribute": "rcsb_polymer_entity_group_membership.similarity_cutoff",
                "value": 100
              }
            }
          ]
        },
        "facets": [
          {
            "name": "Distinct Protein Sequence Count",
            "aggregation_type": "cardinality",
            "attribute": "rcsb_polymer_entity_group_membership.group_id"
          }
        ]
      }
    ],
    "paginate": {
      "start": 0,
      "rows": 0
    }
  },
  "return_type": "polymer_entity"
}

Newly Released Structures

This example shows how to get a list of the PDB IDs of all structures newly released this week.

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "rcsb_accession_info.initial_release_date",
      "operator": "greater",
      "value": "now-1w"
    }
  },
  "request_options": {
    "return_all_hits": true
  },
  "return_type": "entry"
}

Membrane Proteins

This example shows how to get a list of the PDB IDs of entries that are annotated as membrane proteins by at least one relevant external resource.

{
  "query": {
    "type": "group",
    "logical_operator": "or",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity_annotation.type",
          "operator": "exact_match",
          "value": "PDBTM"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity_annotation.type",
          "operator": "exact_match",
          "value": "MemProtMD"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity_annotation.type",
          "operator": "exact_match",
          "value": "OPM"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity_annotation.type",
          "operator": "exact_match",
          "value": "mpstruc"
        }
      }
    ]
  },
  "return_type": "entry"
}

Symmetry and Enzyme Classification

This example shows how to get assembly counts per symmetry type, further broken down by Enzyme Classification (EC) class. The assemblies are first filtered to homo-oligomers only.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_assembly_info.polymer_entity_count",
          "operator": "equals",
          "value": 1
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_assembly_info.polymer_entity_instance_count",
          "operator": "greater",
          "value": 1
        }
      }
    ]
  },
  "request_options": {
    "facets": [
      {
        "filter": {
          "type": "terminal",
          "service": "text",
          "parameters": {
            "attribute": "rcsb_struct_symmetry.kind",
            "operator": "exact_match",
            "value": "Global Symmetry"
          }
        },
        "facets": [
          {
            "aggregation_type": "terms",
            "name": "sym_symbol_terms",
            "attribute": "rcsb_struct_symmetry.symbol",
            "facets": [
              {
                "aggregation_type": "terms",
                "name": "ec_terms",
                "attribute": "rcsb_polymer_entity.rcsb_ec_lineage.id"
              }
            ]
          }
        ]
      }
    ],
    "paginate": {
      "start": 0,
      "rows": 0
    }
  },
  "return_type": "assembly"
}
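The drill-down above works by nesting: each facet carries its own facets list, which subdivides its buckets. As a sketch of how such a structure could be assembled, the hypothetical helper nested_terms_facets below takes (name, attribute) pairs from the outermost to the innermost level and reproduces the request_options of this example. The rows value of 0 is copied from the example; presumably it is used because only the counts are of interest.

```python
import json


def nested_terms_facets(levels):
    """Nest "terms" facets; `levels` lists (name, attribute) pairs, outermost first."""
    facet = None
    # Build from the innermost level outwards so each level wraps the previous one.
    for name, attribute in reversed(levels):
        node = {"aggregation_type": "terms", "name": name, "attribute": attribute}
        if facet is not None:
            node["facets"] = [facet]
        facet = node
    return [] if facet is None else [facet]


def global_symmetry_facets():
    """Filter facet restricting the nested counts to global symmetry."""
    return [
        {
            "filter": {
                "type": "terminal",
                "service": "text",
                "parameters": {
                    "attribute": "rcsb_struct_symmetry.kind",
                    "operator": "exact_match",
                    "value": "Global Symmetry",
                },
            },
            "facets": nested_terms_facets(
                [
                    ("sym_symbol_terms", "rcsb_struct_symmetry.symbol"),
                    ("ec_terms", "rcsb_polymer_entity.rcsb_ec_lineage.id"),
                ]
            ),
        }
    ]


request_options = {
    "facets": global_symmetry_facets(),
    "paginate": {"start": 0, "rows": 0},
}

if __name__ == "__main__":
    print(json.dumps(request_options, indent=2))
```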

Computed Structure Models

This example shows how to find PDB structures and Computed Structure Models for a given UniProt sequence. The results_content_type option requests both computational and experimental results.

{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession",
          "operator": "exact_match",
          "value": "Q5VSL9"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_name",
          "operator": "exact_match",
          "value": "UniProt"
        }
      }
    ]
  },
  "return_type": "entry",
  "request_options": {
    "results_content_type": [
      "computational",
      "experimental"
    ]
  }
}
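To reuse this query for other proteins, the accession can be turned into a parameter. The sketch below is illustrative only: uniprot_request and its include_computed flag are hypothetical names, and it assumes that listing only "experimental" in results_content_type restricts the results to PDB structures.

```python
import json

# Prefix shared by both attributes used in the example above.
REFERENCE_PREFIX = (
    "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers."
)


def exact_match_node(attribute, value):
    """Build a terminal text node that matches `value` exactly."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": "exact_match",
            "value": value,
        },
    }


def uniprot_request(accession, include_computed=True):
    """Search entries mapped to a UniProt accession, optionally with CSMs."""
    content_types = ["experimental"]
    if include_computed:
        # Same order as in the example above.
        content_types.insert(0, "computational")
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                exact_match_node(REFERENCE_PREFIX + "database_accession", accession),
                exact_match_node(REFERENCE_PREFIX + "database_name", "UniProt"),
            ],
        },
        "return_type": "entry",
        "request_options": {"results_content_type": content_types},
    }


if __name__ == "__main__":
    print(json.dumps(uniprot_request("Q5VSL9"), indent=2))
```

Both nodes are combined with "and" so that the accession is only matched when it belongs to the UniProt database.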

Structure Search with Custom Data

This example shows how to search with a structure that is not deposited in the PDB archive by pointing to an external URL, for example a prediction from AlphaFold DB, ModelArchive, or SWISS-MODEL. Any publicly accessible URL can be referenced. This feature is available for structure (3D-shape) and strucmotif (structure motif) searches. The required inputs are the file location (url) and the file format (format), which is either 'cif' or 'bcif' for BinaryCIF. Gzipped content is also supported.

{
  "query": {
    "type": "terminal",
    "service": "structure",
    "parameters": {
      "value": {
        "url": "https://alphafold.ebi.ac.uk/files/AF-Q8VCK6-F1-model_v4.cif",
        "format": "cif"
      },
      "operator": "relaxed_shape_match"
    }
  },
  "return_type": "assembly"
}
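Since the url and format inputs are both required, a small builder can check them before the request is sent. This is a minimal sketch under stated assumptions: structure_url_request is a hypothetical helper, and the client-side format check simply mirrors the two values named above ('cif' and 'bcif').

```python
import json

# File formats named in the text above for URL-based structure input.
SUPPORTED_FORMATS = {"cif", "bcif"}


def structure_url_request(url, file_format="cif",
                          operator="relaxed_shape_match",
                          return_type="assembly"):
    """Build a structure (3D-shape) search request for a file at `url`."""
    if not url:
        raise ValueError("a file location (url) is required")
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(
            "format must be one of %s, got %r" % (sorted(SUPPORTED_FORMATS), file_format)
        )
    return {
        "query": {
            "type": "terminal",
            "service": "structure",
            "parameters": {
                "value": {"url": url, "format": file_format},
                "operator": operator,
            },
        },
        "return_type": return_type,
    }


if __name__ == "__main__":
    request = structure_url_request(
        "https://alphafold.ebi.ac.uk/files/AF-Q8VCK6-F1-model_v4.cif"
    )
    print(json.dumps(request, indent=2))
```

Rejecting an unsupported format locally avoids a round trip to the server for a request that cannot succeed.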

Migration Guides

Migrating from Legacy Search API

Applications written on top of the Legacy Search APIs no longer work because these services have been discontinued. This migration guide describes the steps needed to convert applications from the Legacy Search API Web Service to the new RCSB Search API.

Migrating from v1 to v2

The following guide contains the information you need to migrate from the deprecated API version v1 to the newer version v2.

Acknowledgements

To cite this service, please reference:

Related publications:

Contact Us

Contact info@rcsb.org with questions or feedback about this service.
