Protmapper API

exception protmapper.api.InvalidSiteException

Bases: Exception

class protmapper.api.MappedSite(up_id, valid, orig_res, orig_pos, error_code=None, mapped_id=None, mapped_res=None, mapped_pos=None, description=None, gene_name=None)

Bases: object

Represent details of a site that was mapped.

up_id

The UniProt ID of the protein whose site was mapped.

Type:str
error_code

One of several strings indicating an error in retrieving the protein sequence, or None if there was no error. Error codes include ‘NO_UNIPROT_ID’ if the given gene name could not be converted into a Uniprot ID; ‘UNIPROT_HTTP_NOT_FOUND’ if the given Uniprot ID resulted in a 404 Not Found error from the Uniprot web service; or ‘UNIPROT_HTTP_OTHER’ if it was any other type of Uniprot web service error. Any other unexpected errors in getting the sequence are assigned the ‘UNIPROT_OTHER’ code. If the error code is not None, the orig_res and orig_pos fields will be set (based on the query arguments) but all other fields will be None.

Type:str or None
valid

True if the original site was valid with respect to the given protein, False otherwise. If an error occurred (i.e., error_code is not None), it is None.

Type:bool or None
orig_res

The original amino acid residue that was mapped.

Type:str
orig_pos

The original amino acid position that was mapped.

Type:str
mapped_id

The Uniprot ID for the protein containing the mapped site. If up_id is the Uniprot ID for the human reference sequence, in most cases mapped_id will be the same as up_id; the exception is a site position that refers to a site unique to a particular isoform.

Type:str
mapped_res

The mapped amino acid residue.

Type:str
mapped_pos

The mapped amino acid position.

Type:str
description

A description of the mapping that was performed, taken from a fixed set of codes for the types of mapping (for example, INFERRED_MOUSE_SITE or ISOFORM_SPECIFIC_SITE).

Type:str
gene_name

The standard (HGNC) gene name of the protein that was mapped.

Type:str
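The error_code rules listed above amount to a small decision procedure. The function below is a hypothetical sketch of those rules and is not protmapper code. The name classify_uniprot_error and its arguments are assumptions chosen for illustration.

```python
from typing import Optional


def classify_uniprot_error(up_id: Optional[str],
                           http_error_status: Optional[int] = None,
                           other_error: bool = False) -> Optional[str]:
    """Return an error code following the rules listed for error_code.

    Hypothetical helper, not part of protmapper:
    up_id is None when the gene name could not be converted to a Uniprot ID;
    http_error_status is the status code of a failed Uniprot web service
    request (None if no HTTP error occurred); other_error flags any other
    unexpected failure while getting the sequence.
    """
    if up_id is None:
        return 'NO_UNIPROT_ID'
    if http_error_status == 404:
        return 'UNIPROT_HTTP_NOT_FOUND'
    if http_error_status is not None:
        return 'UNIPROT_HTTP_OTHER'
    if other_error:
        return 'UNIPROT_OTHER'
    return None  # no error: the sequence was retrieved


print(classify_uniprot_error(None))            # NO_UNIPROT_ID
print(classify_uniprot_error('P28482', 404))   # UNIPROT_HTTP_NOT_FOUND
print(classify_uniprot_error('P28482', 500))   # UNIPROT_HTTP_OTHER
print(classify_uniprot_error('P28482'))        # None
```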
has_mapping()

Return True if the original site was mapped successfully.

Returns:True if a mapping was successfully obtained for the site, False otherwise.
Return type:bool
not_invalid()

Return True if the original site is not known to be invalid.

Returns:True if the original site is valid or if there is an error code, which implicitly means that the validity of the original site could not be established. False otherwise.
Return type:bool
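The difference between has_mapping() and not_invalid() is easiest to see on a small stand-in object. The dataclass below is a hypothetical sketch that mirrors the MappedSite attributes and is not the library class. not_invalid() follows the documented rule. The has_mapping() condition (no error, plus a mapped residue and position) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappedSiteSketch:
    """Hypothetical stand-in mirroring the MappedSite attributes."""
    up_id: Optional[str]
    valid: Optional[bool]
    orig_res: str
    orig_pos: str
    error_code: Optional[str] = None
    mapped_id: Optional[str] = None
    mapped_res: Optional[str] = None
    mapped_pos: Optional[str] = None
    description: Optional[str] = None
    gene_name: Optional[str] = None

    def has_mapping(self) -> bool:
        # Assumption: a mapping counts as obtained when there was no error
        # and both a mapped residue and a mapped position are present.
        return (self.error_code is None
                and self.mapped_res is not None
                and self.mapped_pos is not None)

    def not_invalid(self) -> bool:
        # Documented rule: True if the site is valid, or if an error code
        # means its validity could not be established.
        return bool(self.valid) or self.error_code is not None


# Error case: only the original residue and position are set.
failed = MappedSiteSketch(up_id=None, valid=None, orig_res='S',
                          orig_pos='217', error_code='NO_UNIPROT_ID')
print(failed.not_invalid(), failed.has_mapping())  # True False

# Invalid site with a mapping (MAP2K1 S217 -> S218, as in the ProtMapper example).
mapped = MappedSiteSketch(up_id='Q02750', valid=False, orig_res='S',
                          orig_pos='217', mapped_id='Q02750',
                          mapped_res='S', mapped_pos='218')
print(mapped.not_invalid(), mapped.has_mapping())  # False True
```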
class protmapper.api.ProtMapper(site_map=None, use_cache=False, cache_path=None)

Bases: object

Use curated site information to standardize protein modification sites.

Parameters:
  • site_map (dict (as returned by load_site_map())) – A dict mapping tuples of the form (uniprot_id, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where uniprot_id is the Uniprot ID of the protein; orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position, and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
  • use_cache (Optional[bool]) – If True, the SITEMAPPER_CACHE_PATH from the config (or environment) is loaded and cached mappings are read and written to the given path. Otherwise, no cache is used. Default: False

Examples

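Fixing site errors on both the modification state of an agent (MAP2K1) and the target of a Phosphorylation statement (MAPK1). The example uses INDRA Agent and Statement objects. map_sites is provided by INDRA's SiteMapper, a subclass of ProtMapper, so default_mapper here refers to an INDRA SiteMapper instance rather than protmapper.api.default_mapper:

>>> from indra.statements import Agent, ModCondition, Phosphorylation
>>> map2k1_phos = Agent('MAP2K1', db_refs={'UP':'Q02750'}, mods=[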
... ModCondition('phosphorylation', 'S', '217'),
... ModCondition('phosphorylation', 'S', '221')])
>>> mapk1 = Agent('MAPK1', db_refs={'UP':'P28482'})
>>> stmt = Phosphorylation(map2k1_phos, mapk1, 'T','183')
>>> (valid, mapped) = default_mapper.map_sites([stmt])
>>> valid
[]
>>> mapped  # doctest:+IGNORE_UNICODE
[
MappedStatement:
    original_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
    mapped_mods: (('MAP2K1', 'S', '217'), ('S', '218', 'off by one'))
                 (('MAP2K1', 'S', '221'), ('S', '222', 'off by one'))
                 (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))
    mapped_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
]
>>> ms = mapped[0]
>>> ms.original_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
>>> ms.mapped_mods # doctest:+IGNORE_UNICODE
[(('MAP2K1', 'S', '217'), ('S', '218', 'off by one')), (('MAP2K1', 'S', '221'), ('S', '222', 'off by one')), (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))]
>>> ms.mapped_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
get_psp_mapping(orig_id, query_id, gene_name, res, pos, query_pos, mapping_code)

Wrapper around Phosphosite queries that performs peptide remapping.

The function is called with a Uniprot ID, residue, and position, which are used to query the phosphosite_client for a valid corresponding site on the human reference protein. The caller supplies mapping_code to indicate the type of mapping being attempted (e.g., human isoform, mouse, rat, methionine). If a valid mapping is obtained, this is the code applied to it. If the valid mapping is for a human isoform, the queried site exists only on that isoform and not on the human reference protein, and the code ISOFORM_SPECIFIC_SITE is used instead.

If the site returned by the phosphosite_client is at a position that does not match the Uniprot reference sequence (which can happen when the queried site and the PhosphoSitePlus protein sequences both exclude the initial methionine), the site is remapped to the Uniprot reference sequence using the peptide information for the site in PhosphoSitePlus. In these cases, the mapping code REMAPPED_FROM_PSP_SEQUENCE is used.

Parameters:
  • orig_id (str) – Original Uniprot ID of the protein to be mapped.
  • query_id (str) – Uniprot ID of the protein being queried for sites. This may differ from orig_id if the orthologous mouse or rat protein is being checked for sites.
  • gene_name (str) – Gene name of the protein.
  • res (str) – Residue of the site to be mapped.
  • pos (str) – Position of the site to be mapped.
  • query_pos (str) – Position being queried for a mapping. This differs from pos when off-by-one (methionine) errors are being checked.
  • mapping_code (str) – Mapping code to apply in case of a successful mapping, e.g. INFERRED_ALTERNATIVE_ISOFORM, INFERRED_MOUSE_SITE, etc.
Returns:

MappedSite object containing the mapping, or None indicating that no mapping was found.

Return type:

MappedSite or None
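The rules in get_psp_mapping for choosing a code can be condensed into one function. This is a hypothetical sketch and not the library's implementation. The boolean inputs stand in for the results of the PhosphoSitePlus query. The precedence, with the isoform check before the remapping check, is an assumption.

```python
from typing import Optional


def choose_mapping_code(found: bool,
                        is_human_isoform: bool,
                        matches_ref_position: bool,
                        mapping_code: str) -> Optional[str]:
    """Pick the code for a PhosphoSitePlus result (hypothetical sketch).

    found: whether the query returned a valid corresponding site.
    is_human_isoform: whether that site exists only on a human isoform.
    matches_ref_position: whether its position agrees with the Uniprot
        reference sequence.
    mapping_code: the code supplied by the caller.
    """
    if not found:
        return None  # no mapping was found
    if is_human_isoform:
        return 'ISOFORM_SPECIFIC_SITE'
    if not matches_ref_position:
        # The site was remapped onto the reference using the PSP peptide.
        return 'REMAPPED_FROM_PSP_SEQUENCE'
    return mapping_code


print(choose_mapping_code(True, False, True, 'INFERRED_MOUSE_SITE'))
```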

static map_peptide_to_human_ref(prot_id, prot_ns, peptide, site_pos)

Return a mapped site for a given peptide.

Parameters:
  • prot_id (str) – A Uniprot ID or HGNC gene symbol for the protein.
  • prot_ns (str) – One of ‘uniprot’ or ‘hgnc’ indicating the type of ID given.
  • peptide (str) – A string of amino acid symbols representing a peptide.
  • site_pos (int) – A site position within the peptide. Note: site_pos is 1-indexed.
Returns:

The MappedSite object gives information on results of mapping the site. See protmapper.api.MappedSite documentation for details.

Return type:

MappedSite
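Because site_pos is 1-indexed within the peptide, its position in the full sequence is the peptide's 0-indexed start plus site_pos. The sketch below shows only that arithmetic. It assumes a plain substring search that returns nothing when the peptide is missing or occurs more than once. The function name and the toy sequence are hypothetical, and the library may handle these cases differently.

```python
from typing import Optional, Tuple


def locate_site_in_sequence(ref_seq: str, peptide: str,
                            site_pos: int) -> Optional[Tuple[str, str]]:
    """Map a 1-indexed site within a peptide onto a reference sequence.

    Hypothetical sketch: returns (residue, position) with position as a
    1-indexed string, or None if the peptide is absent or not unique.
    """
    if not 1 <= site_pos <= len(peptide):
        raise ValueError('site_pos must be within the peptide (1-indexed)')
    start = ref_seq.find(peptide)  # 0-indexed start, or -1 if absent
    if start == -1 or ref_seq.find(peptide, start + 1) != -1:
        return None
    abs_pos = start + site_pos  # 0-indexed start + 1-indexed offset
    return ref_seq[abs_pos - 1], str(abs_pos)


# Toy sequence, not a real protein: 'KSPE' starts at position 4,
# so its 2nd residue (S) is at position 5.
print(locate_site_in_sequence('MAGKSPEL', 'KSPE', 2))  # ('S', '5')
```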

map_sitelist_to_human_ref(site_list, **kwargs)

Return a list of mapped sites for a list of input sites.

Parameters:site_list (list of tuple) – Each tuple in the list consists of the following entries: (prot_id, prot_ns, residue, position).
Returns:A list of MappedSite objects, one corresponding to each site in the input list.
Return type:list of protmapper.api.MappedSite
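The site_list format is one (prot_id, prot_ns, residue, position) tuple per site, and the method returns one result per tuple. The sketch below shows that shape with a stub per-site mapper in place of the real lookup. map_site_list and fake_mapper are hypothetical names. Forwarding **kwargs to the per-site mapper is an assumption about what the undocumented keyword arguments do.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar('R')
Site = Tuple[str, str, str, str]  # (prot_id, prot_ns, residue, position)


def map_site_list(site_list: Iterable[Site],
                  map_one: Callable[..., R], **kwargs) -> List[R]:
    """Apply a per-site mapper to each tuple, preserving input order.

    Hypothetical sketch: assumes extra keyword arguments are forwarded
    unchanged to the per-site mapper.
    """
    return [map_one(prot_id, prot_ns, residue, position, **kwargs)
            for prot_id, prot_ns, residue, position in site_list]


def fake_mapper(prot_id, prot_ns, residue, position,
                do_methionine_offset=True):
    # Stub standing in for a real per-site lookup.
    return (prot_id, residue, position, do_methionine_offset)


sites = [('MAPK1', 'hgnc', 'T', '185'), ('P28482', 'uniprot', 'Y', '187')]
results = map_site_list(sites, fake_mapper, do_methionine_offset=False)
print(results)
```

With protmapper itself, the corresponding documented call is default_mapper.map_sitelist_to_human_ref(sites).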
map_to_human_ref(prot_id, prot_ns, residue, position, do_methionine_offset=True, do_orthology_mapping=True, do_isoform_mapping=True)

Check a modification site on a protein for validity and look for a mapping.

Look up the site in Uniprot and then in the site map.

Parameters:
  • prot_id (str) – A Uniprot ID or HGNC gene symbol for the protein.
  • prot_ns (str) – One of ‘uniprot’ or ‘hgnc’ indicating the type of ID given.
  • residue (str) – Residue of the site to check for validity and map.
  • position (str) – Position of the residue to check for validity and map.
  • do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification one position after the given one; if such a site exists, creates the mapping. Default is True.
  • do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
  • do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
Returns:

The MappedSite object gives information on results of mapping the site. See protmapper.api.MappedSite documentation for details.

Return type:

MappedSite
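The validity check and the do_methionine_offset step can be illustrated with a toy reference sequence and a set of known modification sites. This is a hypothetical sketch and not protmapper's implementation. ref_seq, known_sites and check_site are assumed names. The orthology and isoform steps, which depend on PhosphoSitePlus data, are omitted.

```python
from typing import Optional, Set, Tuple


def check_site(ref_seq: str, known_sites: Set[Tuple[str, int]],
               residue: str, position: str,
               do_methionine_offset: bool = True) -> Tuple[bool, Optional[str]]:
    """Return (valid, mapped_pos) for one site (hypothetical sketch).

    ref_seq is a reference sequence and known_sites a set of
    (residue, 1-indexed position) pairs with known modifications; both
    are assumed inputs, not protmapper data structures.
    """
    pos = int(position)
    # Valid if the reference has the given residue at that position.
    if 1 <= pos <= len(ref_seq) and ref_seq[pos - 1] == residue:
        return True, None
    # Off-by-one check: numbering may come from the mature protein after
    # cleavage of the initial methionine, so look one position later.
    if do_methionine_offset and (residue, pos + 1) in known_sites:
        return False, str(pos + 1)
    return False, None


ref_seq = 'MAGKSPEL'       # toy sequence
known_sites = {('S', 5)}   # toy known modification site
print(check_site(ref_seq, known_sites, 'S', '4'))  # (False, '5')
```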

protmapper.api.default_mapper = <protmapper.api.ProtMapper object>

A default instance of ProtMapper that contains the site information found in resources/curated_site_map.csv.

protmapper.api.load_site_map(path)

Load the modification site map from a file.

The site map file should be a comma-separated file with seven columns:

UniprotId: Uniprot ID of protein
Gene: Gene name
OrigRes: Original (incorrect) residue
OrigPos: Original (incorrect) residue position
CorrectRes: The correct residue for the modification
CorrectPos: The correct residue position
Comment: Description of the reason for the error.
Parameters:path (string) – Path to the comma-separated site map file.
Returns:A dict mapping tuples of the form (uniprot_id, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where uniprot_id is the Uniprot ID of the protein; orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position, and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
Return type:dict
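The relationship between the file's columns and the returned dict can be shown with a short parser. This is a hypothetical sketch and not the library's load_site_map. It assumes the file has a header row with exactly the column names listed above. The single data row reuses the MAP2K1 S217 to S218 correction from the ProtMapper example.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

SiteKey = Tuple[str, str, str]    # (uniprot_id, orig_res, orig_pos)
SiteValue = Tuple[str, str, str]  # (correct_res, correct_pos, comment)


def read_site_map(handle: TextIO) -> Dict[SiteKey, SiteValue]:
    """Parse a site map with the column layout listed above.

    Hypothetical sketch: assumes a header row naming the columns, which
    csv.DictReader uses as keys. The Gene column is not part of the key.
    """
    site_map = {}
    for row in csv.DictReader(handle):
        key = (row['UniprotId'], row['OrigRes'], row['OrigPos'])
        site_map[key] = (row['CorrectRes'], row['CorrectPos'], row['Comment'])
    return site_map


example = io.StringIO(
    'UniprotId,Gene,OrigRes,OrigPos,CorrectRes,CorrectPos,Comment\n'
    'Q02750,MAP2K1,S,217,S,218,off by one\n'
)
parsed = read_site_map(example)
print(parsed)  # {('Q02750', 'S', '217'): ('S', '218', 'off by one')}
```

For a file on disk, open it with open(path, newline='') and pass the handle to the parser.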