Protmapper API
-
class
protmapper.api.
MappedSite
(up_id, valid, orig_res, orig_pos, error_code=None, mapped_id=None, mapped_res=None, mapped_pos=None, description=None, gene_name=None) Bases:
object
Represent details of a site that was mapped.
-
error_code
One of several strings indicating an error in retrieving the protein sequence, or None if there was no error. Error codes include ‘NO_UNIPROT_ID’ if the given gene name could not be converted into a Uniprot ID; ‘UNIPROT_HTTP_NOT_FOUND’ if the given Uniprot ID resulted in a 404 Not Found error from the Uniprot web service; or ‘UNIPROT_HTTP_OTHER’ if it was any other type of Uniprot web service error. Any other unexpected errors in getting the sequence are assigned the ‘UNIPROT_OTHER’ code. If the error code is not None, the orig_res and orig_pos fields will be set (based on the query arguments) but all other fields will be None.
Type: str or None
-
valid
True if the original site was valid with respect to the given protein, False otherwise. If an error occurred (error_code is not None), it is set to None.
Type: bool or None
-
mapped_id
The Uniprot ID for the protein containing the mapped site. If up_id is the Uniprot ID for the human reference sequence, in most cases this will match; however, exceptions will occur if the site position refers to a site that is unique to a particular isoform.
Type: str
-
description
A description of the mapping that was done, drawn from a fixed set of codes for the types of mapping that can be performed.
Type: str
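The attribute semantics above suggest a simple order for inspecting a result: check error_code first, then valid, then the mapped fields. The following is a minimal sketch of that order, not part of the library. The helper summarize_site and the SimpleNamespace stand-ins are hypothetical and only mimic the documented attribute names. The sketch also assumes that mapped_pos is None when no mapping was found.

```python
from types import SimpleNamespace


def summarize_site(ms):
    """Classify a MappedSite-like object using the documented attributes."""
    if ms.error_code is not None:
        # On error, only orig_res and orig_pos are populated.
        return "error: " + ms.error_code
    if ms.valid:
        return "valid"
    if ms.mapped_pos is not None:
        return "mapped to {}{} ({})".format(
            ms.mapped_res, ms.mapped_pos, ms.description)
    # Assumption: an invalid site with no mapped position was not mapped.
    return "invalid, no mapping"


# Hypothetical stand-ins that carry the same attribute names as MappedSite.
failed = SimpleNamespace(error_code="UNIPROT_HTTP_NOT_FOUND", valid=None,
                         mapped_res=None, mapped_pos=None, description=None)
remapped = SimpleNamespace(error_code=None, valid=False, mapped_res="T",
                           mapped_pos="185",
                           description="INFERRED_MOUSE_SITE")

print(summarize_site(failed))
print(summarize_site(remapped))
```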
-
-
class
protmapper.api.
ProtMapper
(site_map=None, use_cache=False, cache_path=None) Bases:
object
Use curated site information to standardize modification sites in statements.
Parameters: - site_map (dict, as returned by load_site_map()) – A dict mapping tuples of the form (gene, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where gene is the string name of the gene (canonicalized to HGNC); orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position; and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
- use_cache (Optional[bool]) – If True, the SITEMAPPER_CACHE_PATH from the config (or environment) is loaded and cached mappings are read from and written to the given path. Otherwise, no cache is used. Default: False
Examples
Fixing site errors on both the modification state of an agent (MAP2K1) and the target of a Phosphorylation statement (MAPK1):
>>> map2k1_phos = Agent('MAP2K1', db_refs={'UP':'Q02750'}, mods=[
... ModCondition('phosphorylation', 'S', '217'),
... ModCondition('phosphorylation', 'S', '221')])
>>> mapk1 = Agent('MAPK1', db_refs={'UP':'P28482'})
>>> stmt = Phosphorylation(map2k1_phos, mapk1, 'T','183')
>>> (valid, mapped) = default_mapper.map_sites([stmt])
>>> valid
[]
>>> mapped # doctest:+IGNORE_UNICODE
[
MappedStatement:
    original_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
    mapped_mods: (('MAP2K1', 'S', '217'), ('S', '218', 'off by one'))
                 (('MAP2K1', 'S', '221'), ('S', '222', 'off by one'))
                 (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))
    mapped_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
]
>>> ms = mapped[0]
>>> ms.original_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
>>> ms.mapped_mods # doctest:+IGNORE_UNICODE
[(('MAP2K1', 'S', '217'), ('S', '218', 'off by one')), (('MAP2K1', 'S', '221'), ('S', '222', 'off by one')), (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))]
>>> ms.mapped_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
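To make the shape of site_map concrete, here is a small sketch of the dict described in the Parameters entry, filled with the three curated corrections that appear in the example output. The lookup helper is hypothetical and is not a ProtMapper method. It only illustrates how a (gene, orig_res, orig_pos) key resolves to a (correct_res, correct_pos, comment) value.

```python
# Keys: (gene, orig_res, orig_pos); values: (correct_res, correct_pos, comment).
site_map = {
    ("MAP2K1", "S", "217"): ("S", "218", "off by one"),
    ("MAP2K1", "S", "221"): ("S", "222", "off by one"),
    ("MAPK1", "T", "183"): ("T", "185", "off by two; mouse sequence"),
}


def lookup(site_map, gene, res, pos):
    """Return the curated correction for a site, or None if there is none."""
    # Hypothetical helper: a plain dict lookup on the documented key shape.
    return site_map.get((gene, res, pos))


print(lookup(site_map, "MAPK1", "T", "183"))
print(lookup(site_map, "MAPK1", "T", "185"))
```

Positions are kept as strings here because the example above passes them as strings.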
-
get_psp_mapping
(orig_id, query_id, gene_name, res, pos, query_pos, mapping_code) Wrapper around Phosphosite queries that performs peptide remapping.
The function is called with a Uniprot ID, residue, and position combination that is used to query the phosphosite_client for a valid corresponding site on the human reference protein. The mapping_code is provided by the caller to indicate the type of mapping being attempted (e.g., human isoform, mouse, rat, methionine). If a valid mapping is obtained, this is the mapping code that is applied.
If a valid mapping is obtained but it is for a human isoform, this indicates that the queried site exists only on a human isoform and not on the human reference protein, and the code ISOFORM_SPECIFIC_SITE is used.
If the site returned by the phosphosite_client is at a position that does not match the Uniprot reference sequence (which can happen when the queried site and the PhosphositePlus protein sequences both exclude the initial methionine), the site is remapped to the Uniprot reference sequence using the peptide information for the site in PhosphositePlus. In these cases, the mapping code REMAPPED_FROM_PSP_SEQUENCE is used.
Parameters: - orig_id (str) – Original Uniprot ID of the protein to be mapped.
- query_id (str) – Uniprot ID of the protein being queried for sites. This may differ from orig_id if the orthologous mouse or rat protein is being checked for sites.
- gene_name (str) – Gene name of the protein.
- res (str) – Residue of the site to be mapped.
- pos (str) – Position of the site to be mapped.
- query_pos (str) – Position being queried for a mapping. This differs from pos when off-by-one (methionine) errors are being checked.
- mapping_code (str) – Mapping code to apply in case of a successful mapping, e.g. INFERRED_ALTERNATIVE_ISOFORM, INFERRED_MOUSE_SITE, etc.
Returns: MappedSite object containing the mapping, or None indicating that no mapping was found.
Return type: MappedSite or None
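The peptide-based remapping step described above can be pictured as locating the site's surrounding peptide in the reference sequence and reading off the new position. The sketch below illustrates that idea under stated assumptions and is not the library's implementation. The function remap_by_peptide, its arguments, and the toy sequence are all hypothetical. It also declines to remap when the peptide occurs more than once, which is a design choice of this sketch.

```python
def remap_by_peptide(ref_seq, peptide, site_offset):
    """Return the 1-indexed position in ref_seq of the peptide residue at
    0-indexed site_offset, or None if the peptide is absent or ambiguous."""
    start = ref_seq.find(peptide)
    if start == -1:
        return None  # peptide not present in the reference sequence
    if ref_seq.find(peptide, start + 1) != -1:
        return None  # more than one occurrence, so the position is ambiguous
    return start + site_offset + 1


# Toy reference sequence that includes the initial methionine.
ref_seq = "MACDEFGHIKLSNPQR"
# Toy peptide with the site of interest (S) at 0-indexed offset 4.
peptide = "HIKLSNP"

new_pos = remap_by_peptide(ref_seq, peptide, 4)
print(new_pos, ref_seq[new_pos - 1])
```

In the toy data, a sequence numbered without the initial methionine would place the serine at position 11, while the peptide match places it at position 12 of the full sequence.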
-
static
map_peptide_to_human_ref
(prot_id, prot_ns, peptide, site_pos) Return a mapped site for a given peptide.
Returns: The MappedSite object gives information on the results of mapping the site. See protmapper.api.MappedSite documentation for details.
Return type: protmapper.api.MappedSite
-
map_sitelist_to_human_ref
(site_list, **kwargs) Return a list of mapped sites for a list of input sites.
Parameters: - site_list (list of tuple) – Each tuple in the list consists of the following entries: (prot_id, prot_ns, residue, position).
Returns: A list of MappedSite objects, one corresponding to each site in the input list.
Return type: list of protmapper.api.MappedSite
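As a sketch of how the input tuples line up with the returned list, the hypothetical helper map_and_report below pairs each (prot_id, prot_ns, residue, position) tuple with its result. It relies only on the documented one-to-one correspondence between inputs and outputs. The example sites reuse the IDs and positions from the ProtMapper example above.

```python
def map_and_report(mapper, site_list):
    """Pair each input site tuple with the result returned for it."""
    # map_sitelist_to_human_ref returns one result per input site, in order.
    mapped = mapper.map_sitelist_to_human_ref(site_list)
    return list(zip(site_list, mapped))


# Each entry is (prot_id, prot_ns, residue, position).
site_list = [
    ("P28482", "uniprot", "T", "183"),
    ("MAP2K1", "hgnc", "S", "217"),
]

# With protmapper installed, this could be run as:
#   from protmapper.api import ProtMapper
#   for site, ms in map_and_report(ProtMapper(), site_list):
#       print(site, ms.valid, ms.mapped_res, ms.mapped_pos)
```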
-
map_to_human_ref
(prot_id, prot_ns, residue, position, do_methionine_offset=True, do_orthology_mapping=True, do_isoform_mapping=True) Check an agent for invalid sites and look for mappings.
Look up each modification site on the agent in Uniprot and then the site map.
Parameters: - prot_id (str) – A Uniprot ID or HGNC gene symbol for the protein.
- prot_ns (str) – One of ‘uniprot’ or ‘hgnc’ indicating the type of ID given.
- residue (str) – Residue on the protein to check for validity and map.
- position (str) – Position of the residue to check for validity and map.
- do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at 1 site position greater than the given one; if there exists such a site, creates the mapping. Default is True.
- do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
- do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
Returns: The MappedSite object gives information on the results of mapping the site. See protmapper.api.MappedSite documentation for details.
Return type: protmapper.api.MappedSite
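The do_methionine_offset check described above amounts to asking whether a known modification site exists one position after the queried one. The sketch below is a toy illustration of that rule, not the library's code. The function check_methionine_offset and the known_sites set are hypothetical stand-ins for whatever site data the mapper consults.

```python
def check_methionine_offset(known_sites, residue, position):
    """Return (residue, position + 1) as strings if that shifted site is a
    known modification site, otherwise None."""
    shifted = (residue, str(int(position) + 1))
    return shifted if shifted in known_sites else None


# Toy set of known modification sites as (residue, position) string pairs.
known_sites = {("S", "218"), ("S", "222")}

print(check_methionine_offset(known_sites, "S", "217"))
print(check_methionine_offset(known_sites, "S", "219"))
```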
-
protmapper.api.
default_mapper
= <protmapper.api.ProtMapper object> A default instance of ProtMapper that contains the site information found in resources/curated_site_map.csv.
-
protmapper.api.
load_site_map
(path) Load the modification site map from a file.
The site map file should be a comma-separated file with seven columns:
- UniprotId: Uniprot ID of protein
- Gene: Gene name
- OrigRes: Original (incorrect) residue
- OrigPos: Original (incorrect) residue position
- CorrectRes: The correct residue for the modification
- CorrectPos: The correct residue position
- Comment: Description of the reason for the error.
Parameters: - path (string) – Path to the comma-separated site map file.
Returns: A dict mapping tuples of the form (uniprot_id, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where uniprot_id is the Uniprot ID of the protein; orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position; and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
Return type: dict
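The column layout and return shape described above can be sketched with the standard csv module. This is an illustrative parser, not the library's load_site_map. The function parse_site_map and its has_header flag are hypothetical, and the sketch assumes that the first row of the file is a header. The single data row reuses the MAPK1 correction from the ProtMapper example.

```python
import csv
import io


def parse_site_map(handle, has_header=True):
    """Build {(uniprot_id, orig_res, orig_pos): (correct_res, correct_pos,
    comment)} from a seven-column comma-separated file object."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first row holds column names
    site_map = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        up_id, gene, orig_res, orig_pos, correct_res, correct_pos, comment = row
        # The gene name is read but is not part of the key.
        site_map[(up_id, orig_res, orig_pos)] = (correct_res, correct_pos,
                                                 comment)
    return site_map


example = io.StringIO(
    "UniprotId,Gene,OrigRes,OrigPos,CorrectRes,CorrectPos,Comment\n"
    "P28482,MAPK1,T,183,T,185,off by two; mouse sequence\n"
)
site_map_from_csv = parse_site_map(example)
print(site_map_from_csv)
```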