cellmaps_ppidownloader package
Submodules
cellmaps_ppidownloader.cellmaps_ppidownloadercmd module
- cellmaps_ppidownloader.cellmaps_ppidownloadercmd.main(args)[source]
Main entry point for program
- Parameters:
args (list) – arguments passed to command line, usually sys.argv[1:]
- Returns:
return value of cellmaps_ppidownloader.runner.CellmapsPPIDownloader.run() or 2 if an exception is raised
- Return type:
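The return-value convention described for main() (the value of run() on success, 2 if an exception is raised) can be illustrated with a minimal sketch. This is not the package's actual code; StubRunner and main_sketch are hypothetical names introduced only for this illustration.

```python
import sys


class StubRunner:
    """Hypothetical stand-in for an object that exposes run()."""

    def __init__(self, fail=False):
        self._fail = fail

    def run(self):
        if self._fail:
            raise RuntimeError('simulated failure')
        return 0


def main_sketch(runner):
    """Return runner.run(), or 2 if any exception is raised."""
    try:
        return runner.run()
    except Exception as e:
        # Report the problem on stderr and signal failure with exit code 2
        sys.stderr.write('Caught exception: ' + str(e) + '\n')
        return 2
```

A caller would typically pass the result to sys.exit() so the shell sees the same code.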
cellmaps_ppidownloader.exceptions module
cellmaps_ppidownloader.gene module
- class cellmaps_ppidownloader.gene.APMSGeneNodeAttributeGenerator(apms_edgelist=None, apms_baitlist=None, genequery=<cellmaps_ppidownloader.gene.GeneQuery object>)[source]
Bases: GeneNodeAttributeGenerator
Creates APMS Gene Node Attributes table
Constructor
- Parameters:
- BAITLIST_GENE_ID = 'GeneID'
- BAITLIST_GENE_SYMBOL = 'GeneSymbol'
- BAITLIST_NUM_INTERACTORS = '# Interactors'
- GENEID_COL1 = 'GeneID1'
- GENEID_COL2 = 'GeneID2'
- SYMBOL_COL1 = 'Symbol1'
- SYMBOL_COL2 = 'Symbol2'
- static get_apms_baitlist_from_tsvfile(tsvfile=None, symbol_col='GeneSymbol', geneid_col='GeneID', numinteractors_col='# Interactors')[source]
Generates a list of dicts by parsing the TSV file specified by tsvfile, which is expected to have the following header columns and corresponding values:
GeneSymbol GeneID # Interactors
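Parsing a baitlist with the header shown above into a list of dicts could look roughly like the sketch below. It is a standalone illustration built on the standard csv module, not the package's implementation; read_baitlist_sketch is a hypothetical name, the output key names are an assumption, and the sample rows are made-up placeholders.

```python
import csv
import io


def read_baitlist_sketch(handle, symbol_col='GeneSymbol',
                         geneid_col='GeneID',
                         numinteractors_col='# Interactors'):
    """Parse a tab-delimited baitlist into a list of dicts, one per row."""
    reader = csv.DictReader(handle, delimiter='\t')
    # Assumption: output keys mirror the default column names
    return [{'GeneSymbol': row[symbol_col],
             'GeneID': row[geneid_col],
             '# Interactors': row[numinteractors_col]}
            for row in reader]


# Placeholder content; the values are invented for illustration only
sample = io.StringIO('GeneSymbol\tGeneID\t# Interactors\n'
                     'SYMBOL_A\t1111\t5\n'
                     'SYMBOL_B\t2222\t12\n')
baits = read_baitlist_sketch(sample)
```

All values stay as strings in this sketch; converting the interactor count to an integer is left to the caller.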
- static get_apms_edgelist_from_tsvfile(tsvfile=None, geneid_one_col='GeneID1', symbol_one_col='Symbol1', geneid_two_col='GeneID2', symbol_two_col='Symbol2')[source]
Generates a list of dicts by parsing the TSV file specified by tsvfile, which is expected to have the following header columns and corresponding values:
GeneID1 Symbol1 GeneID2 Symbol2
- get_gene_node_attributes()[source]
Gets gene node attributes, which are output as a list of dicts in this format:
{ 'GENEID': { 'name': 'GENESYMBOL', 'represents': 'ensemble:ENSEMBLID1;ENSEMBLID2..', 'ambiguous': 'ALTERNATE GENEs' } }
- Returns:
(list of dicts containing gene node attributes, list of str describing any errors encountered)
- Return type:
- class cellmaps_ppidownloader.gene.CM4AIGeneNodeAttributeGenerator(apms_edgelist=None, genequery=<cellmaps_ppidownloader.gene.GeneQuery object>)[source]
Bases: GeneNodeAttributeGenerator
Creates APMS Gene Node Attributes table from CM4AI data
Constructor
- Parameters:
apms_edgelist (list) –
list of dict elements where each dict is of format:
{'Bait': VAL, 'Prey': VAL, 'logOddsScore': VAL, 'FoldChange.x': VAL, 'BFDR.x': VAL}
genequery
- static get_apms_edgelist_from_tsvfile(tsvfile=None, bait_col='Bait', prey_col='Prey', bfdr_col=None, foldchange_col=None, foldchange_cutoff=0.0, bfdr_maxcutoff=0.05)[source]
Generates a list of dicts by parsing the TSV file specified by tsvfile, which is expected to have the following header columns and corresponding values:
Bait Prey BFDR.x FoldChange.x
Note
If the BFDR.x column does not exist, no BFDR filtering will occur. The same applies if the FoldChange.x column does not exist.
- Parameters:
tsvfile (str) – Path to TSV file with above format
bait_col (str) – Name of bait column
prey_col (str) – Name of prey column
bfdr_col (str) – Name of BFDR (false discovery rate) column. If None, no BFDR filtering will occur
foldchange_col (str) – Name of FoldChange column. If None, no FoldChange filtering will occur
foldchange_cutoff (float) – FoldChange cutoff. Only keep rows with values greater than this value. If this value is None, no filtering will occur
bfdr_maxcutoff (float) – BFDR cutoff. Only keep rows with BFDR less than or equal to this value. If this value is None, no filtering will occur
- Returns:
list of dicts, with each dict of format:
{'Bait': VAL, 'Prey': VAL}
- Return type:
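The filtering rules described for this method (keep rows whose FoldChange is strictly greater than the cutoff and whose BFDR is less than or equal to the maximum cutoff, with None disabling a filter) can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: filter_edges_sketch is a hypothetical name, a filter is also skipped when its column is absent from the header, and the sample rows are made-up placeholders.

```python
import csv
import io


def filter_edges_sketch(handle, bait_col='Bait', prey_col='Prey',
                        bfdr_col=None, foldchange_col=None,
                        foldchange_cutoff=0.0, bfdr_maxcutoff=0.05):
    """Return [{'Bait': ..., 'Prey': ...}] for rows passing both filters."""
    edges = []
    for row in csv.DictReader(handle, delimiter='\t'):
        # FoldChange filter: keep only values strictly greater than cutoff
        if (foldchange_col is not None and foldchange_cutoff is not None
                and foldchange_col in row):
            if float(row[foldchange_col]) <= foldchange_cutoff:
                continue
        # BFDR filter: keep only values less than or equal to the cutoff
        if (bfdr_col is not None and bfdr_maxcutoff is not None
                and bfdr_col in row):
            if float(row[bfdr_col]) > bfdr_maxcutoff:
                continue
        edges.append({'Bait': row[bait_col], 'Prey': row[prey_col]})
    return edges


# Placeholder rows; the values are invented for illustration only
SAMPLE_TSV = ('Bait\tPrey\tBFDR.x\tFoldChange.x\n'
              'B1\tP1\t0.01\t2.5\n'   # passes both filters
              'B1\tP2\t0.20\t3.0\n'   # BFDR above 0.05
              'B2\tP3\t0.05\t0.0\n'   # FoldChange not greater than 0.0
              'B2\tP4\t0.05\t1.0\n')  # BFDR equal to cutoff is kept

filtered = filter_edges_sketch(io.StringIO(SAMPLE_TSV),
                               bfdr_col='BFDR.x',
                               foldchange_col='FoldChange.x')
```

With the default bfdr_col=None and foldchange_col=None, the same input passes through unfiltered.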
- get_gene_node_attributes()[source]
Gets gene node attributes, which are output as a list of dicts in this format:
{ 'GENEID': { 'name': 'GENESYMBOL', 'represents': 'ensemble:ENSEMBLID1;ENSEMBLID2..', 'ambiguous': 'ALTERNATE GENEs', 'bait': True or False} }
- Returns:
(list of dicts containing gene node attributes, list of str describing any errors encountered)
- Return type:
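Downstream code often needs the nested attribute format above as flat rows. The sketch below shows one way to do that. It assumes each list element is a dict keyed by gene ID, and that the 'represents' value carries a single prefix followed by ';'-separated identifiers, as the format string suggests. flatten_attributes_sketch is a hypothetical helper, and the sample values are placeholders taken from the format description.

```python
def flatten_attributes_sketch(attr_list):
    """Turn [{geneid: {attrs}}, ...] into a flat list of row dicts."""
    rows = []
    for entry in attr_list:
        for geneid, attrs in entry.items():
            # Assumption: one 'prefix:' followed by ';'-separated IDs
            _, _, id_part = attrs.get('represents', '').partition(':')
            ensembl_ids = [i for i in id_part.split(';') if i]
            rows.append({'geneid': geneid,
                         'name': attrs.get('name'),
                         'ensembl_ids': ensembl_ids,
                         'bait': attrs.get('bait', False)})
    return rows


# Placeholder data mirroring the documented format
sample_attrs = [
    {'GENEID1': {'name': 'GENESYMBOL1',
                 'represents': 'ensemble:ENSEMBLID1;ENSEMBLID2',
                 'ambiguous': '', 'bait': True}},
    {'GENEID2': {'name': 'GENESYMBOL2',
                 'represents': 'ensemble:ENSEMBLID3',
                 'ambiguous': '', 'bait': False}},
]
rows = flatten_attributes_sketch(sample_attrs)
```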
- class cellmaps_ppidownloader.gene.GeneNodeAttributeGenerator[source]
Bases: object
Base class for GeneNodeAttribute Generator
Constructor
- static add_geneids_to_set(gene_set=None, ambiguous_gene_dict=None, geneid=None)[source]
Examines the geneid passed in. If the value contains a comma, it is split on commas and treated as multiple genes. Those genes are added to gene_set, and an entry is added to ambiguous_gene_dict for each one, with the key set to the gene name and the value set to the original geneid value.
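The comma-splitting behaviour described above can be sketched as follows. This is an illustrative reimplementation under assumptions, not the package's code: add_geneids_to_set_sketch is a hypothetical name, an ID without a comma is assumed to be added to the set unchanged, and no whitespace stripping is performed.

```python
def add_geneids_to_set_sketch(gene_set, ambiguous_gene_dict, geneid):
    """Add geneid to gene_set, splitting comma-delimited values."""
    if ',' not in geneid:
        # Assumption: a single ID is added as-is and is not ambiguous
        gene_set.add(geneid)
        return
    for entry in geneid.split(','):
        gene_set.add(entry)
        # Remember which original value each split gene came from
        ambiguous_gene_dict[entry] = geneid


genes = set()
ambiguous = {}
add_geneids_to_set_sketch(genes, ambiguous, 'ID1,ID2')
add_geneids_to_set_sketch(genes, ambiguous, 'ID3')
```

After these two calls, genes holds three IDs and ambiguous maps only the two IDs that came from the comma-delimited value.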
- get_gene_node_attributes()[source]
Should be implemented by subclasses
- Raises:
NotImplementedError – Always
- class cellmaps_ppidownloader.gene.GeneQuery(mygeneinfo=<mygene.MyGeneInfo object>)[source]
Bases: object
Gets information about genes from mygene
Constructor
- get_symbols_for_genes(genelist=None, scopes='_id')[source]
Queries for genes via the mygene.MyGeneInfo object passed in via constructor
- Parameters:
- Returns:
result from mygene which is a list of dict objects where each dict is of format:
{ 'query': 'ID', '_id': 'ID', '_score': #.##, 'ensembl': { 'gene': 'ENSEMBLEID' }, 'symbol': 'GENESYMBOL' }
- Return type:
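A result list in the format above can be reduced to a lookup table keyed by the queried ID. The sketch below works on placeholder data and makes no network calls. summarize_hits_sketch is a hypothetical helper, and handling 'ensembl' as either a single dict or a list of dicts is a defensive assumption, not something stated in this reference.

```python
def summarize_hits_sketch(results):
    """Map each queried ID to its symbol and Ensembl gene IDs."""
    summary = {}
    for hit in results:
        ensembl = hit.get('ensembl', {})
        # Assumption: 'ensembl' may be one dict or a list of dicts
        if isinstance(ensembl, dict):
            ensembl = [ensembl]
        ensembl_ids = [e['gene'] for e in ensembl if 'gene' in e]
        summary[hit['query']] = {'symbol': hit.get('symbol'),
                                 'ensembl_ids': ensembl_ids}
    return summary


# Placeholder results mirroring the documented format
sample_results = [
    {'query': 'ID1', '_id': 'ID1', '_score': 1.0,
     'ensembl': {'gene': 'ENSEMBLID1'}, 'symbol': 'GENESYMBOL1'},
    {'query': 'ID2', '_id': 'ID2', '_score': 1.0,
     'ensembl': [{'gene': 'ENSEMBLID2'}, {'gene': 'ENSEMBLID3'}],
     'symbol': 'GENESYMBOL2'},
]
summary = summarize_hits_sketch(sample_results)
```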
cellmaps_ppidownloader.runner module
- class cellmaps_ppidownloader.runner.CellmapsPPIDownloader(outdir=None, imgsuffix='.jpg', apmsgen=None, skip_logging=True, provenance=None, input_data_dict=None, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, skip_failed=False)[source]
Bases: object
Downloads AP-MS protein-protein interaction data, and registers datasets for provenance tracking in FAIRSCAPE.
Constructor
- Parameters:
outdir (str) – directory where PPI data will be downloaded to
apmsgen (APMSGeneNodeAttributeGenerator) – gene node attribute generator for APMS data
skip_logging (bool) – If True skip logging, if None or False do NOT skip logging
provenance (dict) –
Provenance information about input files as dictionary.
Example:
{
  'name': 'Example input dataset',
  'organization-name': 'CM4AI',
  'project-name': 'Example',
  'edgelist': {
    'name': 'sample edgelist',
    'author': 'Krogan Lab',
    'version': '1.0',
    'date-published': '07-31-2023',
    'description': 'AP-MS Protein interactions on HSC2 cell line, example dataset',
    'data-format': 'tsv'
  },
  'baitlist': {
    'name': 'sample baitlist',
    'author': 'Krogan Lab',
    'version': '1.0',
    'date-published': '07-31-2023',
    'description': 'AP-MS Baits used for Protein interactions on HSC2 cell line',
    'data-format': 'tsv'
  }
}
input_data_dict (dict) –
All attributes and their corresponding values of the input data e.g.
{'outdir': 'test', 'baitlist': 'path/to/file/with/baitlist'}
imgsuffix (str) –
Unused parameter.
Deprecated since version 0.2.2.
The imgsuffix parameter is deprecated and will be removed in a future release.
- BAITLIST_FILEKEY = 'baitlist'
- CM4AI_ROCRATE = 'cm4ai_rocrate'
- EDGELIST_FILEKEY = 'edgelist'
- static get_example_provenance(requiredonly=True, with_ids=False)[source]
Gets a dict of provenance parameters needed to add/register a dataset with FAIRSCAPE
- get_ppi_gene_node_attributes_file()[source]
Gets full path to ppi gene node attribute file under output directory created when invoking run()
- Returns:
Path to file
- Return type:
- get_ppi_gene_node_errors_file()[source]
Gets full path to ppi gene node attribute errors file under output directory created when invoking run()
- Returns:
Path to file
- Return type:
- run()[source]
Downloads ppi data to output directory specified in constructor
- Raises:
CellMapsPPIDownloaderError – If there is an error
- Returns:
0 upon success, otherwise failure
Module contents
Top-level package for cellmaps_ppidownloader.