Inputs
The tool requires either an AP-MS edgelist TSV file together with a baitlist TSV file (describing gene interactions and bait targets, respectively), or a single .tsv file from the CM4AI RO-Crate. Each input accepted by the tool is listed and described below.
Output files and directories
baitlist.tsv: This file contains information about the bait proteins used in the affinity purification-mass spectrometry (AP-MS) process. It should contain the following columns:
GeneSymbol - the symbol of the gene encoding the bait protein.
GeneID - the unique identifier for the gene in a specific database.
# Interactors - the number of proteins found to interact with the bait protein, as identified through the AP-MS process. For instance, in the example below, 2783 different proteins were found to interact with the PIK3CA protein.
Example:
GeneSymbol GeneID # Interactors
PIK3CA 101928739 2783
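The column layout described above can be checked before running the tool. The following is only a minimal sketch, assuming the file is tab-separated with a header row; the function name `read_baitlist` is hypothetical and is not part of the tool.

```python
import csv
import io

# Column names taken from the baitlist description above.
BAITLIST_COLUMNS = ["GeneSymbol", "GeneID", "# Interactors"]


def read_baitlist(handle):
    """Parse a tab-separated baitlist (hypothetical helper) into row dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in BAITLIST_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"baitlist is missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "GeneSymbol": row["GeneSymbol"],
            "GeneID": row["GeneID"],
            # Stored as an integer so it can be compared or summed later.
            "# Interactors": int(row["# Interactors"]),
        })
    return rows


# The sample row from the example above, written with explicit tabs.
sample = "GeneSymbol\tGeneID\t# Interactors\nPIK3CA\t101928739\t2783\n"
baits = read_baitlist(io.StringIO(sample))
print(baits)
```

For a file on disk, the same function can be called with `open("baitlist.tsv", newline="")` in place of the in-memory sample.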
edgelist.tsv: An edge list representation of the protein-protein interactions. Each row in this file represents an interaction between two proteins. It should contain the following columns:
GeneID1 - the unique identifier for the gene of the first protein in the interaction pair.
Symbol1 - the gene symbol of the first protein in the interaction pair.
GeneID2 - the unique identifier for the gene of the second protein in the interaction pair.
Symbol2 - the gene symbol of the second protein in the interaction pair.
Example:
GeneID1 Symbol1 GeneID2 Symbol2
101928739 PIK3CA 219541 MED19
101928739 PIK3CA 26030 PLEKHG3
101928739 PIK3CA 129446 XIRP2
101928739 PIK3CA 644815 FAM83G
101928739 PIK3CA 23347 SMCHD1
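Because each row is one interaction pair, the edge list can be grouped by the first protein to see which partners were reported for it. This is an illustrative sketch only, assuming a tab-separated file with a header row; `read_edgelist` is a hypothetical helper, not a function of the tool.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the edgelist description above.
EDGELIST_COLUMNS = ["GeneID1", "Symbol1", "GeneID2", "Symbol2"]


def read_edgelist(handle):
    """Return a dict mapping Symbol1 to the set of its Symbol2 partners."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in EDGELIST_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"edgelist is missing columns: {missing}")
    partners = defaultdict(set)
    for row in reader:
        partners[row["Symbol1"]].add(row["Symbol2"])
    return dict(partners)


# Three of the sample rows from the example above, with explicit tabs.
sample = "\n".join([
    "GeneID1\tSymbol1\tGeneID2\tSymbol2",
    "101928739\tPIK3CA\t219541\tMED19",
    "101928739\tPIK3CA\t26030\tPLEKHG3",
    "101928739\tPIK3CA\t129446\tXIRP2",
]) + "\n"

partners = read_edgelist(io.StringIO(sample))
print(sorted(partners["PIK3CA"]))
```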
CM4AI_TABLE_PATH: A .tsv file from the CM4AI RO-Crate that should contain at least the following columns: Bait, Prey, logOddsScore, FoldChange.x, and BFDR.x.
Bait: Name of the pulled-down protein.
Prey: UniProt ID of a protein identified by MS in the pull-down (putative bait interactor).
logOddsScore: Logarithm of the odds ratio between test and control conditions for each prey, used as a measure of interaction significance. It quantifies the strength and significance of the association between two proteins in an interaction network. The odds ratio compares the likelihood of the interaction occurring to the likelihood of it not occurring; taking the logarithm makes the score more symmetric and easier to compare across interactions. Higher values typically indicate stronger evidence for the interaction.
FoldChange.x: Ratio of the abundance of a protein or interaction in one experimental condition (test) to its abundance in another (control). It helps assess whether the abundance changes significantly between conditions.
BFDR.x: Bayesian False Discovery Rate.
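As a worked illustration of the FoldChange.x ratio: the sample rows in the example that follows are numerically consistent with dividing the average test spectral count (Spec) by the average control count (ctrlCounts), with the control average floored at 0.1 so that all-zero controls do not cause a division by zero. This is inferred from those sample rows only and is not a documented formula; the function names and the `floor` parameter are assumptions made for this sketch.

```python
def parse_counts(field):
    """Split a pipe-separated count field such as '3|7|11|9' into integers."""
    return [int(value) for value in field.split("|")]


def fold_change(test_counts, ctrl_counts, floor=0.1):
    """Average test count over average control count.

    The floor on the control average is an assumption inferred from the
    sample rows, not a documented parameter.
    """
    avg_test = sum(test_counts) / len(test_counts)
    avg_ctrl = max(sum(ctrl_counts) / len(ctrl_counts), floor)
    return avg_test / avg_ctrl


# Spec and ctrlCounts fields from the O00571 sample row (listed FoldChange.x: 8.57).
fc_ddx3x = fold_change(parse_counts("3|7|11|9"), parse_counts("0|1|3|3|0|0|0|0"))

# Spec and ctrlCounts fields from the O00422 sample row (listed FoldChange.x: 77.5).
fc_sap18 = fold_change(parse_counts("6|7|8|10"), parse_counts("0|0|0|0|0|0|0|0"))

print(round(fc_ddx3x, 2), round(fc_sap18, 2))
```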
Example:
Bait Prey PreyGene.x Spec SpecSum AvgSpec NumReplicates.x ctrlCounts AvgP.x MaxP.x TopoAvgP.x TopoMaxP.x SaintScore.x logOddsScore FoldChange.x BFDR.x boosted_by.x
DNMT3A O00422 SAP18_HUMAN 6|7|8|10 31 7.75 4 0|0|0|0|0|0|0|0 1.0 1.0 1.0 1.0 1.0 13.51 77.5 0.0
DNMT3A O00571 DDX3X_HUMAN 3|7|11|9 30 7.5 4 0|1|3|3|0|0|0|0 0.99 1.0 0.99 1.0 0.99 3.63 8.57 0.0
DNMT3A O15027 SC16A_HUMAN 40|38|32|37 147 36.75 4 0|0|0|0|0|0|0|0 1.0 1.0 1.0 1.0 1.0 52.31 367.5 0.0
DNMT3A O15042 SR140_HUMAN 5|3|4|2 14 3.5 4 2|0|2|2|0|0|0|0 0.98 1.0 0.98 1.0 0.98 2.81 4.67 0.0
DNMT3A O15056 SYNJ2_HUMAN 5|6|7|5 23 5.75 4 0|0|0|0|0|0|0|0 1.0 1.0 1.0 1.0 1.0 11.87 57.5 0.0
DNMT3A O43143 DHX15_HUMAN 8|13|9|16 46 11.5 4 0|0|1|0|0|0|0|0 1.0 1.0 1.0 1.0 1.0 16.2 92.0 0.0
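A table like the one above can be checked for the five required columns and then narrowed to higher-confidence pairs. The sketch below assumes a tab-separated file with a header row; the helper names and the cutoff values passed to `filter_interactions` are hypothetical and do not reflect any thresholds used by the tool.

```python
import csv
import io

# Required columns as listed in the CM4AI_TABLE_PATH description.
REQUIRED_COLUMNS = ["Bait", "Prey", "logOddsScore", "FoldChange.x", "BFDR.x"]


def read_cm4ai_table(handle):
    """Parse the table (hypothetical helper), keeping only required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"table is missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "Bait": row["Bait"],
            "Prey": row["Prey"],
            "logOddsScore": float(row["logOddsScore"]),
            "FoldChange.x": float(row["FoldChange.x"]),
            "BFDR.x": float(row["BFDR.x"]),
        })
    return rows


def filter_interactions(rows, min_fold_change, max_bfdr):
    """Keep rows at or above a fold change and at or below a BFDR cutoff."""
    return [
        row for row in rows
        if row["FoldChange.x"] >= min_fold_change and row["BFDR.x"] <= max_bfdr
    ]


# Two sample rows from the example above, reduced to the required columns.
sample = "\n".join([
    "Bait\tPrey\tlogOddsScore\tFoldChange.x\tBFDR.x",
    "DNMT3A\tO00422\t13.51\t77.5\t0.0",
    "DNMT3A\tO00571\t3.63\t8.57\t0.0",
]) + "\n"

rows = read_cm4ai_table(io.StringIO(sample))
# Cutoffs chosen purely for illustration.
kept = filter_interactions(rows, min_fold_change=10.0, max_bfdr=0.05)
print([row["Prey"] for row in kept])
```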
provenance.json: Path to a file containing provenance information about the input files, in JSON format. This file is required; if it is not provided, the tool outputs an error message that includes an example of the file.
Example:
{
    "name": "Example input dataset",
    "organization-name": "CM4AI",
    "project-name": "Example",
    "edgelist": {
        "name": "Antoine Forget sample edgelist",
        "author": "Krogan Lab",
        "version": "1.0",
        "date-published": "07-31-2023",
        "description": "AP-MS Protein interactions on HSC2 cell line, example dataset",
        "data-format": "tsv"
    },
    "baitlist": {
        "name": "Antoine Forget sample baitlist",
        "author": "Krogan Lab",
        "version": "1.0",
        "date-published": "07-31-2023",
        "description": "AP-MS Baits used for Protein interactions on HSC2 cell line",
        "data-format": "tsv"
    },
    "samples": {
        "name": "u2os HPA IF images",
        "author": "Author of dataset",
        "version": "Version of dataset",
        "date-published": "Date dataset was published",
        "description": "Description of dataset",
        "data-format": "csv"
    },
    "unique": {
        "name": "u2os HPA IF images unique",
        "author": "Author of dataset",
        "version": "Version of dataset",
        "date-published": "Date dataset was published",
        "description": "Description of dataset",
        "data-format": "csv"
    }
}
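A provenance file can be sanity-checked before it is passed to the tool. The sketch below looks for the top-level keys and the per-dataset keys that appear in the example above. Which keys the tool actually enforces is not stated here, so the key lists and the helper name `missing_provenance_fields` are assumptions for illustration.

```python
import json

# Keys observed in the example above; treated here as expected (an assumption).
TOP_LEVEL_FIELDS = ["name", "organization-name", "project-name"]
DATASET_FIELDS = ["name", "author", "version", "date-published",
                  "description", "data-format"]


def missing_provenance_fields(provenance, sections=("edgelist", "baitlist")):
    """Return the names of expected keys that are absent."""
    missing = [key for key in TOP_LEVEL_FIELDS if key not in provenance]
    for section in sections:
        entry = provenance.get(section)
        if not isinstance(entry, dict):
            # The whole section is absent or is not a JSON object.
            missing.append(section)
            continue
        missing.extend(
            f"{section}.{key}" for key in DATASET_FIELDS if key not in entry
        )
    return missing


text = """
{
    "name": "Example input dataset",
    "organization-name": "CM4AI",
    "project-name": "Example",
    "edgelist": {
        "name": "Antoine Forget sample edgelist",
        "author": "Krogan Lab",
        "version": "1.0",
        "date-published": "07-31-2023",
        "description": "AP-MS Protein interactions on HSC2 cell line, example dataset",
        "data-format": "tsv"
    },
    "baitlist": {
        "name": "Antoine Forget sample baitlist",
        "author": "Krogan Lab",
        "date-published": "07-31-2023",
        "description": "AP-MS Baits used for Protein interactions on HSC2 cell line",
        "data-format": "tsv"
    }
}
"""

provenance = json.loads(text)
# The baitlist entry in this sample deliberately omits "version".
problems = missing_provenance_fields(provenance)
print(problems)
```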