In case you don't have access to the experimental data, you may either ask the bioinformatician/data manager at your lab/institute who has analyzed the raw data to run the ExIR model or ask them to send you the normalized experimental data.
Generally speaking, any type of experimental data such as transcriptomic, proteomic, and metabolomic data could be used for running the ExIR model.
The only requirements are:
There is no limitation on the platform used for data generation; you may use any platform to generate your transcriptomic, proteomic, or metabolomic data.
As far as we have tested the ExIR model, there is no limit to the size of the dataset, in terms of either the number of samples or the number of features (e.g., genes).
Two file types are acceptable as input to the ExIR model:
CSV: A comma-separated value (CSV) file (.csv)
TXT: A tab-delimited file (.txt)
There are three things to consider when preparing an experimental data file as follows:
Samples on the columns: The samples/cells should come on the columns of the dataset.
Features on the rows: The features (i.e., genes, proteins, metabolites, etc.) should come on the rows of the dataset.
Condition of samples: A row should be added to the dataset to define the condition/state of the samples/cells (e.g., Tumor or Healthy).
You can see below a simple example of a properly structured dataset.
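As a rough illustration of the layout described above, the following base R sketch builds a toy dataset with samples on the columns, features on the rows, and one extra row for the condition, then saves it in both accepted file types. All gene, sample, and file names are made up for illustration, and the label and position of the condition row ("Condition", placed last) are assumptions rather than requirements stated here.

```r
# Toy normalized expression values: 4 features (rows) x 6 samples (columns).
expr <- matrix(
  c(5.1, 5.3, 5.0, 8.9, 9.2, 9.0,
    7.4, 7.1, 7.3, 7.2, 7.5, 7.0,
    3.2, 3.0, 3.4, 6.1, 6.3, 5.9,
    9.8, 9.6, 9.9, 4.2, 4.0, 4.4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:6))
)

# One condition label per sample (hypothetical labels).
condition <- c("Healthy", "Healthy", "Healthy", "Tumor", "Tumor", "Tumor")

# Append the condition as an extra row; the result is a character matrix
# because numeric and text values cannot share a matrix.
exptl <- rbind(expr, Condition = condition)

# Save as a comma-separated (.csv) and a tab-delimited (.txt) file.
csv_path <- file.path(tempdir(), "exptl_data.csv")
txt_path <- file.path(tempdir(), "exptl_data.txt")
write.csv(exptl, csv_path, quote = FALSE)
write.table(exptl, txt_path, sep = "\t", quote = FALSE, col.names = NA)

# Read the CSV back to confirm the structure survives the round trip.
check <- read.csv(csv_path, row.names = 1, check.names = FALSE)
print(check)
```

Opening either file in a spreadsheet program should show the feature names in the first column and the sample names in the first row.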
In case you don't have access to the differential data, you may either ask the bioinformatician/data manager at your lab/institute who has analyzed the raw data to run the ExIR model or ask them to send you a table of significant (filtered) differential data.
Two file types are acceptable as input to the ExIR model:
CSV: A comma-separated value (CSV) file (.csv)
TXT: A tab-delimited file (.txt)
There are five things to consider when preparing differential data files as follows:
The dataset should be filtered: The differential dataset should include only significantly differentially expressed features (e.g., adjusted P-value < 0.05).
Features on the rows: The features (i.e., genes, proteins, metabolites, etc.) should come on the rows of the dataset.
Fold changes and significance values on columns: Fold changes and their significance values (e.g., P-value, adjusted P-value, etc.) should come on the columns of the dataset.
A two-column table: The differential dataset should include two columns: differential values (fold changes) and significance values (e.g., P-value, adjusted P-value, etc.). The column of differential values (fold changes) must always be the first column and the significance values the second column.
Separate differential files: If your study includes more than two conditions (e.g., different doses of a drug treatment) or time-points (TPs), you should upload a separate differential dataset for each step. For instance, if your study includes three TPs, you should provide one differential dataset for TP2 vs TP1 and another for TP3 vs TP2. Please note that the significance (adjusted P-value) column is mandatory for differential datasets. Also note that each differential dataset includes only the significantly differentially expressed features (genes); consequently, different differential datasets may include both unique and common genes. Regarding single-cell datasets: in most cases, different clusters of a single-cell dataset correspond to different cell types rather than different time-points or dosages. Therefore, if you have several differential datasets belonging to different pairs of single-cell clusters, it is recommended to run the ExIR model on each differential dataset separately.
You can see below a simple example of differential datasets of a three TP study.
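The points above can be sketched in base R for a hypothetical three-TP study. The values, gene names, column names (logFC, adj_p), and file names below are invented for illustration; only the layout (fold change first, significance second, significant features only, one file per step) follows the description above.

```r
# Hypothetical unfiltered differential results for each step.
tp2_vs_tp1 <- data.frame(
  logFC = c(2.1, -1.4, 0.2, 1.8),
  adj_p = c(0.001, 0.03, 0.60, 0.04),
  row.names = c("Gene1", "Gene2", "Gene3", "Gene4")
)
tp3_vs_tp2 <- data.frame(
  logFC = c(-0.9, 1.2, 2.5, 0.1),
  adj_p = c(0.02, 0.20, 0.004, 0.75),
  row.names = c("Gene1", "Gene2", "Gene5", "Gene6")
)

# Keep only significant features and enforce the column order:
# fold change first, significance value second.
prepare_diff <- function(res, alpha = 0.05) {
  res[res$adj_p < alpha, c("logFC", "adj_p"), drop = FALSE]
}

tp2_sig <- prepare_diff(tp2_vs_tp1)
tp3_sig <- prepare_diff(tp3_vs_tp2)

# One file per step.
write.csv(tp2_sig, file.path(tempdir(), "diff_TP2_vs_TP1.csv"), quote = FALSE)
write.csv(tp3_sig, file.path(tempdir(), "diff_TP3_vs_TP2.csv"), quote = FALSE)

# The two filtered datasets share some genes and differ in others.
common_genes <- intersect(rownames(tp2_sig), rownames(tp3_sig))
print(tp2_sig)
print(tp3_sig)
print(common_genes)
```

Here the two filtered tables have different row sets, which is expected because each one keeps only the features that are significant for its own comparison.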
In case your study includes more than two conditions or time-points (TPs), you may optionally upload a separate dataset for the regression data. If you don't have access to the regression data, you may either ask the bioinformatician/data manager at your lab/institute who has analyzed the raw data to run the ExIR model or ask them to send you a table of regression data, if applicable.
Two file types are acceptable as input to the ExIR model:
CSV: A comma-separated value (CSV) file (.csv)
TXT: A tab-delimited file (.txt)
There are three things to consider when preparing a regression dataset as follows:
Features on the rows: The features (i.e., genes, proteins, metabolites, etc.) should come on the rows of the dataset.
Regression and significance values on columns: Regression values and, if provided, their significance values (e.g., P-value, adjusted P-value, etc.) should come on the columns of the dataset. Please note that the significance (adjusted P-value) column is not mandatory for the regression dataset; however, if it is provided, the dataset should include only significant features (e.g., adjusted P-value < 0.05).
A one- or two-column table: The regression dataset can include either two columns (R-squared values and significance values) or one column (R-squared values only). The column of R-squared (regression) values must always be the first column.
You can see below a simple example of regression (trajectory) data with and without significance values (adjusted P-value).
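A minimal base R sketch of the two accepted layouts follows. The numbers, gene names, and column names (R_squared, adj_p) are hypothetical; the only points taken from the description above are that R-squared comes first and that a table with a significance column should be filtered to significant features.

```r
# Hypothetical regression (trajectory) results for four features.
regr <- data.frame(
  R_squared = c(0.82, 0.10, 0.67, 0.45),
  adj_p     = c(0.003, 0.70, 0.01, 0.08),
  row.names = paste0("Gene", 1:4)
)

# Layout 1: two columns, filtered to significant features only.
regr_with_sig <- regr[regr$adj_p < 0.05, c("R_squared", "adj_p"), drop = FALSE]

# Layout 2: a single R-squared column, no significance values.
regr_no_sig <- regr[, "R_squared", drop = FALSE]

write.csv(regr_with_sig, file.path(tempdir(), "regression_with_sig.csv"),
          quote = FALSE)
write.csv(regr_no_sig, file.path(tempdir(), "regression_no_sig.csv"),
          quote = FALSE)

print(regr_with_sig)
print(regr_no_sig)
```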
The desired list of features (e.g., genes, proteins, metabolites, etc.) is optional. If you omit it, the ExIR model will be run and trained on the entire dataset. Alternatively, you can upload a list of desired features to train the model according to that list. This is useful when you have a handful of candidate features (e.g., based on your previous assays or on the literature) and you would like to functionally classify them (into drivers, biomarkers, and mediators) and prioritize them for downstream experimental functional validation.
Two file types are acceptable as input to the ExIR model:
CSV: A comma-separated value (CSV) file (.csv)
TXT: A tab-delimited file (.txt)
As far as we have tested the ExIR model, there is no limit to the size of the desired list of features.
There are four things to consider when preparing the desired list of features as follows:
A single column file: The desired list of features should be in a single column .txt or .csv file.
No header: The single column of desired features should not include any header.
Included in the normalized experimental data: All of the desired features must be present in the normalized experimental data; a feature that is absent from it, and therefore has no associated data, is not acceptable.
Identical feature names: Synonyms are not allowed; you should use exactly the same feature names that you have used in the normalized experimental dataset.
You can see below a simple example of a properly structured desired list of features.
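The checks above can be sketched in base R as follows. The feature names and the file name are hypothetical; exptl_features stands in for the feature names of your own normalized experimental data.

```r
# Feature names as they appear in the normalized experimental data
# (hypothetical values).
exptl_features <- c("Gene1", "Gene2", "Gene3", "Gene4")

# Candidate features of interest, spelled exactly as in the experimental data.
desired <- c("Gene2", "Gene4")

# Every desired feature must exist in the experimental data.
missing_features <- setdiff(desired, exptl_features)
if (length(missing_features) > 0) {
  stop("Not found in the experimental data: ",
       paste(missing_features, collapse = ", "))
}

# Write a single-column file with no header: one feature per line.
list_path <- file.path(tempdir(), "desired_list.txt")
writeLines(desired, list_path)

print(readLines(list_path))
```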
The synonyms table is not required for running the ExIR model; it is used only for the visualization of the ExIR output. The purpose of the synonyms table is simple. Consider a situation in which you or your bioinformatician have performed data processing, normalization, and differential expression analysis on a dataset whose gene names are in the Ensembl gene name format (Ensembl ID). Consequently, the feature names in all of your files and datasets are Ensembl IDs, and any figure you generate from the results of the ExIR model will show the Ensembl IDs of genes. However, you might want to show gene symbols (gene names) in the figure instead of Ensembl IDs. Rather than revising all of the files and datasets before inputting them into the ExIR model, you can input the files as they are and generate a synonyms table for the visualization of the results.
There are several web servers, online databases, and R packages for this purpose. Some of these online tools and R packages are listed below:
g:Profiler: g:Profiler is a public web server for characterising and manipulating gene lists.
DAVID Gene ID Conversion Tool: DAVID Gene ID Conversion Tool is a gene ID conversion tool on the Database for Annotation, Visualization and Integrated Discovery (DAVID) web server.
bioDBnet: db2db: bioDBnet: db2db allows for conversions of identifiers from one database to other database identifiers or annotations.
R package mygene: R package mygene is an easy-to-use R wrapper to access MyGene.info services. MyGene.info provides simple-to-use REST web services to query/retrieve gene annotation data.
R package gprofiler2: R package gprofiler2 is a toolset for functional enrichment analysis and visualization, gene/protein/SNP identifier conversion and mapping orthologous genes across species.
Two file types are acceptable as input to the visualization tool:
CSV: A comma-separated value (CSV) file (.csv)
TXT: A tab-delimited file (.txt)
The number of rows should be equal to the number of features in the normalized experimental data.
There are four things to consider when preparing the synonyms table as follows:
Features on the rows: The features ( i.e. genes, proteins, metabolites, etc.) should come on the rows of the table
A two column file: The synonyms table should be in a two column .txt or .csv file.
First column as the original names: The first column of the table should include the original names of features (the same ones used in the normalized experimental data).
Second column as the synonym names: The second column of the table should include the synonyms of the original names.
You can see below a simple example of a properly structured synonyms table.
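A small base R sketch of such a table is given below, mapping three Ensembl gene IDs to their gene symbols. The column names, the file name, and the decision to write a header row are assumptions made for illustration; in practice the mapping would typically come from one of the conversion tools listed above rather than being typed by hand.

```r
# Original feature names as used in the normalized experimental data
# (three human Ensembl gene IDs).
exptl_features <- c("ENSG00000141510", "ENSG00000012048", "ENSG00000146648")

# Two-column table: original names first, synonyms (gene symbols) second.
synonyms <- data.frame(
  original = exptl_features,
  synonym  = c("TP53", "BRCA1", "EGFR"),
  stringsAsFactors = FALSE
)

# One row per feature, with no duplicated original names.
stopifnot(nrow(synonyms) == length(exptl_features))
stopifnot(anyDuplicated(synonyms$original) == 0)

# Save as a tab-delimited file (header row included here by assumption).
syn_path <- file.path(tempdir(), "synonyms_table.txt")
write.table(synonyms, syn_path, sep = "\t", quote = FALSE, row.names = FALSE)

print(read.delim(syn_path, stringsAsFactors = FALSE))
```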
The computational manipulation of cells is based on the SIRIR (SIR-based Influence Ranking) model and can be applied to the output of the ExIR model. For feature (gene/protein/etc.) knockout, the SIRIR model is used to remove the feature from the network and assess the impact of its removal on the flow of information (signaling) within the network. For up-regulation, on the other hand, a node similar to the desired node is added to the network with exactly the same connections (edges) as the original node. The SIRIR model is then used to evaluate the difference in the flow of information/signaling after adding (up-regulating) the desired feature/node compared with the original network. Note that because a gene/protein knockout affects the integrity of the network under investigation as well as the networks of other overlapping biological processes/pathways, it is recommended to select features that simultaneously have the highest (most significant) ExIR-based rank and the lowest knockout rank. In contrast, because up-regulation does not affect the integrity of the network, you may select the features with the highest (most significant) ExIR-based and up-regulation-based ranks.
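To make the two network edits concrete, here is a toy base R sketch on an adjacency matrix. It illustrates only the structural changes described above (removing a node for knockout, adding a duplicate node with identical edges for up-regulation). It does not implement the SIRIR model or its ranking, and the function names knockout and upregulate are hypothetical helpers, not functions of the influential package.

```r
# Toy undirected network with four nodes and edges A-B, A-C, B-C, C-D.
nodes <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Knockout: drop the node together with all of its edges.
knockout <- function(adj, node) {
  keep <- setdiff(rownames(adj), node)
  adj[keep, keep, drop = FALSE]
}

# Up-regulation: add a duplicate node with exactly the same edges
# as the original node (no edge between the original and the copy).
upregulate <- function(adj, node, suffix = "_dup") {
  new_name <- paste0(node, suffix)
  all_nodes <- c(rownames(adj), new_name)
  out <- matrix(0, length(all_nodes), length(all_nodes),
                dimnames = list(all_nodes, all_nodes))
  out[rownames(adj), colnames(adj)] <- adj
  out[new_name, colnames(adj)] <- adj[node, ]
  out[rownames(adj), new_name] <- adj[, node]
  out
}

ko_net <- knockout(adj, "C")
up_net <- upregulate(adj, "C")

# Edge counts before and after each manipulation.
print(c(original = sum(adj) / 2,
        knockout = sum(ko_net) / 2,
        upregulated = sum(up_net) / 2))
```

In this toy network, knocking out the hub node C removes most of the edges, whereas duplicating it leaves every original edge in place, which mirrors the point above about network integrity.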
Salavaty A, Ramialison M, Currie PD. ExIR: a versatile one-stop model for the extraction, classification, and prioritization of candidate genes from experimental data
Salavaty A, Ramialison M, Currie PD. Integrated Value of Influence: An Integrative Method for the Identification of the Most Influential Nodes within Networks. Patterns (N Y). 2020 Jun 22;1(5):100052.
Decoding the information buried within the interconnection of components could have several benefits for the smart control of a complex system. One of the major challenges in this regard is the identification of the most influential individuals that have the potential to cause the highest impact on the entire network. This knowledge could provide the ability to increase network efficiency and reduce costs. In this article, we present a novel algorithm termed the Integrated Value of Influence (IVI) that combines the most important topological characteristics of the network to identify the key individuals within it. The IVI is a versatile method that could benefit several fields such as sociology, economics, transportation, biology, and medicine. In biomedical research, for instance, identification of the true influential nodes within a disease-associated network could lead to the discovery of novel biomarkers and/or drug targets, a process that could have a considerable impact on society.
The IVI function as well as the centrality-based visualization function are part of the R package influential. Additionally, several other functions have been provided for the calculation of some commonly used centrality measures as well as the extraction, classification and ranking of top candidate features from experimental data. You may install the R package influential via either CRAN or its GitHub repo.
CRAN:
install.packages('influential')
GitHub repo:
## install.packages('devtools')
devtools::install_github('asalavaty/influential', build_vignettes = TRUE)
The Experimental data-based Integrative Ranking (ExIR) is a sophisticated model for the classification and ranking of top candidate features (e.g., genes, proteins, and metabolites) based solely on experimental data. You can use any type of experimental data, such as transcriptomics, proteomics, etc., to run the ExIR model. You can also visualize and computationally validate (knockout/over-express) the top candidates proposed by ExIR.
Change Log - version and update history
Latest update: Aug 20, 2023.
Version 1.2
Update: Aug 20, 2023
Version 1.1
Update: July 19, 2021.
Version 1.0
Update: April 29, 2021.
The ExIR project was done by Adrian (Abbas) Salavaty and was supervised by Prof. Peter Currie and Assoc. Prof. Mirana Ramialison. The ExIR shiny app was designed and developed by Adrian according to the ExIR function of the R package influential. The visualization of the ExIR results is also derived from another function of the influential R package. You may have a look at the CITATION tab to get more information regarding the influential R package and how to install it.
To get more information about the Influential Software Package team, refer to the About page of the Influential Software Package portal.
There is also a YouTube channel dedicated to tutorial videos on different functions of the influential R package.
We would like to thank Lan Nguyen, PhD, for his constructive feedback on the ExIR manuscript.
A part of the results of the ExIR project is based on data generated by the TCGA Research Network.
The ExIR project was supported by Monash University and The Australian Regenerative Medicine Institute (itself supported by grants from the State Government of Victoria and the Australian Government).
We appreciate your interest in ExIR. Use the form below to drop us an email.