Experimental data-based integrative ranking (ExIR)

Normalized Experimental Data

If you don't have access to the experimental data, you may ask the bioinformatician/data manager at your lab/institute who analyzed the raw data either to run the ExIR model for you or to send you the normalized experimental data.

1. What types of experimental data could be used for running the ExIR?

Generally speaking, any type of experimental data such as transcriptomic, proteomic, and metabolomic data could be used for running the ExIR model.

The only requirements are:

The inclusion of at least two conditions (e.g. Treatment vs. Control) or time-points in the dataset
Prior normalization and log2 transformation of the experimental data. Normalization should be done according to the exact type of experimental data you are working with; for example, single-cell and bulk RNA-seq data are normalized differently. If your data is not log2 transformed, the app provides an option to log2 transform the data automatically before running the model. However, please note that this does not substitute for data normalization. Ask your bioinformatician for access to the normalized data, not the raw data!
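To illustrate the log2 step only, here is a minimal R sketch that log2-transforms an already-normalized, non-negative matrix. The pseudocount of 1 is an assumption (a common convention that keeps zeros finite); the app's built-in option may handle zeros differently, and the toy matrix and its names are hypothetical.

```r
# Toy matrix of already-normalized, non-negative values
# (features on rows, samples on columns; all names are hypothetical)
norm_data <- matrix(c(0, 1, 3, 7, 15, 31), nrow = 2,
                    dimnames = list(c("GeneA", "GeneB"),
                                    c("S1", "S2", "S3")))

# log2(x + 1): the assumed pseudocount of 1 avoids log2(0) = -Inf
log2_data <- log2(norm_data + 1)
print(log2_data)
```

This only rescales values that have already been normalized; it does not correct for library size or other technical effects.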

2. The experimental data generated by which platforms could be used for running the ExIR?

There is no limitation on the platform used for data generation; you may use any platform to generate your transcriptomic, proteomic, or metabolomic data.

3. Is there any limitation to the size of the dataset?

As far as we have tested the ExIR model, there is no limit to the size of the dataset, either in the number of samples or in the number of features (e.g. genes).

4. What file types are acceptable as input to the ExIR model?

Two file types are acceptable as input to the ExIR model:

CSV: A comma-separated values (CSV) file (.csv)
TXT: A tab-delimited file (.txt)
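The two accepted formats differ only in the field separator. The minimal base R sketch below, using a hypothetical toy table, writes the same table in both formats and shows that they read back identically; the same reasoning applies to the other input files described on this page.

```r
# Hypothetical toy table with feature names as row names
df <- data.frame(S1 = c(2.5, 3.1), S2 = c(4.0, 1.2),
                 row.names = c("GeneA", "GeneB"))

csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")

# Save the same table as comma-separated (.csv) and tab-delimited (.txt)
write.csv(df, csv_file)
write.table(df, txt_file, sep = "\t", quote = FALSE, col.names = NA)

# Read both back; row.names = 1 takes the feature names from the first column
from_csv <- read.csv(csv_file, row.names = 1)
from_txt <- read.delim(txt_file, row.names = 1)
print(identical(from_csv, from_txt))
```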

5. What are the required features of the experimental dataset?

There are three things to consider when preparing an experimental data file:

Samples on the columns: The samples/cells should be placed in the columns of the dataset.
Features on the rows: The features (i.e. genes, proteins, metabolites, etc.) should be placed in the rows of the dataset.
Condition of samples: A row should be added to the dataset to define the condition/state of each sample/cell (e.g. Tumor or Healthy).
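A minimal R sketch of such a file, assuming a two-condition design. The sample and gene names, the row name `Condition`, and placing that row last are all hypothetical choices; check the app's example dataset for the exact layout it expects.

```r
# Hypothetical log2-normalized values: features on rows, samples on columns
expr <- data.frame(
  Ctrl_1 = c(5.1, 7.9, 2.2),
  Ctrl_2 = c(5.3, 8.1, 2.0),
  Trt_1  = c(8.4, 7.7, 4.6),
  Trt_2  = c(8.9, 7.8, 4.9),
  row.names = c("GeneA", "GeneB", "GeneC")
)

# Condition/state of each sample, in the same order as the columns
condition <- c("Control", "Control", "Treatment", "Treatment")

# Append the condition as an extra row; the argument name becomes the row
# name, and the matrix becomes character because the new row is text
exptl_data <- rbind(as.matrix(expr), Condition = condition)

write.csv(exptl_data, tempfile(fileext = ".csv"))
print(exptl_data)
```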

You can see below a simple example of a properly structured dataset.

Differential Data

If you don't have access to the differential data, you may ask the bioinformatician/data manager at your lab/institute who analyzed the raw data either to run the ExIR model for you or to send you a table of significant (filtered) differential data.

1. What file types are acceptable as input to the ExIR model?

Two file types are acceptable as input to the ExIR model:

CSV: A comma-separated values (CSV) file (.csv)
TXT: A tab-delimited file (.txt)

2. What are the required features of the differential dataset?

There are five things to consider when preparing differential data files:

The dataset should be filtered: The differential dataset should include only significantly differentially expressed features (e.g. adjusted P-value < 0.05).
Features on the rows: The features (i.e. genes, proteins, metabolites, etc.) should be placed in the rows of the dataset.
Fold changes and significance values on the columns: Fold changes and their significance values (e.g. P-value, adjusted P-value, etc.) should be placed in the columns of the dataset.
A two-column table: The differential dataset should include two columns, one of differential values (fold changes) and one of significance values (e.g. P-value, adjusted P-value, etc.). The column of differential values (fold changes) must always be the first column and the significance values the second.
Separate differential files: If your study includes more than two conditions (e.g. different doses of a drug treatment) or time-points (TPs), you should upload a separate differential dataset for each step. For instance, if your study includes three TPs, you should provide one differential dataset for TP2 vs. TP1 and another for TP3 vs. TP2. Note that the significance (adjusted P-value) column is mandatory for differential datasets. Also note that, because each differential dataset includes only the significantly differentially expressed features (genes), different differential datasets may contain both unique and shared genes. For single-cell data with several differential datasets belonging to different pairs of clusters, it is more reasonable, and recommended, to run the ExIR model on each differential dataset separately, because in most cases different clusters of a single-cell dataset correspond to different cell types rather than different time-points/dosages.
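A minimal R sketch of preparing one differential file from a full results table. The column names `logFC` and `adj.P.Val` are hypothetical (they follow limma's naming), and the 0.05 cutoff is the example threshold given above; for a three-TP study you would repeat this for TP2 vs. TP1 and TP3 vs. TP2.

```r
# Hypothetical full differential expression results for TP2 vs. TP1
de_all <- data.frame(
  logFC     = c(2.1, -1.4, 0.2, 3.0),
  adj.P.Val = c(0.001, 0.030, 0.600, 0.049),
  row.names = c("GeneA", "GeneB", "GeneC", "GeneD")
)

# Keep only significantly differentially expressed features
de_sig <- de_all[de_all$adj.P.Val < 0.05, ]

# Exactly two columns: fold change first, significance second
de_sig <- de_sig[, c("logFC", "adj.P.Val")]

write.csv(de_sig, tempfile(fileext = ".csv"))
print(de_sig)
```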

You can see below a simple example of differential datasets of a three TP study.

TP2 vs. TP1

TP3 vs. TP2

Regression Data (Optional)

If your study includes more than two conditions or time-points (TPs), you may optionally upload a separate regression dataset. If you don't have access to the regression data, you may ask the bioinformatician/data manager at your lab/institute who analyzed the raw data either to run the ExIR model for you or to send you a table of regression data, if applicable.

1. What file types are acceptable as input to the ExIR model?

Two file types are acceptable as input to the ExIR model:

CSV: A comma-separated values (CSV) file (.csv)
TXT: A tab-delimited file (.txt)

2. What are the required features of the regression dataset?

There are three things to consider when preparing a regression dataset:

Features on the rows: The features (i.e. genes, proteins, metabolites, etc.) should be placed in the rows of the dataset.
Regression and significance values on the columns: Regression values, and their significance values if provided (e.g. P-value, adjusted P-value, etc.), should be placed in the columns of the dataset. The significance (adjusted P-value) column is not mandatory for the regression dataset, but if it is provided, the dataset should include only significant features (e.g. adjusted P-value < 0.05).
A two-column or single-column table: The regression dataset can include either two columns (R-squared values and significance values) or one column (R-squared values only). The column of R-squared (regression) values must always be the first column.
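This page does not say how the regression values are produced. As one possible approach, and purely an assumption, the R sketch below fits a simple linear trend of each feature's expression over numeric time-points with `lm()`, takes R-squared and the model P-value, applies Benjamini-Hochberg adjustment, and keeps only significant features with R-squared in the first column. The data and column names are hypothetical.

```r
# Hypothetical log2 expression: features on rows, samples on columns
expr <- rbind(
  GeneA = c(1.0, 1.2, 2.0, 2.1, 3.0, 3.2),  # rises steadily over time
  GeneB = c(5.0, 4.1, 5.2, 3.9, 5.1, 4.0)   # no clear trend
)
time_point <- c(1, 1, 2, 2, 3, 3)

# Fit expression ~ time for one feature; return R-squared and model P-value
fit_one <- function(y) {
  s <- summary(lm(y ~ time_point))
  f <- s$fstatistic
  c(R2 = s$r.squared,
    P  = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}

regr <- as.data.frame(t(apply(expr, 1, fit_one)))
regr$adj.P <- p.adjust(regr$P, method = "BH")

# R-squared first, significance second; keep significant features only
regr_sig <- regr[regr$adj.P < 0.05, c("R2", "adj.P")]
print(regr_sig)
```

Dropping the `adj.P` column would give the single-column variant described above.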

You can see below a simple example of regression (trajectory) data with and without significance values (adjusted P-value).

Regression dataset with significance values

Regression dataset without significance values

Desired List of Features (Optional)

The desired list of features (e.g. genes, proteins, metabolites, etc.) is optional; if you skip it, the ExIR model will be run and trained on the entire dataset. Alternatively, you can upload a list of desired features to train the model according to that list. This is useful when you have a handful of candidate features (e.g. based on your previous assays or the literature) that you would like to functionally classify (into drivers, biomarkers, and mediators) and prioritize for downstream experimental functional validation.

1. What file types are acceptable as input to the ExIR model?

Two file types are acceptable as input to the ExIR model:

CSV: A comma-separated values (CSV) file (.csv)
TXT: A tab-delimited file (.txt)

2. Is there any limitation to the size (number of features) of the desired list of features?

As far as we have tested the ExIR model, there is no limit to the size of the desired list of features.

3. What are the required characteristics of the desired list of features?

There are four things to consider when preparing the desired list of features:

A single-column file: The desired list of features should be in a single-column .txt or .csv file.
No header: The single column of desired features should not include a header.
Included in the normalized experimental data: All of the desired features must be included in the normalized experimental data; external genes without any information are not acceptable.
Identical feature names: Synonyms are not allowed; you should use exactly the same feature names that you used in the normalized experimental dataset.
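A minimal R sketch that checks a candidate list against the feature names of the experimental data and writes it as a single column without a header. All gene names here are hypothetical.

```r
# Feature names used in the normalized experimental data (hypothetical)
exptl_features <- c("GeneA", "GeneB", "GeneC", "GeneD")

# Candidate features to classify and prioritize (hypothetical)
desired <- c("GeneB", "GeneD")

# Every desired feature must match a name in the experimental data exactly
not_found <- setdiff(desired, exptl_features)
if (length(not_found) > 0) {
  stop("Not in the experimental data: ", toString(not_found))
}

# Single column, no header: one feature name per line
out_file <- tempfile(fileext = ".txt")
writeLines(desired, out_file)
print(readLines(out_file))
```

Because `setdiff()` compares strings exactly, a synonym or a differently cased name is reported as missing, which matches the "identical feature names" requirement.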

You can see below a simple example of a properly structured desired list of features.

Sample desired list of features

A real example of desired list of features including five genes

Synonyms Table (Optional; only used for the visualization of ExIR results)

The synonyms table is not required for running the ExIR model; it is only used for the visualization of the ExIR output. The purpose of the synonyms table is simple! Consider a situation in which you/your bioinformatician have done data processing, normalization, and differential expression analysis on a dataset whose gene names are Ensembl gene IDs, so that the feature names in all of your files and datasets are Ensembl IDs. Consequently, any figure you generate based on the results of the ExIR model will show the Ensembl IDs of the genes. However, you might want to show gene symbols (gene names) in the figures instead. Rather than revising all of the files and datasets before inputting them into the ExIR model, you can input the files as they are and generate a synonyms table for the visualization of the results.

1. Where can I find the synonyms of my features (genes/proteins)?

There are several web servers, online databases, and R packages for this purpose. Some of these online tools and R packages are listed below:

g:Profiler: g:Profiler is a public web server for characterising and manipulating gene lists.
DAVID Gene ID Conversion Tool: DAVID Gene ID Conversion Tool is a gene ID conversion tool on the Database for Annotation, Visualization and Integrated Discovery (DAVID) web server.
bioDBnet: db2db: bioDBnet: db2db allows for the conversion of identifiers from one database to identifiers or annotations of other databases.
R package mygene: R package mygene is an easy-to-use R wrapper to access MyGene.Info services. MyGene.Info provides simple-to-use REST web services to query/retrieve gene annotation data.
R package gprofiler2: R package gprofiler2 is a toolset for functional enrichment analysis and visualization, gene/protein/SNP identifier conversion and mapping orthologous genes across species.

2. What file types are acceptable as input to the visualization tool?

Two file types are acceptable as input to the visualization tool:

CSV: A comma-separated values (CSV) file (.csv)
TXT: A tab-delimited file (.txt)

3. What should be the number of rows in the synonyms table?

The number of rows should be equal to the number of features in the normalized experimental data.

4. What are the required features of the synonyms table?

There are four things to consider when preparing the synonyms table:

Features on the rows: The features (i.e. genes, proteins, metabolites, etc.) should be placed in the rows of the table.
A two-column file: The synonyms table should be in a two-column .txt or .csv file.
First column as the original names: The first column of the table should include the original names of features (the same ones used in the normalized experimental data).
Second column as the synonym names: The second column of the table should include the synonyms of the original names.
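A minimal R sketch of a synonyms table that maps Ensembl gene IDs to gene symbols. The column headers `original` and `synonym` are hypothetical, and this page does not state whether the app expects a header row, so adjust that to match the app's example.

```r
# Original names exactly as they appear in the normalized experimental data
exptl_features <- c("ENSG00000141510", "ENSG00000012048")

# Gene symbols for the same features, in the same order
# (e.g. retrieved with one of the conversion tools listed above)
symbols <- c("TP53", "BRCA1")

synonyms <- data.frame(original = exptl_features, synonym = symbols,
                       stringsAsFactors = FALSE)

# One row per feature of the normalized experimental data
stopifnot(nrow(synonyms) == length(exptl_features),
          setequal(synonyms$original, exptl_features))

write.csv(synonyms, tempfile(fileext = ".csv"), row.names = FALSE)
```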

You can see below a simple example of a properly structured synonyms table.

Sample synonyms table showing only the first five genes

A real example of a synonyms table showing only the first five genes

ExIR Output Tables

The ExIR results are presented in four tables: Drivers, Biomarkers, Non-DE-Mediators, and DE-Mediators.

You may save the results dataset (as an RDS file) using the 'SAVE DATASET' button for later use or to share with a colleague. This saves you from redoing the analysis and speeds up your research!
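RDS is R's native format for serializing a single object, so a saved file can be reloaded in any R session with `readRDS()`. The sketch below uses a hypothetical stand-in object, since the internal structure of the dataset saved by the app is not described here.

```r
# Hypothetical stand-in for the saved results (not the app's real structure)
results <- list(Drivers = data.frame(feature = "GeneA", rank = 1,
                                     stringsAsFactors = FALSE))

rds_file <- tempfile(fileext = ".rds")
saveRDS(results, rds_file)

# Later, or on a colleague's machine: restore the object unchanged
restored <- readRDS(rds_file)
print(identical(results, restored))
```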

Save dataset

Computational Manipulation of Cells

The computational manipulation of cells is based on the SIRIR (SIR-based Influence Ranking) model and can be applied to the output of the ExIR model. For feature (gene/protein/etc.) knockout, the feature is removed from the network and the SIRIR model is used to assess the impact of its removal on the flow of information (signaling) within the network. For up-regulation, a node identical to the desired node, with exactly the same connections (edges) as the original node, is added to the network. The SIRIR model is then used to evaluate the difference in the flow of information/signaling after adding (up-regulating) the desired feature/node compared with the original network. Because gene/protein knockout would affect the integrity of the network under investigation as well as the networks of other overlapping biological processes/pathways, it is recommended to select the features that simultaneously have the highest (most significant) ExIR-based rank and the lowest knockout rank. In contrast, as up-regulation does not affect the integrity of the network, you may select the features with the highest (most significant) ExIR-based and up-regulation-based ranks.
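The selection advice above can be illustrated with a simple rank-sum heuristic. This is a hypothetical sketch, not the app's Combined Rankings: it assumes rank 1 means most significant/highest impact, reads "lowest knockout rank" as the least disruptive knockout (so the knockout rank is reversed before summing), and uses made-up ranks.

```r
# Hypothetical ranks (1 = most significant / highest impact)
ranks <- data.frame(
  feature  = c("GeneA", "GeneB", "GeneC", "GeneD"),
  exir     = c(1, 2, 3, 4),
  knockout = c(1, 4, 2, 3),
  upreg    = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)
n <- nrow(ranks)

# Knockout: favor a high ExIR rank together with a low knockout impact,
# so the knockout rank is reversed (n + 1 - rank) before summing
ko_score <- ranks$exir + (n + 1 - ranks$knockout)
ko_order <- ranks$feature[order(ko_score)]

# Up-regulation: favor features that rank high on both ExIR and up-regulation
up_score <- ranks$exir + ranks$upreg
up_order <- ranks$feature[order(up_score)]

print(ko_order)
print(up_order)
```

A plain rank sum weights both criteria equally; ties keep their original order because `order()` is stable.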

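Ranking of the Impact of Computational Manipulation of Features on the Network

The rankings are reported in three tables: Knockout Rankings, Up-regulation Rankings, and Combined Rankings.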

If you use this Shiny app in your study, please cite the following two papers and include a link to the Influential Software Package web portal.

ExIR: a versatile one-stop model for the extraction, classification, and prioritization of candidate genes from experimental data

The ExIR manuscript is still under review.

Salavaty A, Ramialison M, Currie PD. ExIR: a versatile one-stop model for the extraction, classification, and prioritization of candidate genes from experimental data

Integrated Value of Influence: An Integrative Method for the Identification of the Most Influential Nodes within Networks

The IVI was published in Patterns, a data science journal published by Cell Press.

Salavaty A, Ramialison M, Currie PD. Integrated Value of Influence: An Integrative Method for the Identification of the Most Influential Nodes within Networks. Patterns (N Y). 2020 Jun 22;1(5):100052.

DOI: 10.1016/j.patter.2020.100052

PMID: 33205118

PMCID: PMC7660386

The Bigger Picture of IVI

Decoding the information buried within the interconnection of components could have several benefits for the smart control of a complex system. One of the major challenges in this regard is the identification of the most influential individuals that have the potential to cause the highest impact on the entire network. This knowledge could provide the ability to increase network efficiency and reduce costs. In this article, we present a novel algorithm termed the Integrated Value of Influence (IVI) that combines the most important topological characteristics of the network to identify the key individuals within it. The IVI is a versatile method that could benefit several fields such as sociology, economics, transportation, biology, and medicine. In biomedical research, for instance, identification of the true influential nodes within a disease-associated network could lead to the discovery of novel biomarkers and/or drug targets, a process that could have a considerable impact on society.

R package influential

The IVI function as well as the centrality-based visualization function are part of the R package influential. Additionally, several other functions are provided for the calculation of some commonly used centrality measures as well as for the extraction, classification, and ranking of top candidate features from experimental data. You may install the R package influential via either CRAN or its GitHub repo.

CRAN:
```
install.packages('influential')
```

GitHub repo:

```
# install.packages('devtools')  # only if devtools is not installed yet
devtools::install_github('asalavaty/influential',
                         build_vignettes = TRUE)
```

What is ExIR?

The Experimental data-based Integrative Ranking (ExIR) is a sophisticated model for the classification and ranking of top candidate features (e.g. genes, proteins, and metabolites) based only on experimental data. You can use any type of experimental data, such as transcriptomics, proteomics, etc., to run the ExIR model. You can also visualize and computationally validate (knockout/over-express) the top candidates proposed by ExIR.

Updates

Change Log - version and update history

Latest update: Aug 20, 2023.

Version 1.2

The implementation of the Shiny app was optimized.
ExIR function debugged.
The default correlation coefficient was changed from 0.3 to 0.5 in the ExIR Shiny app.

Update: Aug 20, 2023.

Version 1.1

ExIR function debugged.
Email sending bugs corrected.

Update: July 19, 2021.

Version 1.0

Initial deployment

Update: April 29, 2021.

Credits

The ExIR project was carried out by Adrian (Abbas) Salavaty and supervised by Prof. Peter Currie and Assoc. Prof. Mirana Ramialison. The ExIR Shiny app was designed and developed by Adrian based on the ExIR function of the R package influential. The visualization of the ExIR results is also based on another function of the influential R package. See the CITATION tab for more information regarding the influential R package and how to install it.

For more information about the Influential Software Package team, refer to the About page of the Influential Software Package portal.

There is also a YouTube channel dedicated to tutorial videos of different functions of the influential R package.

Acknowledgments

We would like to thank Lan Nguyen, PhD, for his constructive feedback on the ExIR manuscript.
A part of the results of the ExIR project is based on data generated by the TCGA Research Network.
The ExIR project was supported by Monash University and The Australian Regenerative Medicine Institute (itself supported by grants from the State Government of Victoria and the Australian Government).
