Documentation
Illustrated user guide
Download as PDF |
---|
guide.pdf |
Data sources
Biological categories
Database | Version | Retrieval date |
---|---|---|
Gene Ontology | - | June 2019 |
KEGG | - | June 2019 |
miRBase | 22 | June 2019 |
miRCarta | 1.1 | June 2019 |
Reactome | - | June 2019 |
WikiPathways | - | June 2019 |
miRNA targets
Database | Version | Retrieval date |
---|---|---|
MiRanda | 3.3a | June 2019 |
miRTarBase | 7 | June 2019 |
TargetScan | 7.1 | June 2019 |
Statistical analysis
All compute-intensive tasks were performed using the GeneTrail2 C++ library [1] and GNU Parallel [2]. Results of the enrichment analysis were evaluated using the freely available statistical programming environment R, version 3.5.
Parameter overview
Parameter | Value |
---|---|
Statistical test | Over-representation analysis |
P-value adjustment | Benjamini-Hochberg |
$\alpha$-level | 0.05 |
Minimal category size | 2 |
Maximal category size | 1000 |
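As a rough illustration of how these parameters might fit together, the sketch below filters categories by size, applies a Benjamini-Hochberg adjustment to the remaining raw p-values, and keeps those below the $\alpha$-level. It is not the GeneTrail2 implementation. The function names, the input layout (category name mapped to a size and a raw p-value), the choice to filter by size before adjusting, and the strict `<` comparison are all assumptions made for this example.

```python
# Parameter values taken from the overview table above.
ALPHA = 0.05
MIN_SIZE = 2
MAX_SIZE = 1000


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(n), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step-up: walk from the largest p-value (rank n) down to the smallest.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_categories(categories):
    """categories maps a name to (size, raw p-value); hypothetical layout."""
    # Assumption: categories outside the size bounds are dropped before
    # the adjustment, so they do not count towards the number of tests.
    kept = {
        name: p
        for name, (size, p) in categories.items()
        if MIN_SIZE <= size <= MAX_SIZE
    }
    names = list(kept)
    adjusted = benjamini_hochberg([kept[name] for name in names])
    # Assumption: a category is reported if its adjusted p-value is < ALPHA.
    return {name: q for name, q in zip(names, adjusted) if q < ALPHA}


if __name__ == "__main__":
    example = {
        "too_small": (1, 0.0001),
        "pathway_a": (10, 0.01),
        "pathway_b": (50, 0.02),
        "pathway_c": (200, 0.5),
        "too_large": (5000, 0.001),
    }
    print(significant_categories(example))
```

Filtering before the adjustment means that the number of tests used in the correction equals the number of categories that pass the size bounds, which is one plausible reading of the table rather than a documented guarantee.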
Over-representation analysis
To judge whether a biological category is significantly enriched for a given miRNA, we use over-representation analysis (ORA). This approach has been employed by many authors, e.g. [3], [4], [5], [6], [7]. Here we use the version of ORA presented by Backes et al. [3]. It is based on the hypergeometric distribution and tests whether a set of selected biological entities occurs significantly more or less often in a biological category than expected by chance.
We use ORA to judge whether a biological pathway contains more targets of a given miRNA than expected by chance. To calculate this expectation, ORA relies on a reference set R (background). In our case, this is the list of all miRNA targets for the corresponding confidence level.
Assume that a biological category C has k entries in the test set $T = (t_{1},t_{2},\ldots,t_{n})$ and l entries in the reference set $R=(r_{1},r_{2},\ldots,r_{m})$. We then expect to find $k'=\frac{n \cdot l}{m}$ elements of T in category C on average.
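To make the notation concrete, the following sketch computes the expected count $k'$ and the upper-tail hypergeometric probability on which the test described next is based. The function names are hypothetical and the code is independent of the GeneTrail2 library. It assumes that T is a subset of R and uses `math.comb`, which requires Python 3.8 or later.

```python
from math import comb


def expected_hits(n, l, m):
    """Expected number of test-set elements in the category: k' = n * l / m."""
    return n * l / m


def ora_p_value(k, n, l, m):
    """Upper-tail hypergeometric probability of seeing at least k hits.

    k: entries of category C found in the test set T
    n: size of the test set T
    l: entries of category C found in the reference set R
    m: size of the reference set R
    """
    total = comb(m, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so terms with i > l or n - i > m - l vanish automatically.
    hits = sum(comb(l, i) * comb(m - l, n - i) for i in range(k, n + 1))
    return hits / total


if __name__ == "__main__":
    # Toy numbers: 10 reference targets, 4 of them in C, 3 selected, 2 hits.
    print(expected_hits(n=3, l=4, m=10))
    print(ora_p_value(k=2, n=3, l=4, m=10))
```

The sum uses exact integer arithmetic and divides only once at the end, which avoids accumulating rounding errors for small examples. For very large reference sets a log-space implementation would likely be preferable.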
If T is a subset of R, the hypergeometric test is applied to compute a p-value for C:
$$P_C(k)=\sum\limits_{i=k}^{n} \frac{\binom{l}{i}\binom{m-l}{n-i}}{\binom{m}{n}}$$
Benjamini-Hochberg adjustment
The Benjamini-Hochberg method [8], [9] is a step-up approach to control the false discovery rate. It assumes all p-values to be independent. Given $n$ p-values $\{p_1,\ldots,p_n\}$ sorted in increasing order, we can compute the adjusted p-values using the following formula:
$$\tilde p_{i}\ =\ \begin{cases} p_{i} & \text{for } i=n\\ \min \left( \tilde p_{i+1}, \frac{n}{i}p_{i} \right) & \text{for } i=n-1,\ldots,1 \end{cases}$$
Bibliography
- [1] Multi-omics Enrichment Analysis using the GeneTrail2 Web Service. Bioinformatics, Oxford University Press.
- [2] GNU Parallel - The Command-Line Power Tool. ;login: The USENIX Magazine.
- [3] GeneTrail—advanced gene set enrichment analysis. Nucleic Acids Research, Oxford Univ Press.
- [4] Global functional profiling of gene expression. Genomics, Elsevier.
- [5] Identifying biological themes within lists of genes with EASE. Genome Biol.
- [6] Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, Oxford Univ Press.
- [7] GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics, BioMed Central Ltd.
- [8] Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), JSTOR.
- [9] More powerful procedures for multiple significance testing. Statistics in Medicine, Wiley Online Library.