Five species are currently available for analysis:
The main gene identifier used in PECPI-GO is NCBI Gene ID (formerly Entrez ID). PECPI-GO also accepts yeast systematic name (abbreviation as YORF) and Affymetrix ID (hgu133 chip). Users need to select a gene ID type.
PECPI-GO accept only gene expression dataset in tab delimited file (.txt
file) format. The first row in the dataset should be a column header and PECPI-GO will
skip this row automatically. Please notice that the first row of data is ignored
if the input dataset does not contain a column header. The following rows
is gene with the corresponding expression value(s)
The first column should be the
gene identifier column and the remaining column(s) is the expression ratio correspond to
the gene ID in first column. Here are two examples input data for single dataset and
multiple dataset.
It is recommended to use some tools (such as DESeq2, EdgeR) to normalize RNA read counts dataset before applied the dataset to PECPI-GO. After normalizing, a gene list with log 2 ratio between samples can be directly used in PECPI-GO. If the normalized counts of each sample can be obtained in the normalization process, it can be applied to PPI interaction analysis to compute the correlation between control group and experiment group.
Parameters | Explanation |
---|---|
Random sampling size | The size of sampling for computing P-value of pathways in pathway enrichment analysis and P-value of edges that linked enriched pathways in pathway clustering. Increasing the sampling size will results in a more precise P-value, but it also increases the computation time and the usage of CPU and RAM. By default, the value is set at 100,000. |
Max enriched pathway number | The maximum number of enriched pathways. Enriched pathway is removed iterately by pathway Q-value in descending order if the amount of enriched pathways is over the threshold until the amount of enriched pathways is equal to this value. |
Show reactome pathway | Check the box to display Reactome pathway image in PECPI-GO when title of Reactome pathway is clicked. Otherwise, a webpage is opened and linked to the Reactome website to browse the pathway image and pathway details. Some Reactome pathway images have a large size and consume much of random-access memory (RAM) if display the image in PECPI-GO, thus it is not recommended to check the box without a sufficent RAM. By default the box is unchecked. |
Parameters | Explanation |
---|---|
Q-value threshold | The threshold of Q-value (adjusted P-value) of pathway. Pathways with Q-value greater than this pre-defined threshold will be discarded. By default, the value is set to 1.0. |
Coloring threshold | The theshold for coloring gene by expression of the gene. For example if the this value is set to 1.0, log2 ratio of a gene greater than 1.0 and less than -1.0 is shown in red color and l |
Parameters | Explanation |
---|---|
Max frequency | Frequency of a GO term is defined as the number of gene annotated to a GO term divided by the total number of gene annotated to the entire ontology in the database. A general GO term has a higher frequency while a specific GO term has a lower frequency. GO term with frequency greater than this value (too general) is removed. By default, the value is set to 0.1. |
Min NPL | The minimum normalized position level of a GO term. GO term with NPL less than this threshold (too general) is removed. By default, the value is set to 0.2. |
Max NPL | The maximum normalized position level of a GO term. GO term with NPL greater than this threshold (too specific) is removed. By default, the value is set to 0.9. |
Min SS | The minimum semantic similarity value to link GO terms and form a GO term network. By default, the value is set to 0.4. |
Max annotation term | The maximum amount of GO terms to annotate each pathway cluster. Be default, the value is set to 5. |
Parameters | Explanation |
---|---|
Correlation threshold | Threshold for linking genes (proteins) to form a protein network. By default the value is set to 0.6. |
Max frequency | GO term with frequency greater than this value (too general) is removed. By default, the value is set to 0.1. |
Min NPL | The minimum normalized position level of a GO term. GO term with NPL less than this threshold (too general) is removed. By default, the value is set to 0.2. |
Max NPL | The maximum normalized position level of a GO term. GO term with NPL greater than this threshold (too specific) is removed. By default, the value is set to 0.9. |
Min SS | The minimum semantic similarity value to link GO terms and form a GO term network. By default, the value is set to 0.4. |
Max annotation term | The maximum amount of GO terms to annotate each pathway cluster. Be default, the value is set to 5. |
When pathway enrichment analysis is done, a new tab will be generated to easily visualize and explore these enriched pathways and their genes with expression abundance changes, which will be hierarchically displayed in left panel of the interface. By clicking one of these enriched pathways, its corresponding pathway map can be visualized in the right panel. Clicking on one of gene members in each pathway in the left panel can link to the gene’s position in the pathway map, and double click on gene will link to NCBI to browse the gene information.
The tab 'pathway clustering' display the pathway clustering result both in graphical and tabular format to visualize the information of clustered pathways. A hierarchical tree is displayed in graphical format that each tree node represents a set of pathways and different pathway clusters are highlighted with different node colors (i.e., singleton pathways are colored as black)
For the tabular format, the pathways are grouped according to cluster result, with showing cluster P-value, pathway regulation status, net fold change, P-value, and Q-value are listed
Clicking on the colored node in tree or the cluster index in table will pop out another window for getting details of individual pathway cluster.
The left panel display the interacting network of a pathway cluster, where nodes represent pathways and edges represent the common genes overlapped between the two pathways. Clicking an edge in the network will indicate the common genes in the lower right panel. Double click the gene in the table can link to NCBI to browse the gene information.
The GO term annotation to each pathway cluster is also presented in both tabular format and graphical format. In tabular format, the pathway cluster are listed with the GO term annotation and the correspond regulation status, net fold change, NPL, DC, and frequency. Clicking on the index of cluster in first column will pop out a new window for individual pathway cluster annotation.
In graphical format, PECPI-GO hierarchically cluster GO terms with high similarity from all pathway clusters and visualize in a bar chart. The different GO clusters from hierarchical clustering are highligted with different bar color, and the bar is represented as the net fold change of GO term.
PECPI-GO treat the significant common genes as proteins and perform our novel clustering algorithm to clustering proteins. The presentation of the result is similar to pathway cluster annotation that in tabular format and graphical format.
In the tabular format, click on the index of protein cluster in first column will pop out a
new window for individual protein cluster. The upper left panel is the proteins classified
in the cluster, with their degree and Z-score compute from the interaction network of the
subcluster (upper right panel). The lower panel is the GO terms selected to annotate the
protein cluster.
PECPI-GO integrate GO terms from pathway clusters and protein clusters to provided an overview of functional annotation. The integration can be either intersection or union of GO terms from pathway clusters and protein clusters. The result of the integration is visualized in graphical format and tabular format also.
PECPI-GO provide users to manually select the appropriate GO terms based on their knowledge. In the tab 'manual output', users can change the P-value threshold for selecting pathway clusters and protein clusters. In addition, users can also set a threshold for minimum protein number in protein clusters.