TRIAGEparser

Description

TRIAGEparser is one of the core functions of the TRIAGE R package. It evaluates a group of genes, such as the top 100 genes ranked by TRIAGE-weighted values or a set of differentially expressed genes, and separates it into clusters of genes with distinct biological functions. It performs principal component analysis to extract orthogonal patterns of H3K27me3 deposition from consortium-level epigenomic data and uses the Bayesian information criterion (BIC) to determine the optimal number of gene clusters. TRIAGEparser then assesses each gene cluster by searching the protein-protein interaction (PPI) networks from the STRING database and conducts Gene Ontology (GO) enrichment analysis for genes with direct PPIs. For more details, see: Zhao et al., Briefings in Bioinformatics 2025, TRIAGE: an R package for regulatory gene analysis and Sun et al., Nucleic Acids Research 2023, Inferring cell diversity in single cell data using consortium-scale epigenetic data as a biological anchor for cell identity.

Note: TRIAGEparser is adaptable to any type of data mapped to protein-coding and non-coding genes, including RNAseq, proteomics, ChIP-seq, and more.

Input and Output

Input: TRIAGEparser requires an input file, which can be provided in two formats:

As a Gene List: A list of genes, typically a text file with one gene name per line. This format is suitable when you want to analyze a specific set of genes.

As a Table: A more comprehensive data table, either in .csv or tab/space-delimited .txt format. This format is ideal for analyzing gene expression data along with other associated data points.
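As a rough illustration of the two formats, the base R sketch below writes a small gene list and a small table to temporary files. The gene symbols, the column name `sample1`, and the values are made up for illustration. The table layout (gene names in the first column, one value column per sample) is an assumption and is not confirmed by the description above.

```r
# Hypothetical gene symbols, used only for illustration
genes <- c("GATA4", "NKX2-5", "TBX5", "HAND2")

# Gene list format: one gene name per line
list_file <- file.path(tempdir(), "gene_list.txt")
writeLines(genes, list_file)

# Table format (assumed layout): genes as row names, one column per sample
values <- data.frame(sample1 = c(12.5, 8.1, 3.3, 0.7), row.names = genes)
table_file <- file.path(tempdir(), "gene_table.csv")
write.csv(values, table_file, quote = FALSE)

# Read both files back to check what was written
readLines(list_file)
read.csv(table_file, row.names = 1)
```

Either file path could then be passed as the `input` argument, with `input_type` set to match the format.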

Output: TRIAGEparser writes its results to two folders, “gene_clusters” and “go”.

In the “gene_clusters” folder, there are “*_gene_clusters.csv” files listing the probabilities of each gene being assigned to different gene clusters. For analyses involving multiple samples/groups, outputs are stored in distinct files.

In the “go” folder, there are “*_go.txt” files listing significance values (i.e., false discovery rates) for all associated GO term descriptions across PPI-significant clusters. For analyses involving multiple samples/groups, outputs are stored in distinct files.
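The output files described above can be collected with base R once a run has finished. This is a minimal sketch: `find_parser_outputs` is a hypothetical helper, not part of the TRIAGE package, and it relies only on the folder names and file suffixes given above. The column layout of the files is not specified here, so the sketch reads the cluster tables without assuming column names.

```r
# Hypothetical helper: collect TRIAGEparser output files from an output directory
find_parser_outputs <- function(outdir = "TRIAGEparser_output") {
  list(
    gene_clusters = list.files(file.path(outdir, "gene_clusters"),
                               pattern = "_gene_clusters\\.csv$",
                               full.names = TRUE),
    go = list.files(file.path(outdir, "go"),
                    pattern = "_go\\.txt$",
                    full.names = TRUE)
  )
}

# Returns empty vectors if the directory does not exist yet
outputs <- find_parser_outputs()

# One data frame per sample/group; inspect the columns rather than assuming them
cluster_tables <- lapply(outputs$gene_clusters, read.csv)
names(cluster_tables) <- basename(outputs$gene_clusters)
```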

Parameters

input: The input file, which can be a .csv file or a tab/space-delimited .txt file.

input_type: (Optional) Specifies the input type, either ‘table’ or ‘list’. Default is ‘list’.

outdir: (Optional) The path to the output directory. Default is ‘TRIAGEparser_output’.

H3K27me3_pc: (Optional) The pre-calculated H3K27me3 principal components. Default is ‘pca_roadmap’.

number_of_pca: (Optional) Number of principal components to use. Default is 10.

number_of_gene: (Optional) Number of top genes to use if the input type is a table. Default is 100. If the input type is a list, all genes in the list will be used.

no_iter: (Optional) Number of iterations for determining the best number of clusters using Bayesian Information Criterion (BIC). Default is 100.

EM_tol: (Optional) Convergence threshold for the Expectation-Maximization (EM) iterations in the GaussianMixture function. Default is 1e-3.

EM_max_iter: (Optional) Maximum number of EM iterations for the GaussianMixture function. Default is 100.

go_analysis: (Optional) Option to perform GO enrichment analysis. (1: Yes, 0: No). Default is 1.

verbose: (Optional) Level of verbosity (options: 1 or 0). Default is 1.

max_cluster: (Optional) Maximum number of clusters to consider. Default is 10.

gene_order: (Optional) Direction to sort genes (options: ‘ascending’ or ‘descending’). Default is ‘descending’.

go_threshold: (Optional) Threshold for GO term enrichment (False Discovery Rate). Default is 0.01.
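To give a feel for how `max_cluster`, `EM_tol`, and `EM_max_iter` interact, the sketch below fits Gaussian mixtures with 1 to `max_cluster` components by EM and keeps the fit with the lowest BIC. It is an illustration only, not TRIAGEparser's implementation: it uses toy one-dimensional data instead of H3K27me3 principal components, `fit_gmm_1d` is a hypothetical helper written for this example, and the repeated runs controlled by `no_iter` are left out.

```r
# Minimal 1-D Gaussian mixture fitted by EM (illustrative only)
fit_gmm_1d <- function(x, k, tol = 1e-3, max_iter = 100) {
  n <- length(x)
  # Initial guesses: means spread over quantiles, common sd, equal weights
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  w <- rep(1 / k, k)

  # Weighted component densities as an n x k matrix (uses current w, mu, sigma)
  weighted_density <- function() {
    vapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j]), numeric(n))
  }

  loglik <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: probability of each observation belonging to each cluster
    dens <- weighted_density()
    resp <- dens / rowSums(dens)
    # M-step: update weights, means, and standard deviations
    nk <- colSums(resp)
    w <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- pmax(sqrt(colSums(resp * outer(x, mu, "-")^2) / nk), 1e-6)
    # Stop when the mean log-likelihood gain falls below tol
    new_loglik <- sum(log(rowSums(weighted_density())))
    converged <- abs(new_loglik - loglik) / n < tol
    loglik <- new_loglik
    if (converged) break
  }

  n_par <- 3 * k - 1  # (k - 1) weights + k means + k variances
  list(k = k, bic = n_par * log(n) - 2 * loglik, resp = resp)
}

set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 8))  # toy data with two groups
max_cluster <- 4

fits <- lapply(seq_len(max_cluster), function(k) fit_gmm_1d(x, k))
bic <- vapply(fits, function(f) f$bic, numeric(1))
best_k <- which.min(bic)               # lower BIC is better in this convention
cluster_prob <- fits[[best_k]]$resp    # rows: observations, columns: clusters
```

The `cluster_prob` matrix plays the same role as the per-gene cluster probabilities reported in the “*_gene_clusters.csv” files, although the actual file layout may differ.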

Usage Examples

# Example 1: Using a tab-delimited table file "input.txt" as the input
# and "TRIAGEparser_output" as the output directory
TRIAGEparser("input.txt", input_type = "table")

# Example 2: Using "input.txt" - a gene list as the input,
# and specifying the output directory
TRIAGEparser("input.txt", outdir = "path/to/results")

# Example 3: Using a CSV file "input.csv" and specifying the output
# directory. Using top 200 genes for the TRIAGEparser analysis.
TRIAGEparser("input.csv",
        input_type = "table",
        outdir = "path/to/results",
        number_of_gene = 200)
