RESEARCH ARTICLE

Gango + BioFunctional: A Computational tool for efficient functional gene analysis

Alejandro Rodriguez-Mena 1,2 alejandrodriguez@ub.edu, ORCID: 0009-0008-5839-0511
Xavier Tarragó-Claramunt 1,2 xavitarragoc@gmail.com, ORCID: 0009-0007-8364-3632
Giulia Castellani 1,3 giuli.castellani@studenti.unicam.it, ORCID: 0009-0005-5022-6918
Javier Méndez-Viera 1,2 jmendez@ub.edu, ORCID: 0000-0003-3723-8787
Antonio Monleón-Getino 1,2* amonleong@ub.edu, ORCID: 0000-0001-8214-3205

1BIOST3, Research Group in Biostatistics, Data Science and Bioinformatics, Barcelona, Spain ROR ID: 021018s57

2Department of Genetics, Microbiology and Statistics, Universitat de Barcelona, Barcelona, Spain ROR ID: 021018s57

3School of Biosciences and Veterinary Medicine, University of Camerino, Camerino (MC), Italy ROR ID: 0005w8d69

Abstract

Functional gene analysis is crucial for understanding gene roles in biological processes. However, analyzing data with multiple experimental groups presents significant challenges due to the complexity of data processing and the limitations of existing tools. GANGO + BioFuncional, an R-based Shiny application designed for end-users, addresses these challenges by providing a streamlined and comprehensive workflow for functional gene analysis. This interactive and freely available tool requires no installation, thus significantly enhancing its accessibility. The application is composed of two primary modules: GANGO, which efficiently processes input data and performs functional annotation to Gene Ontology (GO) terms and KEGG pathways; and BioFuncional, dedicated to in-depth analysis and interpretation. Key advantages include a highly user-friendly interface that eliminates the need for programming expertise, robust multi-group analytical capabilities, comprehensive visualization tools (interactive networks and significance-driven bar plots), and seamless compatibility with AI-driven interpretation tools like CURIE. Hosted on a server, GANGO + BioFuncional enhances the efficiency and accessibility of functional gene analysis, making it a valuable asset for both specialists and AI applications, ultimately facilitating deeper biological insights.

Key words: AI Integration, Computational Tool, Functional Gene Analysis, Gene Ontology, KEGG Pathways, Shiny Application

* Corresponding author: E-mail: amonleong@ub.edu (A.M.G.); Ph.: +34-678329864.

Peer review: Double Blind Refereeing.

Ethics statement: It is declared that scientific and ethical principles were followed during the preparation of this study and all studies utilized were indicated in the bibliography (Ethical reporting: editor@euchembioj.com).

Plagiarism Check: Done (iThenticate). Article has been screened for originality.

Received: 03.06.2025; Accepted: 03.07.2025; Online first: 06.07.2025; Published: 11.07.2025

DOI: 10.62063/ecb-63

Citation: Rodriguez-Mena, A., Tarragó-Claramunt, X., Castellani, G., Méndez-Viera, J., & Monleón-Getino, A. (2025). Gango + BioFunctional: A Computational tool for efficient functional gene analysis. The European Chemistry and Biotechnology Journal, 4, 69-80. https://doi.org/10.62063/ecb-63

The copyrights of the studies published in The European Chemistry and Biotechnology Journal (EUCHEMBIOJ) belong to their authors.
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license (https://creativecommons.org/licenses/by-nc/4.0/).

Introduction

Functional gene analysis (Figure 1, workflow) is a vital process in biology, enabling researchers to elucidate the roles of genes in various biological processes. The typical workflow involves several key steps, starting with a list of genes of interest: the analysis begins with a set of genes relevant to a particular study, such as differentially expressed genes, mutated genes associated with a disease, or genes in a specific biological pathway.

Figure 1. GANGO + BioFunctional Workflow. This diagram illustrates the complete process from biological sample collection to the functional interpretation of genetic results obtained using the GANGO + BioFunctional application. The workflow comprises the following key stages: 1) Experimental design and biological sample collection, followed by DNA/RNA extraction. 2) DNA/RNA sequencing. 3) GANGO: Bioinformatics procedures, outputting gene enrichment analysis. 4) BioFunctional: Visualization and quantification of the functional analysis (Gene Ontologies and KEGG). 5) Biomedical interpretation: Facilitating biomedical interpretation of the functional analysis through Artificial Intelligence. (Biomedical icons in this diagram were obtained from https://www.biorender.com/).

Functional gene analysis yields insights into biological mechanisms and aids in generating hypotheses for further research (Thomas, 2000).

From data to understanding: Addressing the challenges of functional gene analysis

While functional gene analysis provides critical insights, analyzing studies with more than two experimental groups significantly increases complexity. Consider a study on a disease and its progression, with the following groups:

  1. Group 1: Healthy individuals

  2. Group 2: Patients with early-stage disease

  4. Group 4: Patients with late-stage disease

Identifying genes consistently dysregulated across all stages or specific to a particular stage requires advanced statistical methods. Researchers often rely on multiple software tools and custom scripts (e.g., in R) to manage gene lists, organize data, perform functional analysis, and interpret results. This process is time-consuming, requires programming expertise, and can hinder reproducibility. The lack of user-friendly tools to handle and visualize the complexity of multi-group results therefore necessitates more sophisticated approaches (Gene Ontology Consortium, 2015).
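As a minimal illustration of the multi-group problem described above, the following Python sketch separates genes that are dysregulated in every group from genes that are specific to a single group. The group labels and gene identifiers are hypothetical placeholders, and the sketch uses plain set operations rather than the statistical methods a real analysis would require; it is not taken from the GANGO + BioFuncional code.

```python
def shared_and_specific(groups):
    """Return (genes present in all groups, genes unique to each group)."""
    # Genes found in every group's list.
    shared = set.intersection(*groups.values())
    specific = {}
    for name, genes in groups.items():
        # Union of the gene sets of all the other groups.
        others = set().union(*(g for n, g in groups.items() if n != name))
        specific[name] = genes - others
    return shared, specific


# Hypothetical dysregulated-gene sets for three disease groups.
dysregulated = {
    "group_a": {"GENE_A", "GENE_B", "GENE_C"},
    "group_b": {"GENE_B", "GENE_C", "GENE_D"},
    "group_c": {"GENE_B", "GENE_E"},
}

shared, specific = shared_and_specific(dysregulated)
print(shared)    # genes common to all groups
print(specific)  # genes found in only one group
```

Even in this toy case the bookkeeping grows with every additional group, which is the burden the text attributes to custom scripts.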

To address these challenges, a specialized computational tool is needed.

Such a tool would reduce analysis time and effort, improve result accuracy and reliability, and broaden the accessibility of functional gene analysis. GANGO + BioFuncional was developed to provide such a solution.

Benchmarking with existing tools

To contextualize the utility of GANGO + BioFuncional, it was compared with widely used functional enrichment analysis tools: DAVID, GSEA, and Enrichr. Table 1 benchmarks the main features of these tools against GANGO + BioFuncional. Unlike DAVID (Huang et al., 2009) and Enrichr (Chen et al., 2013), GANGO + BioFuncional is specifically designed to simplify the analysis of data with multiple experimental groups, offering an intuitive user interface that minimizes the need for programming knowledge.

Table 1. Benchmark comparison of GANGO + BioFuncional with other widely used functional enrichment analysis tools: DAVID, GSEA, and Enrichr.

| Feature / Tool | GANGO + BioFuncional | DAVID | GSEA | Enrichr |
|---|---|---|---|---|
| Multi-group Analysis | Yes (Simplified) | Limited | Yes (Specific) | Limited |
| User-Friendly Interface | Yes (Shiny, no code) | Moderate (Web) | Moderate (Software) | High (Web) |
| AI Integration | Yes (CURIE) | No | No | No |
| Advanced Visualization | Interactive Networks, Z-score Bar Plots | Basic (Graphs) | Advanced (Plots) | Basic (Bar charts) |
| Installation Required | No (Server) | No (Web) | Yes | No |

Table 1 highlights GANGO + BioFuncional’s distinguishing features: simplified analysis of data from multiple experimental groups, an intuitive interface that minimizes the need for programming knowledge, integration of AI through CURIE, and advanced visualization options (interactive networks and Z-score bar plots) that go beyond the basic visualization features of some other tools. The table also indicates whether each tool requires local installation or is accessible via a server or web interface: GANGO + BioFuncional, DAVID, and Enrichr are server- or web-based, while GSEA requires installation.

While GSEA (Subramanian et al., 2005) also addresses gene set analysis, GANGO + BioFuncional distinguishes itself by its ability to integrate hierarchical ontology information and generate advanced visualizations such as interactive networks and Z-score bar plots, which enhance the interpretability of results. Furthermore, a notable feature of GANGO + BioFuncional is its integration with artificial intelligence technologies (such as CURIE), facilitating a deeper interpretation of GO terms, an aspect that existing tools do not typically offer. This comparison underscores the unique contributions of GANGO + BioFuncional for functional gene analysis in highly complex scenarios.

Materials and methods

GANGO + BioFunctional is a comprehensive R application built with the Shiny framework (Chang et al., 2025) and available at https://alexub.shinyapps.io/BioFunctional/. It facilitates the interpretation and visualization of functional analysis related to KEGG pathways and gene ontologies (GO) (Alterovitz et al., 2007; Ashburner et al., 2000; Gene Ontology Consortium, 2015; Kanehisa & Goto, 2000). The application provides researchers with detailed functional information, specifically about biological pathways and gene functions. Using R libraries such as Shiny, httr, dplyr, tibble, and rvest, GANGO + BioFuncional offers a user-friendly interface for data assessment and analysis. Both the GANGO and BioFuncional tools are integrated into this Shiny-based R application.

The development of GANGO stems from prior research detailed in Monleon-Getino et al. (2020), which established its foundational components. BioFunctional, on the other hand, represents the improvements and enhancements that emerged from subsequent work, described in Rodriguez and Monleon-Getino (2024).

The application enables the exploration of KEGG pathways and Gene Ontologies (Gene Ontology Consortium, 2015; Kanehisa & Goto, 2000), facilitating the analysis of complex biological processes. Functions within the application integrate data manipulation and web scraping to extract information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and QuickGO databases. Parallel processing enhances the efficiency of database queries, enabling rapid results from large datasets.
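The benefit of issuing database queries in parallel can be sketched as follows. This is an illustrative Python pattern only: the application itself is written in R and its query code is not shown in this article. The function `fetch_term_name`, the in-memory `_FAKE_DATABASE`, and the term identifiers are hypothetical stand-ins for real HTTP requests to KEGG or QuickGO.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote database (no network access needed).
_FAKE_DATABASE = {
    "TERM:1": "example process one",
    "TERM:2": "example process two",
    "TERM:3": "example process three",
}


def fetch_term_name(term_id):
    """Placeholder for one remote lookup; returns (term_id, name)."""
    return term_id, _FAKE_DATABASE.get(term_id, "unknown")


def fetch_all(term_ids, max_workers=4):
    """Run the lookups concurrently and collect them into a dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as term_ids.
        return dict(pool.map(fetch_term_name, term_ids))


names = fetch_all(["TERM:1", "TERM:3", "TERM:9"])
print(names)
```

When each lookup is dominated by network latency, running several at once shortens the total waiting time, which is the effect the paragraph above describes for large datasets.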

A key feature is the ability to obtain ancestral information for KEGG pathways and gene ontologies, which simplifies the understanding of their hierarchy and the classification of samples within a dataset. Users can study datasets at different levels of taxonomy directly from raw data. The application also generates interactive networks to visualize relationships between experimental groups and ontologies, preserving classification information. These networks are crucial for understanding the relationships within the displayed system.
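To make the notion of ancestral information concrete, the sketch below walks a small parent map upward to collect every ancestor of a term. The term identifiers and the hierarchy are invented for illustration; real GO terms and KEGG pathways have their own hierarchies, and the way the application retrieves them is not detailed here.

```python
# Hypothetical hierarchy: each term maps to its direct parents.
PARENTS = {
    "TERM:leaf": ["TERM:mid1", "TERM:mid2"],
    "TERM:mid1": ["TERM:root"],
    "TERM:mid2": ["TERM:root"],
    "TERM:root": [],
}


def ancestors(term, parents):
    """Collect every ancestor of `term` by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Queue the parents of the term just visited.
            stack.extend(parents.get(current, []))
    return seen


print(ancestors("TERM:leaf", PARENTS))
```

Because a term may have more than one parent, the `seen` set prevents shared ancestors from being visited twice. Grouping terms by a chosen ancestor is one way to study a dataset at a coarser level of the hierarchy.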

These features make the software a valuable tool for analysts studying biological pathways, providing an intuitive interface with advanced data processing techniques. It allows researchers to elucidate the complexity of biological functions and gain insights into gene and molecular component relationships.

What is GANGO?

GANGO is an algorithm (Figure 1, stage 3) that performs enrichment analysis to map genes, taxa, and groups to ontologies. It processes data from text files and generates an information-rich file for the BioFunctional algorithm, which in turn represents KEGG pathways and ontologies, facilitating functional analysis and interpretation.

Figure 2. Bar Plot of Enriched Gene Ontologies from BioFuncional Analysis. This bar plot visualizes the Gene Ontology (GO) terms, ranked based on their enrichment within specified experimental groups, as determined by the BioFuncional application. The X-axis displays the enriched Gene Ontologies, representing specific biological functions or processes. The Y-axis represents the Z-score transformation, which allows users to visually assess the significance of functional enrichment. Z-scores make it easy to identify the most over-represented (positive Z-scores) or under-represented (negative Z-scores) GO terms in the comparison, specifically between the healthy and bacterial infection groups in this example.

What are ontologies?

In bioinformatics, ontologies are structured vocabularies that standardize the description and classification of biological entities (e.g., genes) and their relationships. They aid in organizing and interpreting complex biological data (Gene Ontology Consortium, 2015).

What is KEGG?

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database of biological pathways, which are a series of molecular actions within a cell that lead to a specific product or change (Kanehisa & Goto, 2000).

What is enrichment analysis?

Enrichment analysis is a statistical method used to determine whether a set of genes is over-represented in a particular category (e.g., a biological pathway or gene ontology term) compared to what would be expected by chance (Subramanian et al., 2005).
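One common way to quantify over-representation is a one-sided hypergeometric test. The Python sketch below is given only as an example of the idea: the article does not state which test GANGO applies, and the function name, parameters, and counts are assumptions chosen for illustration.

```python
from math import comb


def enrichment_p_value(population, category_size, sample_size, hits):
    """Probability of seeing at least `hits` category genes in the sample.

    population    -- total number of genes considered
    category_size -- genes in the population that belong to the category
    sample_size   -- genes in the list of interest
    hits          -- genes in the list that belong to the category
    """
    upper = min(category_size, sample_size)
    total = comb(population, sample_size)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(category_size, k) * comb(population - category_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes in total, 4 in the category,
# and a list of 3 genes that all belong to the category.
p = enrichment_p_value(population=10, category_size=4, sample_size=3, hits=3)
print(p)
```

A small probability indicates that the overlap between the gene list and the category is unlikely to have arisen by chance alone.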

What is gene functional analysis?

Gene functional analysis is the process of determining the biological roles of genes, including the study of gene expression, protein interactions, and the effects of gene mutations (Reinitz & Hammer, 2004).

What is BioFunctional?

BioFunctional (Figure 1, stage 4) is software that extends Gene Ontology (GO) and KEGG pathway (metabolism) enrichment analysis. It generates network and bar plot visualizations and enhances previous gene ontology analysis by incorporating hierarchical information. This enables users to filter relevant GO ontologies or KEGG pathways through statistical analysis and facilitates functional interpretation using curated literature or the Biost3 research group’s artificial intelligence tool, CURIE. CURIE (see footnote 1) is currently under development by our research group and is pending publication (Figure 1, stage 5). The appendix describes its functionality.

For a detailed understanding of BioFunctional, please refer to “BioFunctional: A Comprehensive App for Interpreting and Visualizing Functional Analysis of KEGG Pathways and Gene Ontologies” (Rodriguez & Monleon-Getino, 2024).

Description of the BioFuncional + GANGO application

The BioFuncional + GANGO application provides a workflow for functional analysis, specifically for elucidating Gene Ontology (GO) terms and KEGG pathway enrichment. This methodology allows researchers to derive biological meaning from gene lists or taxonomic classifications from experiments like transcriptomics or metagenomics. The application comprises two primary modules: GANGO and BioFuncional.

GANGO: Gene ontology and KEGG assignment

The GANGO module processes input data. It accepts a list of genes or taxa organized into user-defined groups and performs functional annotation to Gene Ontology (GO) terms and KEGG pathways.

BioFuncional: Functional analysis and interpretation

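The BioFuncional module processes the output from GANGO to interpret functional enrichment. It incorporates hierarchical information, generates network and bar plot visualizations, and lets users filter relevant GO ontologies or KEGG pathways through statistical analysis.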

Results and discussion

Appendix 1 of the article’s supplementary materials provides a detailed, step-by-step graphical tutorial of the GANGO + BioFunctional application’s workflow. This tutorial presents a real-world case study to demonstrate the tool’s functionality, focusing on the functional analysis of a group of healthy individuals versus a group with bacterial infection. RNA-Seq data from the study “Dysregulated transcriptional responses to SARS-CoV-2 in the periphery” (McClain et al., 2021) were used for this purpose, encompassing samples from subjects affected by bacterial infection and from healthy controls.

The overall workflow detailed in Appendix 1 for the GANGO + BioFunctional + CURIE application involves several key stages. It begins with experimental design and biological sample collection, followed by DNA/RNA extraction and subsequent sequencing. Next, the GANGO module performs bioinformatics procedures to output gene enrichment analysis. The BioFunctional module then handles the visualization and quantification of the functional analysis of Gene Ontologies (GO). Finally, the workflow culminates in biomedical interpretation, which is facilitated through Artificial Intelligence tools such as CURIE.

Implications and novelties

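GANGO + BioFuncional offers several key advantages: a user-friendly interface that eliminates the need for programming expertise, multi-group analytical capabilities, visualization tools (interactive networks and significance-driven bar plots), and compatibility with AI-driven interpretation tools such as CURIE.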

Specifically, to highlight the most relevant GO terms, the Enrichment Analysis (EA) values for each GO term are used. These EA values are then transformed into Z-scores.

Z-score transformation

Essentially, BioFuncional’s bar plots (Figure 2) concisely represent the most relevant GO terms, categorized by GO type. Specifically, the Enrichment Analysis (EA) values for each GO term are transformed into Z-scores to highlight significance. This standardization allows for easy comparison of the relative importance of different GO terms.

A Z-score measures how far a data point lies from the mean (average) of the dataset, expressed in units of standard deviations.
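The transformation can be sketched in a few lines of Python. This is an illustrative example, not the application's R code: the EA values and term identifiers are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, pstdev


def z_scores(ea_values):
    """Map each term to (value - mean) / standard deviation."""
    data = list(ea_values.values())
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (assumption)
    if sigma == 0:
        # All values are identical, so no term stands out.
        return {term: 0.0 for term in ea_values}
    return {term: (value - mu) / sigma for term, value in ea_values.items()}


# Hypothetical enrichment analysis (EA) values for three terms.
ea = {"TERM:A": 2.0, "TERM:B": 4.0, "TERM:C": 6.0}
z = z_scores(ea)

# Rank terms from most over-represented to most under-represented.
ranked = sorted(z, key=z.get, reverse=True)
print(z)
print(ranked)
```

After the transformation, every term is measured on the same scale, so terms with large positive or negative scores can be compared directly regardless of the original range of the EA values.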

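In this context, transforming the EA values to Z-scores allows us to identify the most over-represented (positive Z-scores) or under-represented (negative Z-scores) GO terms and to compare the relative importance of different GO terms on a common scale.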

These features make GANGO + BioFuncional a valuable tool for researchers in biology, genetics, and related fields.

Beyond transcriptomics: Expanding applicability to other omics data

GANGO + BioFuncional has been developed with a primary focus on functional gene analysis, particularly for data derived from transcriptomics. While GANGO is not directly designed to process raw data from other omics types such as proteomics or metabolomics, its modular architecture allows for broader applicability.

Users working with proteomics or metabolomics data would first need to translate their findings into relevant Gene Ontologies (GO) or KEGG pathways and then use BioFuncional to present ontologies, conveniently select them, and interpret them using CURIE. This can be achieved through established bioinformatics workflows in R, for instance, by using database searches (e.g., with Bioconductor packages like org.Hs.eg.db for gene IDs or clusterProfiler for enrichment analysis) to map protein identifiers or metabolites to their corresponding genes, and then performing enrichment analysis to obtain GO terms or KEGG pathways. Subsequently, these pre-processed GO or KEGG terms, when formatted appropriately, can be seamlessly integrated into BioFuncional for advanced visualization, hierarchical analysis, and AI-driven interpretation, thereby leveraging the full capabilities of our tool for a wider range of biological insights.

Accuracy and reliability

The accuracy and reliability of GANGO + BioFuncional’s results are supported by several key design principles and integrated features, including its statistical foundation (with Z-score transformation), its ability to integrate complex hierarchical and multi-group data, and a user-friendly design that minimizes errors and facilitates comprehensive data visualization and expert validation.

Conclusions

In summary, GANGO + BioFuncional provides a significant advancement in functional analysis for high-throughput biological data. By integrating efficient data processing with robust analytical and visualization capabilities, this application empowers researchers to streamline complex analyses, enhance the interpretability of their findings, and ultimately derive more meaningful biological insights. Its user-friendly design, capacity to handle multi-group studies, and compatibility with AI-driven interpretation tools make it a valuable asset for a wide range of biological research, facilitating a deeper understanding of gene function and its implications.

Appendix A. Supplementary material

Supplementary material associated with this article can be found on https://doi.org/10.62063/ecb-63. To access the supplementary material, please visit the article landing page.

Funding

There is no funding to declare.

Conflict of interest

The authors declare no conflict of interest.

Data availability statement

All necessary files for the analysis are provided in the correct format and can be downloaded from https://github.com/amonleong/Biofunctional.

Ethics committee approval

Ethics committee approval is not required for this study.

Authors’ contribution statement

The authors acknowledge their contributions to this paper as follows: Study conception and design: AR, TM; Data collection: AR, XT; Analysis and interpretation of results: XT, GC, JM; Manuscript draft preparation: TM, AR, GC. All authors reviewed the results and approved the final version of the manuscript.

Footnotes

Use of Artificial Intelligence: No artificial intelligence-based tools or applications were used in the preparation of this study. The entire content of the study was produced by the author(s) in accordance with scientific research methods and academic ethical principles.

1 CURIE is not directly integrated with GANGO + BioFunctional. Instead, it is an independent computational application currently undergoing testing on the research group’s computational server. While GANGO + BioFunctional results are formatted to be readily interpretable by AI tools like CURIE, CURIE itself is presently only available within the University of Barcelona’s facilities.

REFERENCES

Alterovitz, G., Xiang, M., Mohan, M., & Ramoni, M. F. (2007). GO PaD: The Gene Ontology Partition Database. Nucleic Acids Research, 35(Database issue), D322–D327. 10.1093/nar/gkl799

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., & Sherlock, G. (2000). Gene ontology: Tool for the unification of biology. Nature Genetics, 25(1), 25–29. 10.1038/75556

Chang, W., Cheng, J., Allaire, J., Sievert, C., Schloerke, B., Xie, Y., Allen, J., McPherson, J., Dipert, A., & Borges, B. (2025). shiny: Web Application Framework for R (R package version 1.10.0.9001). https://github.com/rstudio/shiny

Chen, E. Y., Tan, C. M., Kou, Y., Duan, Q., Wang, Z., & Ma’ayan, A. (2013). Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics, 14(1), 128. 10.1186/1471-2105-14-128

Gene Ontology Consortium. (2015). Gene Ontology Consortium: Going forward. Nucleic Acids Research, 43(Database issue), D1049–D1056. 10.1093/nar/gku1179

Huang, D. W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1), 44–57. 10.1038/nprot.2008.211

Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1), 27–30. 10.1093/nar/28.1.27

McClain, M. T., Constantine, F. J., Henao, R., Liu, Y., Tsalik, E. L., Burke, T. W., Steinbrink, J. M., Petzold, E., Nicholson, B. P., Rolfe, R., Kraft, B. D., Kelly, M. S., Saban, D. R., Yu, C., Shen, X., Ko, E. M., Sempowski, G. D., Denny, T. N., Ginsburg, G. S., & Woods, C. W. (2021). Dysregulated transcriptional responses to SARS-CoV-2 in the periphery. Nature Communications, 12, 1079. 10.1038/s41467-021-21289-y

Monleon-Getino, A., Paytuví-Gallart, A., Sanseverino, W., & Méndez, J. A. (2020). A new bioinformatic tool to interpret metagenomics/metatranscriptomics results based on the geometry of the clustering network and its differentially gene ontologies (GANGO) [Preprint]. bioRxiv. 10.1101/2020.06.10.140103

Reinitz, J., & Hammer, M. (2004). A computational approach to gene functional analysis: Gene ontology, sequence motifs, and expression data. Methods in Cell Biology, 77, 1–23.

Rodriguez, A., & Monleon-Getino, A. (2024). BioFunctional: A comprehensive app for interpreting and visualizing functional analysis of KEGG pathways and gene ontologies [Preprint]. bioRxiv. 10.1101/2024.10.08.616405

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, M. A., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., & Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545–15550. 10.1073/pnas.0506580102

Thomas, D. A. (2000). Functional genomics: A user’s guide to the Rosetta Stone of gene function. Genome Biology, 1(3), reviews1003.1–reviews1003.7. 10.1186/gb-2000-1-3-reviews1003