Microbial communities residing within the human gut strongly influence human health and drive gastrointestinal diseases such as inflammatory bowel disease (IBD) [1]. Bacteriophages, viruses that infect bacteria, play a key role in modulating these bacterial communities [2]. However, identifying and characterizing novel bacteriophages in metagenomes remains a challenge.
Advances in high-throughput sequencing have led to many tools for identifying viral sequences in metagenomic data [3]. Generating high-quality phage genomes via de novo metagenome assembly is challenging because the modular and mosaic nature of phage genomes results in fragmented assemblies [4-6]. Hence, current computational tools for identifying phage sequences in metagenomic data rely on sequence similarity, nucleotide composition, and the presence of phage genes or proteins [7]. Despite recent progress, currently available tools struggle to recover complete, high-quality phage genomes from fragmented metagenomic assemblies, and a novel approach is required to address this issue.
This work introduces Phables [8], a novel software tool that uses assembly graphs built by metagenomic assemblers [9] to identify high-quality bacteriophage genomes from fragmented metagenomic assemblies. The connectivity information in the assembly graph helps identify which sequences belong to the same genome. This study demonstrates that the proposed method improves the identification of high-quality and complete bacteriophage genomes and outperforms state-of-the-art methods. Additionally, more than 80% of the bacteriophage genomes identified from IBD patient samples were of high quality. This work has enabled the identification of novel bacteriophage genomes associated with IBD and provides insights into the diversity of bacteriophage communities in IBD metagenomes. Overall, this work can advance our understanding of the role of bacteriophages in regulating gut microbial communities, which has significant implications for developing new diagnostic and therapeutic strategies for gastrointestinal diseases.