• Bent Petersen

PhageClouds for Fast Comparison of ∼640,000 Phage Genomic Sequences Using Genomic Network Graphs

Have you ever wanted to compare your newly assembled phage genome against ∼640,000 phage genomic sequences? Then check out our newest publication and try our tool, which is freely available at www.phageclouds.dk.


Here we present PhageClouds, a novel resource that will facilitate the analysis of phage genomic sequences and the characterization of assembled phage genomes.


PhageClouds is a computational resource that exploits the power and versatility of graph databases to offer a way of exploring the phage genomic diversity encompassed by published viromes. This tool will enable scientists to analyze their complete and draft phage genomes easily and efficiently by comparing them to a massive data set of ∼640,000 reference phage sequences. In addition, PhageClouds is suitable for exploring phage diversity from a host-centric perspective, facilitating the identification of different groups of phages with different infection strategies.


Below are some of the highlights.


Fast and structured illustration of host-specific phage sequence space


One of the main functionalities offered by PhageClouds is the ability to explore the phage genomic sequence space associated with a user-defined host. This approach will allow users to identify clusters of closely related phages that target a specific host, which provides an opportunity to rapidly explore and get a visual overview of the known phage space and diversity associated with that host.



Clouds of phages targeting Pseudomonas.

Rapid search of phage clouds related to user-defined query phages


Our graph database allows users to analyze a custom set of query phages to identify all phage clouds that they are associated with.


Searching phage clouds for a set of input query phages and user-defined intergenomic distance thresholds.
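To give a feel for how a distance threshold shapes the clouds, here is a minimal Python sketch. It is not the PhageClouds implementation: it assumes that a cloud can be approximated as a connected component of a graph in which two phages are linked when their intergenomic distance is at or below the threshold. The function names, the sequence names and the distance values are all hypothetical and chosen only for illustration.

```python
def find_clouds(distances, threshold):
    """Group phages into clouds (assumed here to be connected components).

    distances: dict mapping (phage_a, phage_b) -> intergenomic distance.
    Two phages are linked when their distance is <= threshold (an assumption).
    """
    neighbors = {}
    for (a, b), dist in distances.items():
        # Register both phages even if the pair is too distant to be linked.
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if dist <= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clouds = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, cloud = [start], set()
        while stack:
            node = stack.pop()
            if node in cloud:
                continue
            cloud.add(node)
            stack.extend(neighbors[node] - cloud)
        seen |= cloud
        clouds.append(cloud)
    return clouds


def clouds_for_queries(clouds, queries):
    """Keep only the clouds that contain at least one query phage."""
    wanted = set(queries)
    return [cloud for cloud in clouds if cloud & wanted]


# Toy distances (made up for this example, not taken from the database).
toy_distances = {
    ("query_1", "ref_A"): 0.05,
    ("ref_A", "ref_B"): 0.12,
    ("ref_B", "ref_C"): 0.40,
    ("query_2", "ref_C"): 0.10,
}

strict = find_clouds(toy_distances, threshold=0.15)
relaxed = find_clouds(toy_distances, threshold=0.50)
print(clouds_for_queries(strict, ["query_2"]))
print(len(strict), len(relaxed))
```

With the stricter threshold the toy data splits into two clouds, while the relaxed threshold merges everything into one. This is the kind of trade-off to keep in mind when choosing a threshold for your own queries.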

Phage clouds reflect taxonomic groups of phages


To demonstrate that the phage clouds reliably represent groups of closely related phages, we analyzed clouds that included phages of the family Herelleviridae. Until recently, members of this family were classified in the Spounavirinae subfamily within the family Myoviridae, but a series of complementary genomic/proteomic analyses demonstrated that the spounaviruses were markedly distinct from other members of the family Myoviridae. Therefore, those analyses supported the creation of the family Herelleviridae and the definition of its internal structure, meaning that subfamilies and genera within it have been defined based on genomic/proteomic relationships between member phages. Thus, we considered that this family would be an ideal example to illustrate that phage clouds depict genuine relationships between closely related phages.



Clouds extracted with an intergenomic distance threshold of 0.15 that contain at least one known member of the family Herelleviridae.

The figure illustrates many examples of phage genomic sequences from the other data sets in our graph database that connect to clouds representing a variety of genera within the family Herelleviridae. Among these are 31 entries from IMG/VR that are indeed currently classified as members of this family. The presence of the remaining phage genomic sequences in the retrieved clouds suggests that they could be members of the corresponding genera within the family Herelleviridae.
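The reasoning in the previous paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clouds are given as sets of sequence names, the taxonomy lookup is a plain dictionary, and the names `candidate_members`, `genus_A`, `genus_B` and all sequence identifiers are hypothetical placeholders rather than entries from the database. The output is a list of candidates to inspect, not a classification.

```python
def candidate_members(clouds, known_taxonomy):
    """List unclassified sequences that share a cloud with classified phages.

    clouds: iterable of sets of sequence names.
    known_taxonomy: dict mapping a sequence name to its known taxon label.
    Returns a dict: unclassified sequence -> sorted taxa seen in its cloud.
    """
    candidates = {}
    for cloud in clouds:
        taxa = {known_taxonomy[name] for name in cloud if name in known_taxonomy}
        if not taxa:
            # No classified phage in this cloud, so nothing can be suggested.
            continue
        for name in cloud:
            if name not in known_taxonomy:
                candidates[name] = sorted(taxa)
    return candidates


# Hypothetical toy input: placeholder names, not real database entries.
toy_clouds = [
    {"ref_1", "ref_2", "virome_seq_1"},
    {"ref_3", "virome_seq_2", "virome_seq_3"},
    {"virome_seq_4", "virome_seq_5"},
]
toy_taxonomy = {"ref_1": "genus_A", "ref_2": "genus_A", "ref_3": "genus_B"}

for name, taxa in sorted(candidate_members(toy_clouds, toy_taxonomy).items()):
    print(name, "->", taxa)
```

Sequences in the third toy cloud are not reported, because that cloud contains no classified phage to anchor a suggestion.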


Note that PhageClouds was not designed to be a tool for taxonomic classification of phage sequences. However, based on the search of Herelleviridae clouds described in the paper, it seems feasible that PhageClouds could help with the identification of phage clusters that correspond to taxa at the species or genus rank.


Conclusion

PhageClouds is a computational resource that exploits the power and versatility of graph databases to offer a way of exploring the phage genomic diversity encompassed by published viromes. This tool will enable scientists to analyze their complete and draft phage genomes easily and efficiently by comparing them to a massive data set of ∼640,000 reference phage sequences. In addition, PhageClouds is suitable for exploring phage diversity from a host-centric perspective, facilitating the identification of different groups of phages with different infection strategies.


PhageClouds is hosted on our online server and is accessible from any web browser, and thus, users of this resource do not require any experience running software from the command line. PhageClouds is part of our online infrastructure at https://www.phagecompass.dk and https://www.phageclouds.dk. The tool accepts phage genomic sequences in FASTA format as input to query the graph database and allows users to examine precalculated phage clouds filtered by a specific host.
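Since the tool takes FASTA input, it can be handy to sanity-check a file locally before submitting it. The snippet below is a small standard-library sketch of such a check; it is not part of PhageClouds, the function name `read_fasta` is our own, and the web server may apply different or additional rules to uploaded sequences.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict: identifier -> sequence.

    Raises ValueError for a header without an identifier, a duplicated
    identifier, or sequence data that appears before any header.
    """
    records = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            current_id = fields[0]  # first word of the header line
            if current_id in records:
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Join wrapped sequence lines and normalize to upper case.
            records[current_id] += line.upper()
    return records


# Made-up example input with two short records.
example = ">phage_1 draft assembly\nACGT\nacgt\n>phage_2\nGGCC\n"
for seq_id, seq in read_fasta(example).items():
    print(seq_id, len(seq))
```

For real files, read the text with `open(path).read()` and pass it to the function; a clean parse with the expected number of records is a reasonable sign that the file is ready to upload.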


The work done by G.R.-P. was funded by the Foundation Idella project “Advancing the Medical Future Through Unlocking the Complex Genetic Material of the Amazon Rainforest.”


Check out the paper, which is published under Open Access.

