PhageLeads: Rapid Assessment of Phage Therapeutic Suitability Using an Ensemble Machine Learning App
PhageLeads is a new tool for rapidly assessing the therapeutic suitability of phages based on three checkpoints: 1) the presence of temperate markers, 2) antimicrobial resistance (AMR) genes, and 3) virulence genes.
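The three checkpoints combine into a simple pass/fail rule: a phage is a candidate only if none of them is triggered. The sketch below illustrates that rule under stated assumptions. The `PhageScreen` container and the helper functions are hypothetical names for illustration, not part of the PhageLeads code, and the gene name in the usage lines is a placeholder rather than a real screening result.

```python
from dataclasses import dataclass, field


@dataclass
class PhageScreen:
    """Hypothetical container for the three checkpoint results of one phage genome."""
    name: str
    temperate_markers: list[str] = field(default_factory=list)
    amr_genes: list[str] = field(default_factory=list)
    virulence_genes: list[str] = field(default_factory=list)


def failed_checkpoints(screen: PhageScreen) -> list[str]:
    """Return the checkpoints a genome fails; an empty list means all three pass."""
    failures = []
    if screen.temperate_markers:
        failures.append("temperate markers")
    if screen.amr_genes:
        failures.append("AMR genes")
    if screen.virulence_genes:
        failures.append("virulence genes")
    return failures


def is_therapeutic_candidate(screen: PhageScreen) -> bool:
    """A genome is a candidate only when no checkpoint is failed."""
    return not failed_checkpoints(screen)


# Illustrative usage with made-up inputs
clean = PhageScreen("phage_A")
risky = PhageScreen("phage_B", amr_genes=["blaTEM-1"])
print(is_therapeutic_candidate(clean))  # True
print(failed_checkpoints(risky))        # ['AMR genes']
```

Returning the list of failed checkpoints, rather than only a boolean, keeps the reason for rejection visible, which mirrors how the three checks are reported separately.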
PhageLeads was developed as a collaborative effort between the GLOBE Institute and the Center for Hologenomics at the University of Copenhagen, the Phage Therapy Center at Leicester University, and the Centre for Omics-Driven Computational Biodiscovery at AIMST University.
The characterization of therapeutic phage genomes plays a crucial role in the success rate of phage therapies. Three checkpoints must be examined when selecting phage candidates: the presence of temperate markers, antimicrobial resistance (AMR) genes, and virulence genes. However, no single-step tool is currently available for this purpose. We have therefore developed a tool that checks all three conditions required for selecting suitable therapeutic phage candidates. It consists of an ensemble of machine-learning-based predictors for the temperate markers (integrase, Cro/CI repressor, immunity repressor, DNA partitioning protein A, and antirepressor), combined with the ABRicate tool to detect antibiotic resistance genes and virulence genes. Using the biological features of the temperate markers, the predictors detected these markers with high Matthews correlation coefficient (MCC) scores (>0.70), and the marker predictions identified phage lifestyle with an accuracy of 96.5%. Additionally, screening of 183 lytic phage genomes revealed six phages carrying AMR or virulence genes, showing that not all lytic phages are suitable for therapy. The suite of predictors, PhageLeads, together with the integrated ABRicate tool, is available online for the in silico selection of suitable therapeutic phage candidates from single genomes or metagenomic contigs.
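Predictor quality is reported as MCC rather than accuracy alone. MCC uses all four cells of the binary confusion matrix, so it stays informative when positives (marker proteins) are much rarer than negatives. A minimal sketch of the standard formula follows; the confusion-matrix counts in the usage lines are illustrative and are not taken from the PhageLeads evaluation.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional choice when any row or column of the matrix sums to zero
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative counts only
print(mcc(45, 45, 5, 5))       # 0.8
print(accuracy(45, 45, 5, 5))  # 0.9
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), so a score above 0.70 indicates strong agreement between predicted and true labels.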
By utilizing the protein features of these temperate markers, PhageLeads predicted the lifestyle of phages with high accuracy (96.2%). PhageLeads consists of five individual predictors, one for each temperate marker, which detect the presence of these markers in phage genomes. Based on the presence of one or more of these markers, PhageLeads effectively classifies the lifestyle of a phage as lytic or temperate. Additionally, lytic phage genomes were screened using the ABRicate tool, and some were found to encode antimicrobial resistance and virulence proteins, rendering them unsafe for phage therapy. On average, PhageLeads predicted the presence of temperate markers in a single phage genome in 1.6 s and detected resistance and virulence genes in 8.1 s, compared with 2.3 s for BACPHLIP to predict the lifestyle of a single phage. PhageLeads is available as an online tool at www.phageleads.dk as part of the PhageCompass consortium (www.phagecompass.dk), making it easily accessible to researchers as an effective tool for determining the suitability of phages for therapeutic use. PhageLeads can also be used to predict the presence of lysogenic markers in metagenomic contigs.
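The genome-level lifestyle call can be pictured as aggregating per-protein outputs from the five marker predictors. The sketch below assumes, for illustration only, that each predictor returns a probability per protein and that a fixed 0.5 threshold decides whether a marker is present. The input format, the threshold, and the function names are hypothetical, not the published PhageLeads implementation.

```python
# The five temperate markers named in the text
MARKERS = (
    "integrase",
    "Cro/CI repressor",
    "immunity repressor",
    "DNA partitioning protein A",
    "antirepressor",
)


def detected_markers(protein_scores: dict[str, dict[str, float]],
                     threshold: float = 0.5) -> set[str]:
    """Collect every marker predicted for at least one protein in the genome.

    protein_scores maps protein ID -> {marker name -> predicted probability}.
    Missing scores are treated as 0.0; the 0.5 threshold is an assumption.
    """
    found = set()
    for scores in protein_scores.values():
        for marker in MARKERS:
            if scores.get(marker, 0.0) >= threshold:
                found.add(marker)
    return found


def classify_lifestyle(protein_scores: dict[str, dict[str, float]],
                       threshold: float = 0.5) -> str:
    """Call a genome temperate if one or more markers are detected, else lytic."""
    return "temperate" if detected_markers(protein_scores, threshold) else "lytic"


# Made-up scores for a two-protein genome
genome = {
    "prot_001": {"integrase": 0.91, "antirepressor": 0.10},
    "prot_002": {"integrase": 0.05},
}
print(detected_markers(genome))    # {'integrase'}
print(classify_lifestyle(genome))  # temperate
```

Only genomes called lytic at this step would move on to the AMR and virulence screen, since a temperate call already rules a phage out.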