Species identification in metagenomic samples

Presentation from the Research Group for Genomic Epidemiology – 20 February 2023

Choosing the right software tool for classifying the species in a metagenomic sample is crucial. Using a reference database suited to the task at hand is equally important. Even the most accurate method can yield unsatisfactory results if the reference database is incomplete or poorly curated.

I tested a highly sensitive mapping tool, KMA, with different reference sets on Illumina and Nanopore sequencing data from a Zymo mock community standard. A database of single-copy marker genes derived from the Genome Taxonomy Database performed best for both Illumina and Nanopore data, keeping both false negatives and false positives low. For Illumina data, a database of complete chromosomal genomes performed equally well, because the mock community contained mainly well-studied bacterial species.
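
As a rough illustration of how false negatives and false positives can be counted against a mock community, here is a minimal Python sketch. It is not the evaluation code used for the presentation. The function name `compare_to_expected` and the species labels are hypothetical placeholders, and the sketch assumes the species calls have already been extracted from the classifier output into a plain list of names.

```python
def compare_to_expected(detected, expected):
    """Compare detected species against the known mock community composition.

    Both arguments are iterables of species names (hypothetical input format).
    """
    detected, expected = set(detected), set(expected)
    return {
        # Species that are in the mock community and were reported
        "true_positives": sorted(detected & expected),
        # Species that were reported but are not in the mock community
        "false_positives": sorted(detected - expected),
        # Species that are in the mock community but were missed
        "false_negatives": sorted(expected - detected),
    }


# Placeholder labels, not the real composition of any mock community
expected = ["species_A", "species_B", "species_C"]
detected = ["species_A", "species_B", "species_D"]

result = compare_to_expected(detected, expected)
for category, names in result.items():
    print(f"{category}: {len(names)} {names}")
```

Running the same comparison once per reference database and sequencing platform gives counts that can be compared side by side.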

Judit Szarvas’ presentation