Antimicrobial resistance gene (ARG) detection from sequencing data depends strongly on the reference database used. In this master’s thesis project, six ARG database sources represented in PanRes were compared across metagenomic and whole-genome sequencing (WGS) benchmarking datasets, including global sewage, infant and maternal faeces, cropland soil, grassland soil, and clinical bacterial isolates from the One Day in Denmark project.
The analysis evaluated how database choice influenced bacterial-normalised ARG abundance, AMR class composition, sample clustering patterns, associations between resistome and bacterial community structure, and genotype–phenotype concordance between WGS-based resistance predictions and minimum inhibitory concentration (MIC) measurements. Across datasets, database choice affected the scale of the estimated ARG abundances and the strength of resistome signals, but major biological patterns were often still visible. Global sewage samples retained broad spatial structure across databases, and infant faecal samples showed a robust age-associated resistome pattern. Soil datasets showed broadly similar ARG abundance patterns across database subsets but did not reproduce the strong drought enrichment reported previously from selected CARD-based comparisons. In clinical isolates, the ResFinder subset reproduced concordance close to the original ResFinder-based analysis, while broader databases detected more ARG clusters and produced more WGS-R/MIC-S discordances (resistance predicted from the genome, susceptible phenotype by MIC).
Overall, the project highlights that ARG databases are not interchangeable: database choice can influence quantitative interpretation, ecological structure, and genotype–phenotype prediction, even when broad biological signals remain detectable.
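The genotype–phenotype comparison described above can be illustrated with a minimal sketch. It assumes each isolate–antimicrobial pair has already been reduced to a WGS-based call and an MIC-based call, each either "R" or "S". The function name `concordance_summary` and the toy calls below are hypothetical and invented for illustration only; they are not the project's pipeline or its data.

```python
from collections import Counter


def concordance_summary(pairs):
    """Tabulate WGS/MIC agreement for (wgs_call, mic_call) pairs.

    Each call is assumed to be 'R' (resistant) or 'S' (susceptible).
    Hypothetical helper, not taken from the thesis workflow.
    """
    counts = Counter()
    for wgs, mic in pairs:
        if wgs not in ("R", "S") or mic not in ("R", "S"):
            raise ValueError(f"unexpected call: {wgs!r}, {mic!r}")
        counts[f"WGS-{wgs}/MIC-{mic}"] += 1

    total = sum(counts.values())
    # Concordant pairs are those where genotype and phenotype agree.
    concordant = counts["WGS-R/MIC-R"] + counts["WGS-S/MIC-S"]
    return {
        "counts": dict(counts),
        "total": total,
        "concordance": concordant / total if total else float("nan"),
    }


# Toy calls (invented): the same four isolates scored against a narrow
# and a broad database subset. The broad subset flags one extra isolate
# as resistant that is susceptible by MIC.
narrow = [("R", "R"), ("S", "S"), ("S", "S"), ("R", "R")]
broad = [("R", "R"), ("R", "S"), ("S", "S"), ("R", "R")]

narrow_result = concordance_summary(narrow)
broad_result = concordance_summary(broad)
print(narrow_result["concordance"], broad_result["concordance"])
```

In this toy case the additional hit from the broader subset appears as a single WGS-R/MIC-S discordance and lowers the concordance, which is the kind of effect the clinical isolate comparison examines.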
Anas El Youssef’s presentation