We are about to download and analyse all publicly available metagenomic datasets in the world, a sequel to a previous project that did the same. In the past few years, however, the number of publicly available metagenomic datasets has increased by ~75%, and the sheer volume of data challenges the bioinformatic analysis.
Additionally, we have decided to assemble more of this data than last time, when we assembled a total of 889 metagenomes (selected for the presence of mcr variants).
Assembly is the major bottleneck in most bioinformatic pipelines, in terms of both memory requirements and runtime. Metagenomic samples are no exception, so we cannot simply assemble everything blindly. As our group has a particular interest in antimicrobial resistance (AMR), we have decided to target the AMR-containing regions in the metagenomes and build assemblies around these.
What we present here is the current status of whole-metagenome assembly and why it poses a problem, together with known solutions for performing "targeted assemblies" of AMR-containing regions. Unfortunately, none of the solutions existing today performs satisfactorily, in terms of either the quality of the analysis or the computational resources needed to perform the task.
As always, time is against us, but we have managed to put together a new pipeline that solves this exact problem, with some specialised C programming to get around the worst bottlenecks.
As a result, we can proudly present a pipeline for targeted assembly of AMR-containing regions that is capable of analysing all publicly available metagenomes in due time and with satisfactory results.
Philip Thomas Lanken Conradsen Clausen’s presentation