Roseburia

Presentation from the Research Group for Genomic Epidemiology – 15 August 2022

Single Cell Metagenomics

Traditional shotgun metagenomics relies heavily on in-silico binning, so linking a gene to its host organism often remains a hypothesis rather than a certainty. Advances in microfluidics now make it possible to encapsulate individual environmental microbes in droplets for downstream sequencing. This technical advance turns that in-silico hypothesis into something much closer to a physical certainty, because the reads from one droplet ideally derive from a single encapsulated cell. With a focus on better understanding the mechanisms of antimicrobial resistance (AMR), we develop a bioinformatics workflow to benchmark and explore the potential of single cell metagenomics sequencing. We also investigate whether single cell data can yield higher-quality genomes to enhance genomic databases in the future, and whether it can reveal novel microbial species to expand those databases.

We produced two sequencing datasets, a pilot and a spike-in dataset, containing three and two environmental samples respectively (sewage from Bangladesh and Chad, and pig feces from a Danish farm). In the pilot dataset, each sample was sequenced twice, in a deep and a shallow run. The deep run covers about 600 encapsulated microbes with more than 100k reads each, and the shallow run covers up to 3,000 microbes, each above a threshold of 10k reads per droplet. Second, to benchmark this novel sequencing technology, we generated the spike-in dataset. It consists of two samples (Bangladesh sewage and pig feces) into which the same mock bacterial community (25 species at controlled concentrations) was added.
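The read-count thresholds that define the two pilot runs can be expressed as a simple per-droplet filter. The sketch below is a minimal illustration, assuming per-droplet read counts have already been tallied into a dictionary; the names `filter_droplets`, `DEEP_MIN_READS` and `SHALLOW_MIN_READS` are hypothetical, and only the 100k and 10k cut-offs come from the text.

```python
from typing import Dict, List

# Cut-offs taken from the text; treating them as strict ("more than") is an assumption.
DEEP_MIN_READS = 100_000
SHALLOW_MIN_READS = 10_000


def filter_droplets(read_counts: Dict[str, int], min_reads: int) -> List[str]:
    """Return IDs of droplets with more than `min_reads` reads, sorted for reproducibility.

    `read_counts` maps a (hypothetical) droplet barcode to its number of reads.
    """
    return sorted(d for d, n in read_counts.items() if n > min_reads)


if __name__ == "__main__":
    counts = {"drop_A": 152_000, "drop_B": 48_000, "drop_C": 9_500}
    print(filter_droplets(counts, DEEP_MIN_READS))     # ['drop_A']
    print(filter_droplets(counts, SHALLOW_MIN_READS))  # ['drop_A', 'drop_B']
```

Applying the deep cut-off to a shallow run (or vice versa) shows how many droplets would cross each threshold.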

To assign a taxon to each microbe, we map the reads within each droplet to a multi-marker-gene database (GTDB) using the k-mer alignment tool KMA. We then apply the same process to identify antimicrobial resistance genes (ARGs), replacing GTDB with the ResFinder database. This lets us assign a taxon to about two-thirds of the sequenced droplets and link each of them to any acquired AMR genes it carries. Rarefaction curves show that more droplets are required to obtain a full overview of the resistome and bacterial composition of the sampled environment.
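The per-droplet logic can be sketched as follows, assuming each droplet's KMA hits have already been parsed into one taxon label per mapped read. The majority-vote rule, the `min_fraction` cut-off and the function names are illustrative assumptions, not the workflow's actual criteria. The same `rarefaction_curve` function applies to taxa or to ARG sets: it averages the number of distinct features observed over random orderings of the droplets.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Set


def assign_taxon(read_hits: Sequence[str], min_fraction: float = 0.5) -> Optional[str]:
    """Return the dominant taxon among a droplet's read hits, or None if unassigned.

    `read_hits` holds one taxon label per mapped read (hypothetical input, e.g.
    parsed from KMA output against GTDB). `min_fraction` is an illustrative
    cut-off, not a value from the study.
    """
    if not read_hits:
        return None
    taxon, count = Counter(read_hits).most_common(1)[0]
    return taxon if count / len(read_hits) >= min_fraction else None


def rarefaction_curve(droplet_features: List[Set[str]],
                      n_permutations: int = 100, seed: int = 0) -> List[float]:
    """Mean number of distinct features (taxa or ARGs) seen after sampling 1..N droplets."""
    rng = random.Random(seed)
    n = len(droplet_features)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)  # random sampling order without replacement
        seen: Set[str] = set()
        for i, idx in enumerate(order):
            seen |= droplet_features[idx]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


if __name__ == "__main__":
    d1 = ["Escherichia coli"] * 8 + ["Klebsiella pneumoniae"] * 2
    d2 = ["taxon_1"] * 3 + ["taxon_2"] * 3 + ["taxon_3"] * 4
    print(assign_taxon(d1))  # Escherichia coli
    print(assign_taxon(d2))  # None (no taxon reaches 50 % of reads)
    # Example ARG sets per droplet (made-up data for illustration).
    args_per_droplet = [{"blaCTX-M-15"}, {"blaCTX-M-15", "tet(A)"}, set()]
    print(rarefaction_curve(args_per_droplet))
```

A curve that is still rising at the last droplet is what indicates that more droplets are needed to saturate the resistome or taxonomic profile.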

To evaluate genome quality, we test assembly of deeply sequenced droplets (> 100k reads) and co-assembly of shallowly sequenced microbes (i.e. pooling the droplets assigned to the same taxon) using SPAdes in single cell mode. We then assess the resulting genomes with CheckM and by aligning them to a reference with Mauve. Whether they come from a single deeply sequenced microbe or from co-assembled microbes, the genome assemblies are partial and slightly contaminated. The incompleteness of the resulting genomes is partly explained by Multiple Displacement Amplification (MDA), which amplifies some regions of the genome to a greater depth than others.
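Co-assembly, as described above, amounts to grouping droplets by their assigned taxon and pooling their reads before assembly. The sketch below assumes one uncompressed, newline-terminated, single-end FASTQ file per droplet; the helper names are hypothetical. The command builder uses SPAdes' `--sc` flag for MDA-amplified single cell data, with `-s` for unpaired reads and `-o` for the output directory (paired-end data would instead use `-1`/`-2`). It only builds the argument list and leaves execution, for example via `subprocess.run`, to the caller.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional


def group_by_taxon(assignments: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group droplet IDs by assigned taxon, skipping unassigned droplets (None)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for droplet, taxon in sorted(assignments.items()):
        if taxon is not None:
            groups[taxon].append(droplet)
    return dict(groups)


def pool_reads(droplet_fastqs: List[Path], out_path: Path) -> Path:
    """Concatenate per-droplet FASTQ files into one file for co-assembly.

    Assumes uncompressed FASTQ files that each end with a newline.
    """
    with out_path.open("wb") as out:
        for fastq in droplet_fastqs:
            out.write(fastq.read_bytes())
    return out_path


def spades_command(reads: Path, out_dir: Path) -> List[str]:
    """Build a SPAdes single cell (MDA) command for unpaired reads."""
    return ["spades.py", "--sc", "-s", str(reads), "-o", str(out_dir)]


if __name__ == "__main__":
    assignments = {"drop_1": "Escherichia coli", "drop_2": None, "drop_3": "Escherichia coli"}
    print(group_by_taxon(assignments))  # {'Escherichia coli': ['drop_1', 'drop_3']}
    print(spades_command(Path("ecoli_pooled.fastq"), Path("ecoli_coassembly")))
```

Pooling trades per-cell resolution for depth: droplets from the same taxon can fill each other's MDA coverage gaps, but strain-level differences between them are merged.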

We conclude that single cell metagenomics is extremely promising and has the potential to become a better tool for global AMR surveillance and for improving our understanding of microbial communities.

Baptiste Jacques Philippe Avot's presentation