BACKGROUND: Metagenomics allows unprecedented access to uncultured environmental microorganisms. The
analysis of metagenomic sequences facilitates gene prediction and annotation, and enables the assembly of draft
genomes, including those of uncultured members of a community. However, while several platforms have been developed
for this critical step, there is currently no clear framework for the assembly of metagenomic sequence data.
RESULTS: To assist with selection of an appropriate metagenome assembler, we evaluated the capabilities of nine
prominent assembly tools on nine publicly available environmental metagenomes, as well as three simulated
datasets. Overall, we found that SPAdes provided the largest contigs and highest N50 values across six of the nine
environmental datasets, followed by MEGAHIT and metaSPAdes. MEGAHIT emerged as a computationally
inexpensive alternative to SPAdes, assembling the most complex dataset using less than 500 GB of RAM
and within 10 hours.
CONCLUSIONS: We found that assembler choice ultimately depends on the scientific question, the available
resources and the bioinformatic competence of the researcher. We provide a concise workflow for the
selection of the best assembly tool.
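
For readers unfamiliar with the N50 statistic used in the results above, the following is a minimal sketch of how it is conventionally computed from a list of contig lengths: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions; this is not the evaluation code used in the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Illustrative helper (hypothetical name): contigs are sorted from
    longest to shortest and accumulated until at least half of the
    total assembly length is covered; the length of the contig that
    reaches this threshold is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (in bp), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

A higher N50 indicates that more of the assembly is contained in long contigs, but it says nothing about correctness, so it is usually read alongside other measures of assembly quality.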