This workflow automates the dereplication of your collection of genomes, Metagenome-Assembled Genomes (MAGs), or Single-Cell Amplified Genomes (SAGs).
To execute the workflow, run:
miga derep_wf -o my_project path/to/mags/*.fasta
For additional options, run:
miga derep_wf -h
Importantly, the dereplication can be performed on ANI (default) or AAI (passing the --aai flag) at a given threshold (95% by default) that can be modified with the --threshold flag. Finally, the representative genomes can be selected to reflect the highest genome quality (default) or to be the most "central" genome in the clade in ANI or AAI space (by passing the corresponding flag, listed in the help output above).
Once your run is complete, you may expect the standard summaries for essential_genes, as well as a table (genomospecies.tsv) with three columns: (1) a clade name, (2) the name of the representative genome, and (3) the names of all the members in the clade, separated by commas. Additionally, you can expect the subdirectory representatives, which includes the assemblies (FastA files, nucleotides) of all representative genomes. This is the dereplicated set of genomes.
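As a quick way to inspect the result, the three-column table can be summarized with standard shell tools. The following is a minimal sketch, not part of the workflow itself: it assumes the table is tab-separated and has no header row, and it builds a small made-up sample file (the clade and genome names are hypothetical) so that it runs on its own. The helper name count_members is also an assumption introduced for illustration.

```shell
# Hypothetical sample in the layout described above:
# clade name, representative genome, comma-separated members.
# Assumption: tab-separated, no header row.
sample=$(mktemp)
printf 'gsp_1\tgenome_A\tgenome_A,genome_B,genome_C\n' >  "$sample"
printf 'gsp_2\tgenome_D\tgenome_D\n'                   >> "$sample"

# Print clade name, representative, and number of members per clade.
count_members() {
  awk -F'\t' '{ n = split($3, members, ","); print $1 "\t" $2 "\t" n }' "$1"
}

count_members "$sample"
```

To apply the same summary to a real run, pass the path of the genomospecies.tsv file produced by the workflow instead of the sample file. If the real table turns out to include a header row, skip it first (for example with tail -n +2).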