Dereplicate
This workflow automates the dereplication of your collection of genomes, Metagenome-Assembled Genomes (MAGs), or Single-Cell Amplified Genomes (SAGs).
To execute the workflow, run:
For additional options, run:
Importantly, the dereplication can be performed on ANI (default) or AAI (passing the --aai flag) at a given threshold (by default 95%) that can be modified with the flag --threshold. Finally, the representative genomes can be selected to reflect the highest genome quality (default) or to be the most "central" genome in the clade in ANI or AAI space (passing the --medoids flag).
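To make the difference between the two selection modes concrete, here is a minimal Python sketch. It is not the workflow's own implementation: the function names, the toy identity matrix, and the quality scores are all hypothetical, and the medoid is taken here to be the genome with the highest mean identity to the other members of its clade, which is one common definition.

```python
# Illustrative sketch only: names, values, and the medoid definition are assumptions.

def pick_medoid(names, identity):
    """Return the genome with the highest mean identity to the other clade members."""
    best_name, best_mean = None, -1.0
    for i, name in enumerate(names):
        others = [identity[i][j] for j in range(len(names)) if j != i]
        # A clade with a single member is trivially its own medoid.
        mean = sum(others) / len(others) if others else 100.0
        if mean > best_mean:
            best_name, best_mean = name, mean
    return best_name


def pick_best_quality(quality):
    """Return the genome with the highest quality score."""
    return max(quality, key=quality.get)


# Toy clade of three genomes with pairwise identities (%), all above a 95% threshold.
names = ["g1", "g2", "g3"]
identity = [
    [100.0, 97.0, 96.0],
    [97.0, 100.0, 98.5],
    [96.0, 98.5, 100.0],
]
# Hypothetical genome quality scores.
quality = {"g1": 92.1, "g2": 80.4, "g3": 88.0}

medoid = pick_medoid(names, identity)
best = pick_best_quality(quality)
print(medoid, best)
```

In this toy clade the two criteria pick different representatives: the quality-based choice favors the most complete genome, while the medoid favors the genome closest on average to the rest of the clade.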
Expected output
Once your run is complete, you may expect the standard summaries for cds, assembly, and essential_genes, as well as a table (genomospecies.tsv) with three columns: (1) a clade name, (2) the name of the representative genome, and (3) the names of all the members in the clade separated by commas. Additionally, you can expect the subdirectory representatives, which includes the assemblies (FastA files, nucleotides) of all representative genomes. This is the dereplicated set of genomes.
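The three-column table described above can be loaded with a few lines of Python. This is a sketch under stated assumptions: the file is taken to be tab-separated with no header row (if your file has one, skip the first line), and the clade and genome names in the example are made up.

```python
import csv
import io


def read_genomospecies(handle):
    """Parse a three-column table: clade, representative, comma-separated members."""
    clades = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        clade, representative, members = row[0], row[1], row[2]
        clades[clade] = {
            "representative": representative,
            "members": [m.strip() for m in members.split(",") if m.strip()],
        }
    return clades


# Hypothetical content standing in for genomospecies.tsv.
example = io.StringIO(
    "gsp_1\tgenome_A\tgenome_A,genome_B\n"
    "gsp_2\tgenome_C\tgenome_C\n"
)
clades = read_genomospecies(example)
for name, info in clades.items():
    print(name, info["representative"], len(info["members"]))
```

To read a real file, replace the in-memory example with `open("genomospecies.tsv", newline="")`, using the path to the table inside your output directory.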