Metadata

All objects

All metadata objects support the following fields:

Field

Supported values

Description

created*

Date

Date of creation

updated*

Date

Date of last update

* Mandatory

Projects

The following metadata fields are recognized by different interfaces for Projects:

Project Features

Metadata with additional information and features about the project:

Field

Supported values

Description

comments

String

Free-form comments

description

String

Free-form description

name*

Name

Name‡

* Mandatory

Project System Metadata

Metadata entries automatically set by MiGA:

Field

Supported values

Description

datasets*

Array of String

List of datasets in the project

type*

String

Type

* Mandatory

By default the base name of the project path

Project Flags

Metadata entries that trigger specific behaviors in MiGA:

Field

Supported values

Description

ref_project

Path

Project with reference taxonomy

db_proj_dir

Path

Directory containing database projects

tax_pvalue

Float [0,1]

Max p-value to transfer taxonomy (def: 0.05)

aai_p

String

Value of aai.rb -p° on AAI (def: blast+)

haai_p

String

Value of aai.rb -p° on hAAI (def: blast+)

ani_p

String

Value of ani.rb -p° on ANI (def: blast+)

max_try

Integer

Max number of task attempts (def: 10)

aai_save_rbm

Boolean

Should RBMs be saved for OGS analysis?

ogs_identity

Float [0,100]

Min RBM identity for OGS (def: 80)

clean_ogs

Boolean

If false, keeps ABC (clades only)

run_clades

Boolean

Should clades be estimated from distances?

gsp_ani

Float [0,100]

ANI limit to propose gsp clades (def: 90)

gsp_aai

Float [0,100]

AAI limit to propose gsp clades (def: 95)

gsp_metric

String

Metric to propose clades: ani (def), aai

ess_coll

String

Collection of essential genes to use+

min_qual

Float (or 'no')

Min. genome quality (or no filter; def: 25)

° By default: blast+. Other supported values: blast, blat, diamond (except for ANI), and fastani (only for ANI), no (only for hAAI). If using diamond and/or fastani, the corresponding software must be installed. Important: These defaults will change in v1.0 to: blast+ for hAAI, diamond for AAI, and fastani for ANI.

+ One of: dupont_2012 (default), or lee_2019

Project Hooks

Additionally, hooks can be defined for projects as arrays of arrays containing the action name and the arguments (if any). For example, one can define:

on_processing_ready: [
['run_cmd', 'date > {{project}}/ALL_DONE.txt'],
['run_cmd', 'sendmail ...']
]

or

on_add_dataset: [
['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
]

Supported events:

  • on_create(): When created

  • on_load(): When loaded

  • on_save(): When saved

  • on_add_dataset(object): When a dataset is added, with name object

  • on_unlink_dataset(object): When dataset with name object is unlinked

  • on_result_ready(object): When any result is ready, with key object

  • on_result_ready_{result}(): When result is ready

  • on_processing_ready(): When processing is complete

Supported hooks:

  • run_lambda(lambda, args...)

  • run_cmd(cmd)

Datasets

The following metadata fields are recognized by different interfaces for Datasets:

Dataset Features

Metadata with additional information and features about the dataset:

Field

Supported values

Description

tax

MiGA::Taxonomy

Taxonomy of the dataset

quality

String

Description of genome quality

dprotologue

String

Taxonumber in the Digital Protologue DB

ncbi_tax_id

String

Linking ID(s)‡ for NCBI Taxonomy

ncbi_nuccore

String

Linking ID(s)‡ for NCBI Nucleotide

ncbi_asm

String

Linking ID(s)‡ for NCBI Assembly

ebi_embl

String

Linking ID(s)‡ for EBI EMBL

ebi_ena

String

Linking ID(s)‡ for EBI ENA

web_assembly

String

URL to download assembly

web_assembly_gz

String

URL to download gzipped assembly

see_also

String

Link(s)‡ in the format text:url

is_type

Boolean

If it is type material

is_ref_type

Boolean

If it is reference material°

type_rel

String

Relationship to type material

suspect

Array(String)

Flags indicating a suspect dataset

Multiple values can be provided separated by commas or colons

° This is not a valid type, but it represents the closest available dataset to material that is unavailable and unlikely to ever become available. See also Federhen, 2015, NAR

Dataset System Metadata

Metadata entries automatically set by MiGA:

Field

Supported values

Description

type*

String

Type

ref

Boolean

Reference

inactive

Boolean

If auto-processing should stop

metadata_only

Boolean

Dataset with metadata but without input data

status

String

Proc. status: complete, incomplete, inactive

_step

String

For internal control of processing

_try_step

Integer

For internal control of processing

user

String

Deprecated

* Mandatory

Dataset Flags

Metadata entries that trigger specific behaviors in MiGA:

Field

Supported values

Description

run_step

Boolean

Forces running or not step

db_project

Path

Project to use as database

dist_req

Array of String

Run distances against these datasets*

* When searching best-matching datasets, include these datasets even if they are not visited using the medoid tree

Dataset Hooks

Additionally, hooks can be defined for datasets as arrays of arrays containing the action name and the arguments. See above (project hooks) for examples.

Supported events:

  • on_load(): When loaded

  • on_save(): When saved

  • on_remove(): When removed

  • on_inactivate(): When inactivated

  • on_activate(): When activated

  • on_result_ready(object): When any result is ready, with key object

  • on_result_ready_{result}(): When result is ready

  • on_preprocessing_ready(): When preprocessing is complete

Supported hooks:

  • run_lambda(lambda, args...)

  • clear_run_counts()

  • run_cmd(cmd)