Metadata

All objects

All metadata objects support the following fields:

| Field | Supported values | Description |
| --- | --- | --- |
| created* | | Date of creation |
| updated* | | Date of last update |

* Mandatory
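As a minimal sketch of what "mandatory" implies in practice, the check below reports which of the two fields are absent from a metadata hash. The helper `missing_mandatory` and the constant `MANDATORY_FIELDS` are hypothetical names introduced for illustration and are not part of MiGA; storing dates as strings is likewise an assumption.

```ruby
# Hypothetical helper (not part of MiGA): reports which mandatory fields
# are absent from a metadata hash.
MANDATORY_FIELDS = %i[created updated].freeze

def missing_mandatory(metadata)
  MANDATORY_FIELDS.reject { |field| metadata.key?(field) }
end

# Assumption: dates are kept as strings here; the stored format may differ.
now = Time.now.to_s
complete = { created: now, updated: now }
partial = { created: now }

p missing_mandatory(complete)
p missing_mandatory(partial)
```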

Projects

The following metadata fields are recognized by different interfaces for Projects:

Project Features

Metadata with additional information and features about the project:

| Field | Supported values | Description |
| --- | --- | --- |
| comments | String | Free-form comments |
| description | String | Free-form description |
| name* | | Name‡ |

* Mandatory

‡ By default the base name of the project path

Project System Metadata

Metadata entries automatically set by MiGA:

| Field | Supported values | Description |
| --- | --- | --- |
| datasets* | Array of String | List of datasets in the project |
| type* | String | |

* Mandatory

Project Flags

Metadata entries that trigger specific behaviors in MiGA:

| Field | Supported values | Description |
| --- | --- | --- |
| ref_project | Path | Project with reference taxonomy {1} |
| db_proj_dir | Path | Directory containing database projects {1} {2} |
| tax_pvalue | Float [0,1] | Max p-value to transfer taxonomy (def: 0.1) |
| haai_p | String | hAAI engine {3} (def: fastaai) |
| aai_p | String | AAI engine {3} (def: diamond) |
| ani_p | String | ANI engine {3} (def: fastani) |
| max_try | Integer | Max number of task attempts (def: 10) |
| aai_save_rbm | Boolean | Should RBMs be saved for OGS analysis? |
| ogs_identity | Float [0,100] | Min RBM identity for OGS (def: 80) |
| clean_ogs | Boolean | If false, keeps ABC (clades only) |
| run_clades | Boolean | Should clades be estimated from distances? |
| gsp_ani | Float [0,100] | ANI limit to propose gsp clades (def: 95) |
| gsp_aai | Float [0,100] | AAI limit to propose gsp clades (def: 90) |
| gsp_metric | String | Metric to propose clades: ani (def), aai |
| ess_coll | String | Collection of essential genes to use {4} |
| min_qual | Float (or 'no') | Min. genome quality (or no filter; def: 25) |
| distances_checkpoint | Integer | Comparisons before storing data (def: 10) |

{1} This path can be either absolute or relative to the project's path.

{2} This is the location of the databases used by db_project. If not set, it is assumed to be the parent folder of the current project.

{3} Supported values: blast, blat, diamond (only for hAAI and AAI), fastani (only for ANI), no (only for hAAI), and fastaai (only for hAAI).

{4} One of: dupont_2012 (default) or lee_2019
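The defaults and notes above can be read as a small set of rules. The sketch below encodes them in plain Ruby: documented defaults are merged under any user-set flags, engine names are checked against note {3}, and `db_proj_dir` is resolved following notes {1} and {2}. All names (`FLAG_DEFAULTS`, `ENGINES`, `effective_flags`, `engine_supported?`, `resolve_db_proj_dir`) are hypothetical and not part of MiGA's API, the example paths are made up, and POSIX-style paths are assumed.

```ruby
# Defaults as listed in the table above (only flags that state one).
FLAG_DEFAULTS = {
  tax_pvalue: 0.1, haai_p: 'fastaai', aai_p: 'diamond', ani_p: 'fastani',
  max_try: 10, ogs_identity: 80, gsp_ani: 95, gsp_aai: 90,
  gsp_metric: 'ani', ess_coll: 'dupont_2012', min_qual: 25,
  distances_checkpoint: 10
}.freeze

# Engines accepted by each flag, following note {3}.
ENGINES = {
  haai_p: %w[blast blat diamond no fastaai],
  aai_p: %w[blast blat diamond],
  ani_p: %w[blast blat fastani]
}.freeze

# Hypothetical helper: user-set flags take precedence over defaults.
def effective_flags(metadata)
  FLAG_DEFAULTS.merge(metadata.slice(*FLAG_DEFAULTS.keys))
end

# Hypothetical helper: is this engine valid for this flag?
def engine_supported?(flag, engine)
  ENGINES.fetch(flag).include?(engine)
end

# Hypothetical helper: absolute paths are kept, relative paths are resolved
# against the project path (note {1}); if unset, the parent folder of the
# project is used (note {2}).
def resolve_db_proj_dir(project_path, metadata)
  dir = metadata[:db_proj_dir]
  return File.dirname(File.expand_path(project_path)) if dir.nil?

  File.expand_path(dir, project_path)
end

flags = effective_flags({ aai_p: 'blast', gsp_metric: 'aai' })
threshold = flags[:"gsp_#{flags[:gsp_metric]}"] # limit for the chosen metric

puts flags[:aai_p]
puts threshold
puts engine_supported?(:ani_p, 'diamond')
puts resolve_db_proj_dir('/data/projects/p1', { db_proj_dir: '../dbs' })
```

Keeping the rules in two frozen hashes makes it easy to compare them line by line with the table and notes if either changes.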

Project Hooks

Additionally, hooks can be defined for projects as arrays of arrays containing the action name and the arguments (if any). For example, one can define:

on_processing_ready: [
  ['run_cmd', 'date > {{project}}/ALL_DONE.txt'],
  ['run_cmd', 'sendmail ...']
]

or

on_add_dataset: [
  ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
]

Supported events:

  • on_create(): When created

  • on_load(): When loaded

  • on_save(): When saved

  • on_add_dataset(object): When a dataset is added, with name object

  • on_unlink_dataset(object): When dataset with name object is unlinked

  • on_result_ready(object): When any result is ready, with key object

  • on_result_ready_{result}(): When the result named {result} is ready

  • on_processing_ready(): When processing is complete

Supported hooks:

  • run_lambda(lambda, args...)

  • run_cmd(cmd)
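To illustrate how an "array of arrays" hook definition could be interpreted, the sketch below walks the entries registered for one event, expands the `{{project}}` and `{{object}}` placeholders seen in the examples above, and dispatches each entry by action name. This is not MiGA's implementation: `expand_placeholders`, `trigger_hooks`, and `HANDLERS` are hypothetical names, the `run_cmd` handler is a dry run that only returns the expanded command instead of executing it, and the arguments MiGA actually passes to a lambda may differ.

```ruby
# Replaces {{name}} with vars[:name]; unknown placeholders are left as-is.
def expand_placeholders(text, vars)
  text.gsub(/\{\{(\w+)\}\}/) do |match|
    vars.fetch(Regexp.last_match(1).to_sym, match)
  end
end

# Dry-run handlers (assumption): run_cmd returns the expanded command
# rather than executing it; run_lambda calls the lambda with its arguments.
HANDLERS = {
  run_cmd: ->(args, vars) { expand_placeholders(args.first, vars) },
  run_lambda: ->(args, _vars) { args.first.call(*args.drop(1)) }
}.freeze

# Runs every [action, *args] entry registered for the event, in order.
def trigger_hooks(hooks, event, vars, handlers)
  hooks.fetch(event, []).map do |action, *args|
    handler = handlers.fetch(action.to_sym) do
      raise ArgumentError, "Unsupported hook action: #{action}"
    end
    handler.call(args, vars)
  end
end

hooks = {
  on_add_dataset: [
    ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt'],
    ['run_lambda', ->(name) { name.upcase }, 'dataset_1']
  ]
}

# Made-up project path and dataset name, for illustration only.
vars = { project: '/data/my_project', object: 'dataset_1' }
results = trigger_hooks(hooks, :on_add_dataset, vars, HANDLERS)
puts results
```

Events with no registered hooks simply yield an empty list, and an unrecognized action name raises an error instead of being silently ignored.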

Datasets

The following metadata fields are recognized by different interfaces for Datasets:

Dataset Features

Metadata with additional information and features about the dataset:

| Field | Supported values | Description |
| --- | --- | --- |
| tax | MiGA::Taxonomy | Taxonomy of the dataset |
| quality | String | Description of genome quality |
| trna_count | Integer | Number of tRNA elements detected |
| trna_aa | Integer | Number of distinct AA with tRNA elements |
| dprotologue | String | Taxonumber in the Digital Protologue DB |
| ncbi_tax_id | String | Linking ID(s) {1} for NCBI Taxonomy |
| ncbi_nuccore | String | Linking ID(s) {1} for NCBI Nucleotide |
| ncbi_asm | String | Linking ID(s) {1} for NCBI Assembly |
| ebi_embl | String | Linking ID(s) {1} for EBI EMBL |
| ebi_ena | String | Linking ID(s) {1} for EBI ENA |
| web_assembly | String | URL to download assembly |
| web_assembly_gz | String | URL to download gzipped assembly |
| see_also | String | Link(s) {1} in the format text:url |
| is_type | Boolean | If it is type material |
| is_ref_type | Boolean | If it is reference material {2} |
| type_rel | String | Relationship to type material |
| suspect | Array(String) | Flags indicating a suspect dataset |

{1} Multiple values can be provided separated by commas or colons

{2} This is not a valid type, but it represents the closest available dataset to material that is unavailable and unlikely to ever become available. See also Federhen, 2015, NAR
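Note {1} implies that a single string field may hold several linking IDs. A minimal sketch of splitting such a value on commas or colons is shown below; `split_linking_ids` is a hypothetical helper, not a MiGA method, and the IDs are placeholders. It does not attempt to parse see_also, whose text:url entries themselves contain colons.

```ruby
# Hypothetical helper (not part of MiGA): splits a linking-ID field on
# commas or colons, trimming whitespace and dropping empty entries.
def split_linking_ids(value)
  value.to_s.split(/[,:]/).map(&:strip).reject(&:empty?)
end

# Placeholder IDs, for illustration only.
ids = split_linking_ids('ASM_1, ASM_2:ASM_3')
p ids
```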

Dataset System Metadata

Metadata entries automatically set by MiGA:

| Field | Supported values | Description |
| --- | --- | --- |
| type* | String | |
| ref | Boolean | |
| inactive | Boolean | If auto-processing should stop |
| metadata_only | Boolean | Dataset with metadata but without input data |
| status | String | Processing status: complete, incomplete, inactive |
| _step | String | For internal control of processing |
| _try_step | Integer | For internal control of processing |
| user | String | Deprecated |

* Mandatory

Dataset Flags

Metadata entries that trigger specific behaviors in MiGA:

| Field | Supported values | Description |
| --- | --- | --- |
| run_step | Boolean | Forces the step to run (or not to run) |
| db_project | Path | Project to use as database |
| dist_req | Array of String | Run distances against these datasets* |

* When searching best-matching datasets, include these datasets even if they are not visited using the medoid tree

Dataset Hooks

Additionally, hooks can be defined for datasets as arrays of arrays containing the action name and the arguments. See above (project hooks) for examples.

Supported events:

  • on_load(): When loaded

  • on_save(): When saved

  • on_remove(): When removed

  • on_inactivate(): When inactivated

  • on_activate(): When activated

  • on_result_ready(object): When any result is ready, with key object

  • on_result_ready_{result}(): When the result named {result} is ready

  • on_preprocessing_ready(): When preprocessing is complete

Supported hooks:

  • run_lambda(lambda, args...)

  • clear_run_counts()

  • run_cmd(cmd)
