# Metadata

## All objects

All metadata objects support the following fields:

|     Field |                            Supported values                            | Description         |
| --------: | :--------------------------------------------------------------------: | ------------------- |
| created\* | [Date](https://manual.microbial-genomes.org/part1/glossary#miga-dates) | Date of creation    |
| updated\* | [Date](https://manual.microbial-genomes.org/part1/glossary#miga-dates) | Date of last update |

> **\*** Mandatory

## Projects

The following metadata fields are recognized by different interfaces for **Projects**:

### Project Features

Metadata with additional information and features about the project:

|       Field |                            Supported values                            | Description           |
| ----------: | :--------------------------------------------------------------------: | --------------------- |
|    comments |                                 String                                 | Free-form comments    |
| description |                                 String                                 | Free-form description |
|      name\* | [Name](https://manual.microbial-genomes.org/part1/glossary#miga-names) | Name‡                 |

> **\*** Mandatory
>
> **‡** Defaults to the base name of the project path

### Project System Metadata

Metadata entries automatically set by MiGA:

|      Field | Supported values | Description                                                            |
| ---------: | :--------------: | ---------------------------------------------------------------------- |
| datasets\* |  Array of String | List of datasets in the project                                        |
|     type\* |      String      | [Type](https://manual.microbial-genomes.org/part2/types#project-types) |

> **\*** Mandatory

### Project Flags

Metadata entries that trigger specific behaviors in MiGA:

|                 Field | Supported values | Description                                    |
| --------------------: | :--------------: | ---------------------------------------------- |
|          ref\_project |       Path       | Project with reference taxonomy {1}            |
|         db\_proj\_dir |       Path       | Directory containing database projects {1} {2} |
|           tax\_pvalue |   Float \[0,1]   | Max p-value to transfer taxonomy (def: 0.1)    |
|               haai\_p |      String      | hAAI engine {3} (def: fastaai)                 |
|                aai\_p |      String      | AAI engine {3} (def: diamond)                  |
|                ani\_p |      String      | ANI engine {3} (def: fastani)                  |
|              max\_try |      Integer     | Max number of task attempts (def: 10)          |
|        aai\_save\_rbm |      Boolean     | Should RBMs be saved for OGS analysis?         |
|         ogs\_identity |  Float \[0,100]  | Min RBM identity for OGS (def: 80)             |
|            clean\_ogs |      Boolean     | If false, keeps ABC (clades only)              |
|           run\_clades |      Boolean     | Should clades be estimated from distances?     |
|              run\_ogs |      Boolean     | Should orthologous groups be estimated?        |
|              gsp\_ani |  Float \[0,100]  | ANI limit to propose gsp clades (def: 95)      |
|              gsp\_aai |  Float \[0,100]  | AAI limit to propose gsp clades (def: 90)      |
|           gsp\_metric |      String      | Metric to propose clades: `ani` (def), `aai`   |
|             ess\_coll |      String      | Collection of essential genes to use {4}       |
|             min\_qual |  Float (or 'no') | Min. genome quality (or no filter; def: 25)    |
| distances\_checkpoint |      Integer     | Comparisons before storing data (def: 10)      |

> **{1}** This path can be either absolute or relative to the project's path
>
> **{2}** This is the location of the databases used by [db\_project](#dataset-flags). If not set, it is assumed to be the parent folder of the current project
>
> **{3}** Supported values: `blast`, `blat`, `diamond` (only for hAAI and AAI), `fastani` (only for ANI), `no` (only for hAAI and AAI), and `fastaai` (only for hAAI)
>
> **{4}** One of: `dupont_2012` (default), or `lee_2019`
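The ranges and defaults in the table above can be checked before they are stored. The following is a minimal sketch rather than MiGA's own validation: the helper names (`range_check`, `one_of`, `invalid_project_flags`, `with_flag_defaults`) are hypothetical, metadata is assumed to be a plain hash with string keys, and only a subset of the flags is covered.

```ruby
# Illustrative validator for a few project flags. The ranges and defaults are
# copied from the table above; the helper names are hypothetical, not MiGA API.

# Builds a check that accepts numbers within [min, max].
def range_check(min, max)
  ->(v) { v.is_a?(Numeric) && v >= min && v <= max }
end

# Builds a check that accepts one of a fixed set of strings.
def one_of(*options)
  ->(v) { options.include?(v) }
end

PROJECT_FLAG_RULES = {
  'tax_pvalue'   => { default: 0.1,           check: range_check(0, 1) },
  'ogs_identity' => { default: 80,            check: range_check(0, 100) },
  'gsp_ani'      => { default: 95,            check: range_check(0, 100) },
  'gsp_aai'      => { default: 90,            check: range_check(0, 100) },
  'gsp_metric'   => { default: 'ani',         check: one_of('ani', 'aai') },
  'ess_coll'     => { default: 'dupont_2012', check: one_of('dupont_2012', 'lee_2019') },
  'min_qual'     => { default: 25,            check: ->(v) { v == 'no' || v.is_a?(Numeric) } }
}.freeze

# Returns the names of known flags whose values fail their check.
def invalid_project_flags(metadata)
  metadata.select { |key, value|
    rule = PROJECT_FLAG_RULES[key]
    rule && !rule[:check].call(value)
  }.keys
end

# Fills in the documented defaults for flags that are not set.
def with_flag_defaults(metadata)
  PROJECT_FLAG_RULES.transform_values { |rule| rule[:default] }.merge(metadata)
end

flags = { 'tax_pvalue' => 1.5, 'gsp_metric' => 'aai' }
p invalid_project_flags(flags)               # => ["tax_pvalue"]
p with_flag_defaults(flags)['ogs_identity']  # => 80
```

Flags that the sketch does not know about are passed through untouched, so it can be applied to a full metadata hash without dropping other fields.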

### Project Hooks

Hooks can also be defined for projects. Each hook is an array of arrays, where every inner array holds the action name followed by its arguments (if any). For example, one can define:

```
on_processing_ready: [
  ['run_cmd', 'date > {{project}}/ALL_DONE.txt'],
  ['run_cmd', 'sendmail ...']
]
```

or

```
on_add_dataset: [
  ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
]
```

Supported events:

* `on_create()`: When created
* `on_load()`: When loaded
* `on_save()`: When saved
* `on_add_dataset(object)`: When a dataset is added, with name `object`
* `on_unlink_dataset(object)`: When dataset with name `object` is unlinked
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_processing_ready()`: When processing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `run_cmd(cmd)`
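To make the hook format concrete, the sketch below shows one way the `{{project}}` and `{{object}}` placeholders in a `run_cmd` entry could be expanded into a final command string. It is an illustration under stated assumptions, not MiGA's implementation: `expand_placeholders` and `commands_for` are hypothetical helpers, the placeholder values are invented, and the commands are only built, never executed.

```ruby
# Hypothetical helper: replaces {{name}} placeholders with entries from
# +values+, leaving unknown placeholders untouched.
def expand_placeholders(template, values)
  template.gsub(/\{\{(\w+)\}\}/) do
    values.fetch(Regexp.last_match(1)) { |name| "{{#{name}}}" }
  end
end

# Hypothetical helper: returns the expanded shell commands registered for
# +event+. Only the run_cmd action is handled in this sketch.
def commands_for(hooks, event, values)
  hooks.fetch(event, []).map do |action, *args|
    unless action == 'run_cmd'
      raise ArgumentError, "action not covered by this sketch: #{action}"
    end
    expand_placeholders(args.first, values)
  end
end

hooks = {
  'on_add_dataset' => [
    ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
  ]
}

# Example values (made up for illustration).
values = { 'object' => 'dataset_1', 'project' => '/data/my_project' }

puts commands_for(hooks, 'on_add_dataset', values)
# prints: echo dataset_1 > /data/my_project/LATEST_DATASET.txt
```

Events with no registered hooks simply yield an empty list, which mirrors the fact that all hooks are optional.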

## Datasets

The following metadata fields are recognized by different interfaces for **Datasets**:

### Dataset Features

Metadata with additional information and features about the dataset:

|             Field | Supported values | Description                              |
| ----------------: | :--------------: | ---------------------------------------- |
|               tax |  MiGA::Taxonomy  | Taxonomy of the dataset                  |
|           quality |      String      | Description of genome quality            |
|       trna\_count |      Integer     | Number of tRNA elements detected         |
|          trna\_aa |      Integer     | Number of distinct AA with tRNA elements |
|       dprotologue |      String      | Taxonumber in the Digital Protologue DB  |
|     ncbi\_tax\_id |      String      | Linking ID(s) {1} for NCBI Taxonomy      |
|     ncbi\_nuccore |      String      | Linking ID(s) {1} for NCBI Nucleotide    |
|         ncbi\_asm |      String      | Linking ID(s) {1} for NCBI Assembly      |
|         ebi\_embl |      String      | Linking ID(s) {1} for EBI EMBL           |
|          ebi\_ena |      String      | Linking ID(s) {1} for EBI ENA            |
|     web\_assembly |      String      | URL to download assembly                 |
| web\_assembly\_gz |      String      | URL to download gzipped assembly         |
|         see\_also |      String      | Link(s) {1} in the format text:url       |
|          is\_type |      Boolean     | If it is type material                   |
|     is\_ref\_type |      Boolean     | If it is reference material {2}          |
|         type\_rel |      String      | Relationship to type material            |
|           suspect |   Array(String)  | Flags indicating a suspect dataset       |

> **{1}** Multiple values can be provided separated by commas or colons
>
> **{2}** This is not a valid type, but it represents the closest available dataset to material that is unavailable and unlikely to ever become available. See also [Federhen, 2015, NAR](https://doi.org/10.1093/nar/gku1127)
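Footnote {1} implies that fields holding linking IDs carry one or more values in a single string. A minimal sketch of splitting such a value, assuming plain string input, could look as follows; `split_linking_ids` is a hypothetical helper and the IDs are placeholders. It is meant only for the ID fields: `see_also` entries contain colons themselves (`text:url`), so they would need different handling that is not shown here.

```ruby
# Hypothetical helper: splits a linking-ID string on commas or colons,
# trims surrounding whitespace, and drops empty entries.
def split_linking_ids(value)
  value.to_s.split(/[,:]/).map(&:strip).reject(&:empty?)
end

p split_linking_ids('ID_1, ID_2:ID_3')  # => ["ID_1", "ID_2", "ID_3"]
p split_linking_ids(nil)                # => []
```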

### Dataset System Metadata

Metadata entries automatically set by MiGA:

|          Field | Supported values | Description                                                             |
| -------------: | :--------------: | ----------------------------------------------------------------------- |
|         type\* |      String      | [Type](https://manual.microbial-genomes.org/part2/types#dataset-types)  |
|            ref |      Boolean     | [Reference](https://manual.microbial-genomes.org/part2/types#reference) |
|       inactive |      Boolean     | If auto-processing should stop                                          |
| metadata\_only |      Boolean     | Dataset with metadata but without input data                            |
|         status |      String      | Processing status: complete, incomplete, inactive                       |
|         \_step |      String      | For internal control of processing                                      |
|  \_try\_`step` |      Integer     | For internal control of processing                                      |
|       ~~user~~ |      String      | Deprecated                                                              |

> **\*** Mandatory

### Dataset Flags

Metadata entries that trigger specific behaviors in MiGA:

|                  Field | Supported values | Description                              |
| ---------------------: | :--------------: | ---------------------------------------- |
|            run\_`step` |      Boolean     | Forces running or not `step`             |
|            db\_project |       Path       | Project to use as database {1}           |
|              dist\_req |  Array of String | Run distances against these datasets {2} |
| keep\_assembly\_graphs |      Boolean     | Do not clean assembly graphs {3}         |

> **{1}** By default, the dataset uses its own project as the database. The path can be absolute or relative to the parent folder of the project
>
> **{2}** When searching for best-matching datasets, include these datasets even if they are not visited using the medoid tree
>
> **{3}** By default: false, meaning that assembly graphs are removed

Any of these dataset flags can also be set as project metadata, which applies to all datasets in the project. If both a dataset and a project metadata flag are set, dataset flags take precedence. If neither is set, the default values are used.
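The precedence rule above (dataset, then project, then default) can be sketched as a simple lookup. This is an illustration only: `effective_flag` and `DATASET_FLAG_DEFAULTS` are hypothetical names, metadata is assumed to be a plain hash with string keys, and a flag counts as "set" whenever its key is present, which may differ from how MiGA treats empty values.

```ruby
# Defaults documented above; only keep_assembly_graphs has an explicit one.
DATASET_FLAG_DEFAULTS = { 'keep_assembly_graphs' => false }.freeze

# Hypothetical helper: resolves a flag by checking the dataset metadata
# first, then the project metadata, then the defaults.
def effective_flag(key, dataset_md, project_md, defaults = DATASET_FLAG_DEFAULTS)
  return dataset_md[key] if dataset_md.key?(key)
  return project_md[key] if project_md.key?(key)

  defaults[key]
end

project = { 'keep_assembly_graphs' => true }
dataset = { 'keep_assembly_graphs' => false }

p effective_flag('keep_assembly_graphs', dataset, project)  # => false
p effective_flag('keep_assembly_graphs', {}, project)       # => true
p effective_flag('keep_assembly_graphs', {}, {})            # => false
```

Checking for the presence of the key, rather than the truthiness of the value, lets a dataset explicitly set a flag to `false` and still override a project-wide `true`.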

### Dataset Hooks

Hooks can also be defined for datasets, using the same format: an array of arrays, where every inner array holds the action name followed by its arguments. See [project hooks](#project-hooks) above for examples.

Supported events:

* `on_load()`: When loaded
* `on_save()`: When saved
* `on_remove()`: When removed
* `on_inactivate()`: When inactivated
* `on_activate()`: When activated
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_preprocessing_ready()`: When preprocessing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `clear_run_counts()`
* `run_cmd(cmd)`
