# Metadata

## All objects

All metadata objects support the following fields:

|     Field |                                Supported values                               | Description         |
| --------: | :---------------------------------------------------------------------------: | ------------------- |
| created\* | [Date](https://manual.microbial-genomes.org/master/part1/glossary#miga-dates) | Date of creation    |
| updated\* | [Date](https://manual.microbial-genomes.org/master/part1/glossary#miga-dates) | Date of last update |

> **\*** Mandatory
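The sketch below shows how the two mandatory timestamps relate: `created` stays fixed once set, and `updated` is refreshed on every save. It makes three assumptions:

- Metadata is held as a plain dictionary.
- The helper `touch` is hypothetical, not a MiGA function.
- ISO-8601 strings are used only for illustration. See the MiGA dates glossary linked above for the actual format.

```python
from datetime import datetime, timezone


def touch(metadata: dict, now: datetime) -> dict:
    """Set `created` on first save and refresh `updated` on every save.

    Hypothetical helper; ISO-8601 strings are used only for illustration.
    """
    stamp = now.isoformat()
    metadata.setdefault("created", stamp)  # only set if not already present
    metadata["updated"] = stamp            # always reflects the last save
    return metadata


meta = {}
touch(meta, datetime(2020, 1, 1, tzinfo=timezone.utc))
touch(meta, datetime(2021, 6, 15, tzinfo=timezone.utc))
print(meta["created"], meta["updated"])
```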

## Projects

The following metadata fields are recognized by different interfaces for **Projects**:

### Project Features

Metadata with additional information and features about the project:

|       Field |                                Supported values                               | Description           |
| ----------: | :---------------------------------------------------------------------------: | --------------------- |
|    comments |                                     String                                    | Free-form comments    |
| description |                                     String                                    | Free-form description |
|      name\* | [Name](https://manual.microbial-genomes.org/master/part1/glossary#miga-names) | Name‡                 |

> **\*** Mandatory
>
> **‡** By default the base name of the project path

### Project System Metadata

Metadata entries automatically set by MiGA:

|      Field | Supported values | Description                                                                   |
| ---------: | :--------------: | ----------------------------------------------------------------------------- |
| datasets\* |  Array of String | List of datasets in the project                                               |
|     type\* |      String      | [Type](https://manual.microbial-genomes.org/master/part2/types#project-types) |

> **\*** Mandatory
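The project `name` defaults to the base name of the project path. The sketch below derives that default and checks which mandatory project fields are still missing. Its assumptions:

- Metadata is a plain dictionary.
- The helper names are hypothetical.
- The `type` value is only an example.

```python
import os

# Mandatory project fields listed in the tables above
# (created/updated apply to all metadata objects).
MANDATORY_PROJECT_FIELDS = ("created", "updated", "name", "datasets", "type")


def default_project_name(project_path: str) -> str:
    """Base name of the project path, the documented default for `name`."""
    return os.path.basename(os.path.normpath(project_path))


def missing_project_fields(metadata: dict) -> list:
    """Return the mandatory fields that are absent from `metadata`."""
    return [f for f in MANDATORY_PROJECT_FIELDS if f not in metadata]


meta = {"datasets": [], "type": "genomes"}  # example values
meta.setdefault("name", default_project_name("/data/projects/my_project/"))
print(meta["name"])                  # my_project
print(missing_project_fields(meta))  # ['created', 'updated']
```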

### Project Flags

Metadata entries that trigger specific behaviors in MiGA:

|          Field | Supported values | Description                                  |
| -------------: | :--------------: | -------------------------------------------- |
|   ref\_project |       Path       | Project with reference taxonomy              |
|  db\_proj\_dir |       Path       | Directory containing database projects       |
|    tax\_pvalue |   Float \[0,1]   | Max p-value to transfer taxonomy (def: 0.05) |
|         aai\_p |      String      | Value of aai.rb -p° on AAI (def: blast+)     |
|        haai\_p |      String      | Value of aai.rb -p° on hAAI (def: blast+)    |
|         ani\_p |      String      | Value of ani.rb -p° on ANI (def: blast+)     |
|       max\_try |      Integer     | Max number of task attempts (def: 10)        |
| aai\_save\_rbm |      Boolean     | Should RBMs be saved for OGS analysis?       |
|  ogs\_identity |  Float \[0,100]  | Min RBM identity for OGS (def: 80)           |
|     clean\_ogs |      Boolean     | If false, keeps ABC (clades only)            |
|    run\_clades |      Boolean     | Should clades be estimated from distances?   |
|       gsp\_ani |  Float \[0,100]  | ANI limit to propose gsp clades (def: 95)    |
|       gsp\_aai |  Float \[0,100]  | AAI limit to propose gsp clades (def: 90)    |
|    gsp\_metric |      String      | Metric to propose clades: `ani` (def), `aai` |
|      ess\_coll |      String      | Collection of essential genes to use+        |
|      min\_qual |  Float (or 'no') | Min. genome quality (or no filter; def: 25)  |

> **°** By default: `blast+`. Other supported values: `blast`, `blat`, `diamond` (except for ANI), `fastani` (only for ANI), and `no` (only for hAAI). If using `diamond` and/or `fastani`, the corresponding software must be installed. **Important**: These defaults will change in v1.0 to `blast+` for hAAI, `diamond` for AAI, and `fastani` for ANI.
>
> **+** One of `dupont_2012` (default) or `lee_2019`
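The sketch below shows how a client might read project flags. It falls back to the defaults in the table and validates the `-p` search tools against the footnote above. It is not MiGA's implementation, and the following are assumptions:

- Metadata is a plain dictionary, and the helper names are hypothetical.
- Only flags with a documented default are included.
- A genome passes `min_qual` when its quality is at least the threshold, and the string `no` disables the filter.

```python
# Defaults from the Project Flags table (subset with documented defaults).
PROJECT_FLAG_DEFAULTS = {
    "tax_pvalue": 0.05,
    "aai_p": "blast+",
    "haai_p": "blast+",
    "ani_p": "blast+",
    "max_try": 10,
    "ogs_identity": 80.0,
    "gsp_metric": "ani",
    "ess_coll": "dupont_2012",
    "min_qual": 25.0,
}

# Allowed search tools per flag, following the ° footnote.
_COMMON_TOOLS = {"blast+", "blast", "blat"}
ALLOWED_TOOLS = {
    "aai_p": _COMMON_TOOLS | {"diamond"},
    "haai_p": _COMMON_TOOLS | {"diamond", "no"},
    "ani_p": _COMMON_TOOLS | {"fastani"},
}


def project_flag(metadata: dict, key: str):
    """Return the flag value, falling back to the documented default."""
    value = metadata.get(key, PROJECT_FLAG_DEFAULTS.get(key))
    allowed = ALLOWED_TOOLS.get(key)
    if allowed is not None and value not in allowed:
        raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
    return value


def passes_quality(quality: float, metadata: dict) -> bool:
    """Apply min_qual (assumed inclusive); the string 'no' disables it."""
    threshold = project_flag(metadata, "min_qual")
    if threshold == "no":
        return True
    return quality >= float(threshold)


meta = {"ani_p": "fastani", "min_qual": "no"}
print(project_flag(meta, "ani_p"))  # fastani
print(project_flag(meta, "aai_p"))  # blast+
print(passes_quality(10.0, meta))   # True
```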

### Project Hooks

Additionally, hooks can be defined for projects as arrays of arrays, each containing the action name followed by its arguments (if any). For example, one can define:

```
on_processing_ready: [
  ['run_cmd', 'date > {{project}}/ALL_DONE.txt'],
  ['run_cmd', 'sendmail ...']
]
```

or

```
on_add_dataset: [
  ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
]
```

Supported events:

* `on_create()`: When created
* `on_load()`: When loaded
* `on_save()`: When saved
* `on_add_dataset(object)`: When a dataset is added, with name `object`
* `on_unlink_dataset(object)`: When a dataset with name `object` is unlinked
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_processing_ready()`: When processing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `run_cmd(cmd)`
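The Python below sketches one way these hooks could be dispatched. It is not MiGA's implementation. It assumes:

- Hooks are a dictionary mapping event names to lists of `[action, *args]`.
- In `run_cmd`, `{{project}}` is replaced by the project path and `{{object}}` by the event's object.
- Only `run_cmd` is handled.

The command runner is passed in as a parameter, so the example can be exercised without executing a shell. The result key `clade_finding` and the output file name are examples only.

```python
from typing import Callable, Dict, List, Optional


def expand(template: str, project: str, obj: Optional[str]) -> str:
    """Substitute the {{project}} and {{object}} placeholders (assumed semantics)."""
    out = template.replace("{{project}}", project)
    if obj is not None:
        out = out.replace("{{object}}", obj)
    return out


def fire(hooks: Dict[str, List[list]], event: str, project: str,
         obj: Optional[str] = None,
         runner: Callable[[str], None] = print) -> None:
    """Run every hook registered for `event`, in order."""
    for action, *args in hooks.get(event, []):
        if action == "run_cmd":
            runner(expand(args[0], project, obj))
        else:
            # run_lambda and other actions are out of scope for this sketch
            raise NotImplementedError(action)


def result_ready(hooks, result: str, project: str, runner=print) -> None:
    """A ready result triggers both the generic and the per-result event."""
    fire(hooks, "on_result_ready", project, result, runner)
    fire(hooks, f"on_result_ready_{result}", project, None, runner)


hooks = {
    "on_add_dataset": [
        ["run_cmd", "echo {{object}} > {{project}}/LATEST_DATASET.txt"],
    ],
    "on_result_ready_clade_finding": [
        ["run_cmd", "date > {{project}}/CLADES_DONE.txt"],
    ],
}
ran = []
fire(hooks, "on_add_dataset", "/data/my_project", "genome_A", ran.append)
result_ready(hooks, "clade_finding", "/data/my_project", ran.append)
print(ran)
```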

## Datasets

The following metadata fields are recognized by different interfaces for **Datasets**:

### Dataset Features

Metadata with additional information and features about the dataset:

|             Field | Supported values | Description                             |
| ----------------: | :--------------: | --------------------------------------- |
|               tax |  MiGA::Taxonomy  | Taxonomy of the dataset                 |
|           quality |      String      | Description of genome quality           |
|       dprotologue |      String      | Taxonumber in the Digital Protologue DB |
|     ncbi\_tax\_id |      String      | Linking ID(s)‡ for NCBI Taxonomy        |
|     ncbi\_nuccore |      String      | Linking ID(s)‡ for NCBI Nucleotide      |
|         ncbi\_asm |      String      | Linking ID(s)‡ for NCBI Assembly        |
|         ebi\_embl |      String      | Linking ID(s)‡ for EBI EMBL             |
|          ebi\_ena |      String      | Linking ID(s)‡ for EBI ENA              |
|     web\_assembly |      String      | URL to download assembly                |
| web\_assembly\_gz |      String      | URL to download gzipped assembly        |
|         see\_also |      String      | Link(s)‡ in the format text:url         |
|          is\_type |      Boolean     | If it is type material                  |
|     is\_ref\_type |      Boolean     | If it is reference material°            |
|         type\_rel |      String      | Relationship to type material           |
|           suspect |   Array(String)  | Flags indicating a suspect dataset      |

> **‡** Multiple values can be provided separated by commas
>
> **°** Not valid type material, but the closest available dataset to type material that is unavailable and unlikely to ever become available. See also [Federhen, 2015, NAR](https://doi.org/10.1093/nar/gku1127)
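The sketch below parses multi-value linking fields and `see_also` entries. It splits values only on commas, because in `see_also` a colon also separates the link text from the URL. Only the first colon of each entry is treated as that separator, so the colon in `https://` stays with the URL. Its assumptions:

- The helper names are hypothetical.
- The link text itself contains no colon.
- The accessions and link are example values.

```python
from typing import List, Tuple


def split_values(raw: str) -> List[str]:
    """Split a multi-value field on commas, dropping surrounding blanks."""
    return [v.strip() for v in raw.split(",") if v.strip()]


def parse_see_also(raw: str) -> List[Tuple[str, str]]:
    """Parse see_also entries of the form text:url into (text, url) pairs."""
    pairs = []
    for entry in split_values(raw):
        text, _, url = entry.partition(":")  # split on the first colon only
        pairs.append((text, url))
    return pairs


print(split_values("GCF_000005845.2, GCF_000006945.2"))
print(parse_see_also("MiGA manual:https://manual.microbial-genomes.org"))
```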

### Dataset System Metadata

Metadata entries automatically set by MiGA:

|          Field | Supported values | Description                                                                    |
| -------------: | :--------------: | ------------------------------------------------------------------------------ |
|         type\* |      String      | [Type](https://manual.microbial-genomes.org/master/part2/types#dataset-types)  |
|            ref |      Boolean     | [Reference](https://manual.microbial-genomes.org/master/part2/types#reference) |
|       inactive |      Boolean     | If auto-processing should stop                                                 |
| metadata\_only |      Boolean     | Dataset with metadata but without input data                                   |
|         status |      String      | Proc. status: complete, incomplete, inactive                                   |
|         \_step |      String      | For internal control of processing                                             |
|  \_try\_`step` |      Integer     | For internal control of processing                                             |
|       ~~user~~ |      String      | Deprecated                                                                     |

> **\*** Mandatory
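`_step` and `_try_`step`` are for internal control. The sketch below illustrates how a per-step attempt counter could interact with the project `max_try` flag (default 10). Both the mechanism and the helper names are assumptions, not MiGA's code. `genome` and `assembly` are example values.

```python
def try_key(step: str) -> str:
    """Metadata key holding the attempt counter for `step`."""
    return f"_try_{step}"


def register_attempt(metadata: dict, step: str, max_try: int = 10) -> bool:
    """Increment the counter for `step`; return False once max_try is exceeded."""
    key = try_key(step)
    metadata[key] = metadata.get(key, 0) + 1
    return metadata[key] <= max_try


meta = {"type": "genome", "_step": "assembly"}
results = [register_attempt(meta, "assembly", max_try=2) for _ in range(3)]
print(results, meta["_try_assembly"])  # [True, True, False] 3
```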

### Dataset Flags

Metadata entries that trigger specific behaviors in MiGA:

|       Field | Supported values | Description                            |
| ----------: | :--------------: | -------------------------------------- |
| run\_`step` |      Boolean     | Forces (true) or prevents (false) running `step` |
| db\_project |       Path       | Project to use as database             |
|   dist\_req |  Array of String | Run distances against these datasets\* |

> **\*** When searching best-matching datasets, include these datasets even if they are not visited using the medoid tree
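The sketch below shows two of these flags:

- An explicit `run_`step`` flag overrides whatever the pipeline would otherwise decide.
- `dist_req` datasets are appended to those reached through the medoid tree.

It assumes plain-dictionary metadata. The helper names and the step and dataset names are hypothetical examples.

```python
from typing import Iterable, List


def should_run(metadata: dict, step: str, default: bool) -> bool:
    """Return the run_<step> flag if present, else the pipeline default."""
    flag = metadata.get(f"run_{step}")
    return default if flag is None else bool(flag)


def distance_targets(visited: Iterable[str], metadata: dict) -> List[str]:
    """Medoid-tree hits plus dist_req datasets, without duplicates."""
    targets = list(dict.fromkeys(visited))  # de-duplicate, keep order
    for name in metadata.get("dist_req", []):
        if name not in targets:
            targets.append(name)
    return targets


meta = {"run_mytaxa": False, "dist_req": ["ref_B", "ref_C"]}
print(should_run(meta, "mytaxa", default=True))    # False
print(should_run(meta, "ssu", default=True))       # True
print(distance_targets(["ref_A", "ref_B"], meta))  # ['ref_A', 'ref_B', 'ref_C']
```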

### Dataset Hooks

Additionally, hooks can be defined for datasets as arrays of arrays containing the action name and the arguments. See above ([project hooks](#project-hooks)) for examples.

Supported events:

* `on_load()`: When loaded
* `on_save()`: When saved
* `on_remove()`: When removed
* `on_inactivate()`: When inactivated
* `on_activate()`: When activated
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_preprocessing_ready()`: When preprocessing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `clear_run_counts()`
* `run_cmd(cmd)`
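The sketch below shows one plausible reading of `clear_run_counts()`: it resets the `_try_`step`` counters listed under Dataset System Metadata so that failed steps can be retried, here triggered from an `on_activate` hook. This behavior is an assumption, not MiGA's documented implementation. The metadata is a plain dictionary with example values.

```python
def clear_run_counts(metadata: dict) -> dict:
    """Drop every _try_<step> counter so failed steps can be retried."""
    for key in [k for k in metadata if k.startswith("_try_")]:
        del metadata[key]
    return metadata


hooks = {"on_activate": [["clear_run_counts"]]}
meta = {"inactive": False, "_try_assembly": 10, "_try_cds": 3, "_step": "cds"}
for action, *args in hooks["on_activate"]:
    if action == "clear_run_counts":
        clear_run_counts(meta)
print(meta)  # {'inactive': False, '_step': 'cds'}
```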
