# Metadata

## All objects

All metadata objects support the following fields:

|     Field |                                Supported values                               | Description         |
| --------: | :---------------------------------------------------------------------------: | ------------------- |
| created\* | [Date](https://manual.microbial-genomes.org/master/part1/glossary#miga-dates) | Date of creation    |
| updated\* | [Date](https://manual.microbial-genomes.org/master/part1/glossary#miga-dates) | Date of last update |

> **\*** Mandatory

## Projects

The following metadata fields are recognized by different interfaces for **Projects**:

### Project Features

Metadata with additional information and features about the project:

|       Field |                                Supported values                               | Description           |
| ----------: | :---------------------------------------------------------------------------: | --------------------- |
|    comments |                                     String                                    | Free-form comments    |
| description |                                     String                                    | Free-form description |
|      name\* | [Name](https://manual.microbial-genomes.org/master/part1/glossary#miga-names) | Name‡                 |

> **\*** Mandatory
>
> **‡** By default, the base name of the project path

### Project System Metadata

Metadata entries automatically set by MiGA:

|      Field | Supported values | Description                                                                   |
| ---------: | :--------------: | ----------------------------------------------------------------------------- |
| datasets\* |  Array of String | List of datasets in the project                                               |
|     type\* |      String      | [Type](https://manual.microbial-genomes.org/master/part2/types#project-types) |

> **\*** Mandatory

### Project Flags

Metadata entries that trigger specific behaviors in MiGA:

|          Field | Supported values | Description                                  |
| -------------: | :--------------: | -------------------------------------------- |
|   ref\_project |       Path       | Project with reference taxonomy              |
|  db\_proj\_dir |       Path       | Directory containing database projects       |
|    tax\_pvalue |   Float \[0,1]   | Max p-value to transfer taxonomy (def: 0.05) |
|         aai\_p |      String      | Value of aai.rb -p° on AAI (def: blast+)     |
|        haai\_p |      String      | Value of aai.rb -p° on hAAI (def: blast+)    |
|         ani\_p |      String      | Value of ani.rb -p° on ANI (def: blast+)     |
|       max\_try |      Integer     | Max number of task attempts (def: 10)        |
| aai\_save\_rbm |      Boolean     | Should RBMs be saved for OGS analysis?       |
|  ogs\_identity |  Float \[0,100]  | Min RBM identity for OGS (def: 80)           |
|     clean\_ogs |      Boolean     | If false, keeps ABC (clades only)            |
|    run\_clades |      Boolean     | Should clades be estimated from distances?   |
|       gsp\_ani |  Float \[0,100]  | ANI limit to propose gsp clades (def: 95)    |
|       gsp\_aai |  Float \[0,100]  | AAI limit to propose gsp clades (def: 90)    |
|    gsp\_metric |      String      | Metric to propose clades: `ani` (def), `aai` |
|      ess\_coll |      String      | Collection of essential genes to use+        |
|      min\_qual |  Float (or 'no') | Min. genome quality (or no filter; def: 25)  |

> **°** By default: `blast+`. Other supported values: `blast`, `blat`, `diamond` (except for ANI), `fastani` (only for ANI), and `no` (only for hAAI). If using `diamond` and/or `fastani`, the corresponding software must be installed. **Important**: These defaults will change in v1.0 to `blast+` for hAAI, `diamond` for AAI, and `fastani` for ANI.
>
> **+** One of: `dupont_2012` (default), or `lee_2019`
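
The constraints in the table and footnotes above can be checked before the flags are stored. The following is a minimal sketch in plain Ruby, assuming the flags are held in an ordinary Hash with symbol keys; `FLAG_RULES` and `invalid_flags` are hypothetical names used only for illustration and are not part of MiGA:

```ruby
# Hypothetical validator (not part of MiGA): checks a plain Hash of project
# flags against the ranges and allowed values documented above.
PERCENT = ->(v) { v.is_a?(Numeric) && v.between?(0, 100) }

FLAG_RULES = {
  tax_pvalue:   ->(v) { v.is_a?(Numeric) && v.between?(0, 1) },
  ogs_identity: PERCENT,
  gsp_ani:      PERCENT,
  gsp_aai:      PERCENT,
  max_try:      ->(v) { v.is_a?(Integer) },
  gsp_metric:   ->(v) { %w[ani aai].include?(v) },
  ess_coll:     ->(v) { %w[dupont_2012 lee_2019].include?(v) },
  min_qual:     ->(v) { v == 'no' || v.is_a?(Numeric) },
  # Search tools: diamond is not valid for ANI, fastani is ANI-only,
  # and 'no' is hAAI-only.
  aai_p:        ->(v) { %w[blast+ blast blat diamond].include?(v) },
  haai_p:       ->(v) { %w[blast+ blast blat diamond no].include?(v) },
  ani_p:        ->(v) { %w[blast+ blast blat fastani].include?(v) }
}.freeze

# Returns the names of the flags whose values violate the documented rules.
# Flags without a rule (e.g. paths and booleans) are not checked here.
def invalid_flags(flags)
  flags.select { |k, v| FLAG_RULES.key?(k) && !FLAG_RULES[k].call(v) }.keys
end

flags = { tax_pvalue: 0.01, ani_p: 'diamond', gsp_metric: 'ani', min_qual: 'no' }
p invalid_flags(flags) # => [:ani_p]
```

Here `ani_p: 'diamond'` is rejected because `diamond` is supported for AAI and hAAI but not for ANI.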

### Project Hooks

Additionally, hooks can be defined for projects as arrays of arrays containing the action name and the arguments (if any). For example, one can define:

```
on_processing_ready: [
  ['run_cmd', 'date > {{project}}/ALL_DONE.txt'],
  ['run_cmd', 'sendmail ...']
]
```

or

```
on_add_dataset: [
  ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
]
```

Supported events:

* `on_create()`: When created
* `on_load()`: When loaded
* `on_save()`: When saved
* `on_add_dataset(object)`: When a dataset is added, with name `object`
* `on_unlink_dataset(object)`: When dataset with name `object` is unlinked
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_processing_ready()`: When processing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `run_cmd(cmd)`
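
To illustrate the array-of-arrays layout, the sketch below expands the `run_cmd` entries registered for an event into the command strings they would produce. It is not MiGA's implementation: `HOOKS`, `expand`, and `commands_for` are hypothetical names, the `{{name}}` placeholder syntax is inferred from the examples above, and the commands are only built, never executed:

```ruby
# Hypothetical hook table mirroring the on_add_dataset example above.
HOOKS = {
  on_add_dataset: [
    ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
  ]
}.freeze

# Replace {{name}} placeholders with values from +vars+.
# Placeholders without a matching key are left untouched.
def expand(template, vars)
  template.gsub(/\{\{\w+\}\}/) { |m| vars.fetch(m[2..-3].to_sym, m) }
end

# Build the shell commands that the run_cmd hooks of +event+ would run.
def commands_for(event, vars, hooks = HOOKS)
  hooks.fetch(event, [])
       .select { |action, *_args| action == 'run_cmd' }
       .map { |_action, cmd| expand(cmd, vars) }
end

vars = { project: '/data/my_project', object: 'dataset_1' }
p commands_for(:on_add_dataset, vars)
# => ["echo dataset_1 > /data/my_project/LATEST_DATASET.txt"]
```

Each inner array carries the action name first and its arguments after it, so an event can trigger several actions in order.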

## Datasets

The following metadata fields are recognized by different interfaces for **Datasets**:

### Dataset Features

Metadata with additional information and features about the dataset:

|             Field | Supported values | Description                             |
| ----------------: | :--------------: | --------------------------------------- |
|               tax |  MiGA::Taxonomy  | Taxonomy of the dataset                 |
|           quality |      String      | Description of genome quality           |
|       dprotologue |      String      | Taxonumber in the Digital Protologue DB |
|     ncbi\_tax\_id |      String      | Linking ID(s)‡ for NCBI Taxonomy        |
|     ncbi\_nuccore |      String      | Linking ID(s)‡ for NCBI Nucleotide      |
|         ncbi\_asm |      String      | Linking ID(s)‡ for NCBI Assembly        |
|         ebi\_embl |      String      | Linking ID(s)‡ for EBI EMBL             |
|          ebi\_ena |      String      | Linking ID(s)‡ for EBI ENA              |
|     web\_assembly |      String      | URL to download assembly                |
| web\_assembly\_gz |      String      | URL to download gzipped assembly        |
|         see\_also |      String      | Link(s)‡ in the format text:url         |
|          is\_type |      Boolean     | If it is type material                  |
|     is\_ref\_type |      Boolean     | If it is reference material°            |
|         type\_rel |      String      | Relationship to type material           |
|           suspect |   Array(String)  | Flags indicating a suspect dataset      |

> **‡** Multiple values can be provided separated by commas or colons
>
> **°** This is not a valid type, but it represents the closest available dataset to material that is unavailable and unlikely to ever become available. See also [Federhen, 2015, NAR](https://doi.org/10.1093/nar/gku1127)
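
As an illustration of the multi-value convention (‡), a linking-ID field can be split on commas or colons. This is a minimal sketch in plain Ruby; `linking_ids` is a hypothetical helper and the IDs are placeholders:

```ruby
# Split a linking-ID field (e.g. ncbi_asm) on commas or colons,
# trimming surrounding whitespace and dropping empty entries.
def linking_ids(value)
  value.to_s.split(/[,:]/).map(&:strip).reject(&:empty?)
end

p linking_ids('ID_A, ID_B:ID_C') # => ["ID_A", "ID_B", "ID_C"]
p linking_ids(nil)               # => []
```

This naive split is not suitable for `see_also`, whose `text:url` entries contain colons inside each value.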

### Dataset System Metadata

Metadata entries automatically set by MiGA:

|          Field | Supported values | Description                                                                    |
| -------------: | :--------------: | ------------------------------------------------------------------------------ |
|         type\* |      String      | [Type](https://manual.microbial-genomes.org/master/part2/types#dataset-types)  |
|            ref |      Boolean     | [Reference](https://manual.microbial-genomes.org/master/part2/types#reference) |
|       inactive |      Boolean     | If auto-processing should stop                                                 |
| metadata\_only |      Boolean     | Dataset with metadata but without input data                                   |
|         status |      String      | Processing status: complete, incomplete, inactive                              |
|         \_step |      String      | For internal control of processing                                             |
|  \_try\_`step` |      Integer     | For internal control of processing                                             |
|       ~~user~~ |      String      | Deprecated                                                                     |

> **\*** Mandatory

### Dataset Flags

Metadata entries that trigger specific behaviors in MiGA:

|       Field | Supported values | Description                            |
| ----------: | :--------------: | -------------------------------------- |
| run\_`step` |      Boolean     | Forces `step` to run or be skipped     |
| db\_project |       Path       | Project to use as database             |
|   dist\_req |  Array of String | Run distances against these datasets\* |

> **\*** When searching best-matching datasets, include these datasets even if they are not visited using the medoid tree

### Dataset Hooks

Additionally, hooks can be defined for datasets as arrays of arrays containing the action name and the arguments. See above ([project hooks](#project-hooks)) for examples.

Supported events:

* `on_load()`: When loaded
* `on_save()`: When saved
* `on_remove()`: When removed
* `on_inactivate()`: When inactivated
* `on_activate()`: When activated
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_preprocessing_ready()`: When preprocessing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `clear_run_counts()`
* `run_cmd(cmd)`

