# Metadata

## All objects

All metadata objects support the following fields:

|     Field |            Supported values           | Description         |
| --------: | :-----------------------------------: | ------------------- |
| created\* | [Date](/part1/glossary.md#miga-dates) | Date of creation    |
| updated\* | [Date](/part1/glossary.md#miga-dates) | Date of last update |

> **\*** Mandatory

## Projects

The following metadata fields are recognized by different interfaces for **Projects**:

### Project Features

Metadata with additional information and features about the project:

|       Field |            Supported values           | Description           |
| ----------: | :-----------------------------------: | --------------------- |
|    comments |                 String                | Free-form comments    |
| description |                 String                | Free-form description |
|      name\* | [Name](/part1/glossary.md#miga-names) | Name‡                 |

> **\*** Mandatory
>
> **‡** By default, the base name of the project path

### Project System Metadata

Metadata entries automatically set by MiGA:

|      Field | Supported values | Description                           |
| ---------: | :--------------: | ------------------------------------- |
| datasets\* |  Array of String | List of datasets in the project       |
|     type\* |      String      | [Type](/part2/types.md#project-types) |

> **\*** Mandatory

### Project Flags

Metadata entries that trigger specific behaviors in MiGA:

|                 Field | Supported values | Description                                    |
| --------------------: | :--------------: | ---------------------------------------------- |
|          ref\_project |       Path       | Project with reference taxonomy {1}            |
|         db\_proj\_dir |       Path       | Directory containing database projects {1} {2} |
|           tax\_pvalue |   Float \[0,1]   | Max p-value to transfer taxonomy (def: 0.1)    |
|               haai\_p |      String      | hAAI engine {3} (def: fastaai)                 |
|                aai\_p |      String      | AAI engine {3} (def: diamond)                  |
|                ani\_p |      String      | ANI engine {3} (def: fastani)                  |
|              max\_try |      Integer     | Max number of task attempts (def: 10)          |
|        aai\_save\_rbm |      Boolean     | Should RBMs be saved for OGS analysis?         |
|         ogs\_identity |  Float \[0,100]  | Min RBM identity for OGS (def: 80)             |
|            clean\_ogs |      Boolean     | If false, keeps ABC (clades only)              |
|           run\_clades |      Boolean     | Should clades be estimated from distances?     |
|              run\_ogs |      Boolean     | Should orthologous groups be estimated?        |
|              gsp\_ani |  Float \[0,100]  | ANI limit to propose gsp clades (def: 95)      |
|              gsp\_aai |  Float \[0,100]  | AAI limit to propose gsp clades (def: 90)      |
|           gsp\_metric |      String      | Metric to propose clades: `ani` (def), `aai`   |
|             ess\_coll |      String      | Collection of essential genes to use {4}       |
|             min\_qual |  Float (or 'no') | Min. genome quality (or no filter; def: 25)    |
| distances\_checkpoint |      Integer     | Comparisons before storing data (def: 10)      |

> **{1}** This path can be either absolute or relative to the project's path
>
> **{2}** This is the location of the databases used by [db\_project](#dataset-flags). If not set, it is assumed to be the parent folder of the current project
>
> **{3}** Supported values: `blast`, `blat`, `diamond` (only for hAAI and AAI), `fastani` (only for ANI), `no` (only for hAAI and AAI), and `fastaai` (only for hAAI)
>
> **{4}** One of: `dupont_2012` (default), or `lee_2019`
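
The engine restrictions in note {3} can be checked before a project is processed. The following is a minimal sketch in Python, not MiGA's own validation code: the function name `check_engines` and the plain-dictionary representation of project metadata are assumptions made for illustration, while the allowed values and defaults are transcribed from the table and note above.

```python
# Allowed engines per project flag, transcribed from note {3}.
ALLOWED_ENGINES = {
    "haai_p": {"blast", "blat", "diamond", "no", "fastaai"},
    "aai_p": {"blast", "blat", "diamond", "no"},
    "ani_p": {"blast", "blat", "fastani"},
}

# Defaults as listed in the Project Flags table.
DEFAULT_ENGINES = {"haai_p": "fastaai", "aai_p": "diamond", "ani_p": "fastani"}


def check_engines(metadata):
    """Return one error message per flag holding an unsupported engine.

    `metadata` is a plain dict standing in for project metadata
    (hypothetical representation, not a MiGA object).
    """
    errors = []
    for flag, allowed in ALLOWED_ENGINES.items():
        # Fall back to the documented default when the flag is not set.
        value = metadata.get(flag, DEFAULT_ENGINES[flag])
        if value not in allowed:
            errors.append(f"{flag}: unsupported engine {value!r}")
    return errors
```

For instance, `check_engines({"ani_p": "diamond"})` reports one error, because `diamond` is only listed for hAAI and AAI.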

### Project Hooks

Hooks can also be defined for projects. Each event maps to an array of hooks, and each hook is itself an array containing the action name followed by its arguments (if any). For example, one can define:

```
on_processing_ready: [
  ['run_cmd', 'date > {{project}}/ALL_DONE.txt'],
  ['run_cmd', 'sendmail ...']
]
```

or

```
on_add_dataset: [
  ['run_cmd', 'echo {{object}} > {{project}}/LATEST_DATASET.txt']
]
```

Supported events:

* `on_create()`: When created
* `on_load()`: When loaded
* `on_save()`: When saved
* `on_add_dataset(object)`: When a dataset is added, with name `object`
* `on_unlink_dataset(object)`: When dataset with name `object` is unlinked
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_processing_ready()`: When processing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `run_cmd(cmd)`
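
To illustrate how the `{{project}}` and `{{object}}` placeholders in the examples above relate to an event, here is a minimal Python sketch. It is not MiGA's implementation: it assumes that placeholders are filled by plain text substitution, and the helper names `expand_placeholders` and `commands_for_event`, the example path, and the dataset name are all hypothetical. The sketch only builds the command strings and does not execute them.

```python
import re


def expand_placeholders(template, context):
    """Replace each {{key}} with context[key]; unknown keys are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(context.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


def commands_for_event(hooks, event, context):
    """Return the expanded commands of all run_cmd hooks bound to `event`."""
    commands = []
    for action, *args in hooks.get(event, []):
        # Only run_cmd is handled here; run_lambda is outside this sketch.
        if action == "run_cmd" and args:
            commands.append(expand_placeholders(args[0], context))
    return commands


hooks = {
    "on_add_dataset": [
        ["run_cmd", "echo {{object}} > {{project}}/LATEST_DATASET.txt"],
    ],
}
# Hypothetical project path and dataset name.
context = {"project": "/path/to/project", "object": "dataset_1"}
print(commands_for_event(hooks, "on_add_dataset", context))
# prints ['echo dataset_1 > /path/to/project/LATEST_DATASET.txt']
```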

## Datasets

The following metadata fields are recognized by different interfaces for **Datasets**:

### Dataset Features

Metadata with additional information and features about the dataset:

|             Field | Supported values | Description                              |
| ----------------: | :--------------: | ---------------------------------------- |
|               tax |  MiGA::Taxonomy  | Taxonomy of the dataset                  |
|           quality |      String      | Description of genome quality            |
|       trna\_count |      Integer     | Number of tRNA elements detected         |
|          trna\_aa |      Integer     | Number of distinct AA with tRNA elements |
|       dprotologue |      String      | Taxonumber in the Digital Protologue DB  |
|     ncbi\_tax\_id |      String      | Linking ID(s) {1} for NCBI Taxonomy      |
|     ncbi\_nuccore |      String      | Linking ID(s) {1} for NCBI Nucleotide    |
|         ncbi\_asm |      String      | Linking ID(s) {1} for NCBI Assembly      |
|         ebi\_embl |      String      | Linking ID(s) {1} for EBI EMBL           |
|          ebi\_ena |      String      | Linking ID(s) {1} for EBI ENA            |
|     web\_assembly |      String      | URL to download assembly                 |
| web\_assembly\_gz |      String      | URL to download gzipped assembly         |
|         see\_also |      String      | Link(s) {1} in the format text:url       |
|          is\_type |      Boolean     | If it is type material                   |
|     is\_ref\_type |      Boolean     | If it is reference material {2}          |
|         type\_rel |      String      | Relationship to type material            |
|           suspect |   Array(String)  | Flags indicating a suspect dataset       |

> **{1}** Multiple values can be provided separated by commas or colons
>
> **{2}** This is not a valid type, but it represents the closest available dataset to material that is unavailable and unlikely to ever become available. See also [Federhen, 2015, NAR](https://doi.org/10.1093/nar/gku1127)
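
Note {1} allows several linking IDs in a single string. The sketch below shows one way a consumer of this metadata might split such a value in Python. The function name `split_linking_ids` and the placeholder IDs are hypothetical, and the sketch is meant for ID fields such as `ncbi_asm` only: `see_also` entries contain URLs with their own colons, so they would need separate handling that is not covered here.

```python
import re


def split_linking_ids(value):
    """Split a linking-ID string on commas or colons and drop empty parts."""
    return [part.strip() for part in re.split(r"[,:]", value) if part.strip()]


# Placeholder IDs, not real accessions.
print(split_linking_ids("ID_A, ID_B:ID_C"))
# prints ['ID_A', 'ID_B', 'ID_C']
```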

### Dataset System Metadata

Metadata entries automatically set by MiGA:

|          Field | Supported values | Description                                  |
| -------------: | :--------------: | -------------------------------------------- |
|         type\* |      String      | [Type](/part2/types.md#dataset-types)        |
|            ref |      Boolean     | [Reference](/part2/types.md#reference)       |
|       inactive |      Boolean     | If auto-processing should stop               |
| metadata\_only |      Boolean     | Dataset with metadata but without input data |
|         status |      String      | Proc. status: complete, incomplete, inactive |
|         \_step |      String      | For internal control of processing           |
|  \_try\_`step` |      Integer     | For internal control of processing           |
|       ~~user~~ |      String      | Deprecated                                   |

> **\*** Mandatory

### Dataset Flags

Metadata entries that trigger specific behaviors in MiGA:

|                  Field | Supported values | Description                              |
| ---------------------: | :--------------: | ---------------------------------------- |
|            run\_`step` |      Boolean     | Forces `step` to run or be skipped       |
|            db\_project |       Path       | Project to use as database {1}           |
|              dist\_req |  Array of String | Run distances against these datasets {2} |
| keep\_assembly\_graphs |      Boolean     | Do not clean assembly graphs {3}         |

> **{1}** By default, the dataset's own project is used as the database. The path can be absolute or relative to the parent folder of the project
>
> **{2}** When searching for the best-matching datasets, include these datasets even if the medoid tree search does not visit them
>
> **{3}** By default: false, meaning that assembly graphs are removed

Any of these dataset flags can also be set as project metadata, which applies to all datasets in the project. If both a dataset and a project metadata flag are set, dataset flags take precedence. If neither is set, the default values are used.
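
The precedence rule above (dataset, then project, then default) can be sketched in a few lines of Python. This is an illustration rather than MiGA's own lookup code: the function name `resolve_flag` and the plain dictionaries standing in for metadata objects are assumptions, and a value of `None` is treated as "not set". The only default included is the one stated in note {3}.

```python
# Default taken from note {3} of the Dataset Flags table. Other flags have no
# default documented in that table, so they resolve to None in this sketch.
DEFAULTS = {"keep_assembly_graphs": False}


def resolve_flag(name, dataset_md, project_md, defaults=DEFAULTS):
    """Resolve a flag: dataset metadata first, then project, then the default."""
    if dataset_md.get(name) is not None:
        return dataset_md[name]
    if project_md.get(name) is not None:
        return project_md[name]
    return defaults.get(name)


project_md = {"keep_assembly_graphs": True}
print(resolve_flag("keep_assembly_graphs", {}, project_md))
# prints True (project-level value applies)
print(resolve_flag("keep_assembly_graphs", {"keep_assembly_graphs": False}, project_md))
# prints False (dataset-level value takes precedence)
```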

### Dataset Hooks

Additionally, hooks can be defined for datasets as arrays of arrays containing the action name and the arguments. See above ([project hooks](#project-hooks)) for examples.

Supported events:

* `on_load()`: When loaded
* `on_save()`: When saved
* `on_remove()`: When removed
* `on_inactivate()`: When inactivated
* `on_activate()`: When activated
* `on_result_ready(object)`: When any result is ready, with key `object`
* `on_result_ready_{result}()`: When `result` is ready
* `on_preprocessing_ready()`: When preprocessing is complete

Supported hooks:

* `run_lambda(lambda, args...)`
* `clear_run_counts()`
* `run_cmd(cmd)`

