# HDF5 data format

This chapter describes the HDF5 file format that Trident uses to store data resources. Files in this format are ordinarily read and written by the `Hdf5DataStore` component.

## Layout

The root level of the file contains one group per data resource type. Inside each of these groups is an arbitrary hierarchy of groups, in which each innermost group represents one data resource. The datasets and attributes found in an innermost group depend on the type of data resource. The following table shows how the top-level group name maps to the data resource type.

| Group name    | DataResourceType |
| ------------- | ---------------- |
| `ts`          | `TIME_SERIES`    |
| `xy`          | `XY_CURVE`       |
| `xyts`        | `XY_TIME_SERIES` |
| `ndarrayts`   | `ND_ARRAY_TS`    |
| `string_list` | `STRING_LIST`    |

Each data resource must have a path that corresponds to a known resource path in Trident. Those paths can be found under API Reference > Python API > Modules.
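The mapping in the table above can be sketched as a small lookup that classifies an HDF5 group path. This is a minimal sketch, not part of Trident: the function name `classify` is hypothetical, and it assumes (without confirmation from this chapter) that the group names below the top-level group form the resource path. It returns those components as a tuple rather than guessing at Trident's path separator.

```python
# Top-level group name -> DataResourceType, as listed in the table above.
GROUP_TO_TYPE = {
    "ts": "TIME_SERIES",
    "xy": "XY_CURVE",
    "xyts": "XY_TIME_SERIES",
    "ndarrayts": "ND_ARRAY_TS",
    "string_list": "STRING_LIST",
}


def classify(h5_path: str) -> tuple[str, tuple[str, ...]]:
    """Split an HDF5 group path into (DataResourceType, resource path components).

    Hypothetical helper: assumes everything below the top-level group
    is the resource path.
    """
    parts = [p for p in h5_path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected /<type group>/<resource path>, got {h5_path!r}")
    top, rest = parts[0], parts[1:]
    try:
        kind = GROUP_TO_TYPE[top]
    except KeyError:
        raise ValueError(f"unknown top-level group {top!r}") from None
    return kind, tuple(rest)
```

For example, `classify("/ts/a/b")` returns `("TIME_SERIES", ("a", "b"))`.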

## Data resource types

### `TIME_SERIES` data resource

This represents a `TimeSeries` object.

- `data` dataset

  A table (a dataset with a compound data type) with two members:

  - `t: i64` - Microseconds since the UNIX epoch
  - `v: f64` - Value

- `interpolation` attribute

  This is an integer value which represents the time series interpolation type. Possible values are:

  - `INSTANT = 0` - Values continue until the next point (stepwise curve)
  - `LINEAR = 1` - Values between points are linearly interpolated
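The timestamp encoding and the two interpolation types can be illustrated with a short evaluator. This is a sketch under stated assumptions, not Trident's implementation: `value_at` and `to_datetime` are hypothetical names, `data` is assumed to be a list of `(t, v)` tuples sorted by `t` (for example, the rows of the `data` dataset converted with `.tolist()`), and holding the last value after the final point is an assumption of this sketch.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone
from enum import IntEnum

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class Interpolation(IntEnum):
    INSTANT = 0  # stepwise: each value holds until the next point
    LINEAR = 1   # linear between neighbouring points


def to_datetime(t_us: int) -> datetime:
    """Convert microseconds since the UNIX epoch to a UTC datetime (no float rounding)."""
    return EPOCH + timedelta(microseconds=t_us)


def value_at(data, interpolation, t_us):
    """Evaluate a TIME_SERIES at t_us.

    Returns None before the first point. After the last point the last
    value is returned (an assumption of this sketch). Unknown interpolation
    codes raise ValueError via the Interpolation enum.
    """
    times = [t for t, _ in data]
    i = bisect_right(times, t_us) - 1  # index of last point with t <= t_us
    if i < 0:
        return None
    t0, v0 = data[i]
    if Interpolation(interpolation) is Interpolation.INSTANT or i == len(data) - 1:
        return v0
    t1, v1 = data[i + 1]  # t1 > t_us >= t0, so t1 != t0
    return v0 + (v1 - v0) * (t_us - t0) / (t1 - t0)
```

With `data = [(0, 1.0), (10, 3.0)]`, evaluating at `t_us = 5` gives `1.0` for `INSTANT` and `2.0` for `LINEAR`.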

### `XY_CURVE` data resource

This represents an `XYCurve` object.

- `x` and `y` datasets

  These 1-dimensional datasets hold the `x` and `y` coordinates of the curve and have the same length.

- `interpolation` attribute

  This is an integer value which represents the curve interpolation type. Possible values are:

  - `INSTANT = 0` - Values continue until the next point (stepwise curve)
  - `LINEAR = 1` - Values between points are linearly interpolated

### `XY_TIME_SERIES` data resource

This represents an `XYTimeSeries` object.

- `t` dataset

  A 1-dimensional dataset of integers representing microseconds since the UNIX epoch.

- `s` dataset

  A 1-dimensional dataset with integers representing the number of points per curve. When converting the data to a series of curves, these values must be used to extract the `x` and `y` values.

- `i` dataset

  A 1-dimensional dataset of integers representing the interpolation type of each curve. Possible values are:

  - `INSTANT = 0` - Values continue until the next point (stepwise curve)
  - `LINEAR = 1` - Values between points are linearly interpolated

- `x` and `y` datasets

  These 1-dimensional datasets hold the `x` and `y` coordinates of all the XY curves, concatenated into a single dataset each. The number of points in each curve is given by the `s` dataset, which is needed to split the values back into individual curves.
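Splitting the concatenated `x` and `y` datasets with the counts in `s` can be sketched as follows. This is a minimal illustration, not Trident's reader: `split_xy_time_series` is a hypothetical name, the inputs are assumed to be plain Python sequences already read from the datasets, and it assumes that `t`, `s` and `i` each hold exactly one entry per curve.

```python
from itertools import accumulate


def split_xy_time_series(t, s, i, x, y):
    """Rebuild per-curve (timestamp, interpolation, xs, ys) tuples.

    Assumes t, s and i have one entry per curve, and that x and y each
    hold sum(s) values: the points of all curves, one curve after another.
    """
    if not (len(t) == len(s) == len(i)):
        raise ValueError("t, s and i must have one entry per curve")
    if len(x) != len(y) or sum(s) != len(x):
        raise ValueError("x and y must each hold sum(s) values")
    ends = list(accumulate(s))      # exclusive end offset of each curve
    starts = [0] + ends[:-1]        # start offset of each curve
    return [
        (ts, interp, list(x[a:b]), list(y[a:b]))
        for ts, interp, a, b in zip(t, i, starts, ends)
    ]
```

For example, `s = [2, 1]` means the first curve uses `x[0:2]`/`y[0:2]` and the second uses `x[2:3]`/`y[2:3]`.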

### `ND_ARRAY_TS` data resource

This represents an `NDArrayTS` object; that is, a time series of N-dimensional arrays.

- `t` dataset

  A 1-dimensional dataset of integers representing microseconds since the UNIX epoch.

- `s` dataset

  A 1-dimensional strided array of size information. For each time series element, the following values are included:

  - `ndims`: The number of dimensions (1 value)
  - `dims`: The dimension sizes (`ndims` values)

- `v` dataset

  A 1-dimensional strided dataset containing the values of all arrays, packed one after another. The values belonging to each array are located by interpreting the `s` dataset: an array with dimension sizes `dims` occupies the product of `dims` consecutive values.
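Walking the `s` and `v` datasets in step can be sketched like this. It is an illustration, not Trident's reader: `split_nd_array_ts` is a hypothetical name, the inputs are assumed to be plain Python sequences, and the flat values are returned as-is because this chapter does not state the memory order (row-major or column-major) used within each array.

```python
from math import prod


def split_nd_array_ts(t, s, v):
    """Return (timestamp, shape, flat_values) for each element of an ND_ARRAY_TS."""
    out = []
    pos_s = 0  # read position in the size dataset `s`
    pos_v = 0  # read position in the value dataset `v`
    for ts in t:
        ndims = int(s[pos_s])
        shape = tuple(int(d) for d in s[pos_s + 1 : pos_s + 1 + ndims])
        if len(shape) != ndims:
            raise ValueError("s ends in the middle of a dims record")
        pos_s += 1 + ndims
        count = prod(shape)  # number of values this array occupies in `v`
        if pos_v + count > len(v):
            raise ValueError("v is shorter than the sizes in s require")
        out.append((ts, shape, list(v[pos_v : pos_v + count])))
        pos_v += count
    if pos_s != len(s) or pos_v != len(v):
        raise ValueError("s and v contain data beyond the last timestamp")
    return out
```

For `s = [2, 2, 3, 1, 2]`, the first element is a 2×3 array using six values and the second a 1-dimensional array of two values. If the values are row-major (an assumption), NumPy's `reshape(shape)` restores each array.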

### `STRING_LIST` data resource

This represents a `std::vector<std::string>` object in C++ and `list[str]` object in Python.

- `v` dataset

  A 1-dimensional dataset of C-string values.
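Readers of the `v` dataset may get raw bytes rather than text (h5py, for example, can return `bytes` for string datasets). A minimal conversion to `list[str]` is sketched below. `decode_string_list` is a hypothetical name, and both the UTF-8 encoding and the stripping of NUL padding from fixed-length C strings are assumptions of this sketch.

```python
def decode_string_list(raw):
    """Convert raw `v` values into a list of str.

    bytes items are cut at the first NUL (C-string terminator / padding)
    and decoded as UTF-8 (assumed encoding); str items pass through.
    """
    result = []
    for item in raw:
        if isinstance(item, bytes):
            item = item.split(b"\0", 1)[0].decode("utf-8")
        result.append(item)
    return result
```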