Dataset Manifests
DataIO uses a dataset manifest as the canonical document for dataset structure, metadata, and validation.
A manifest is broader than a schema. It can include:
dataset-level metadata
table or file inventory
field definitions and constraints
shared enums
dataset kind and spec hints
validation-relevant context such as temporal and region semantics
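As a rough illustration of the parts listed above, the sketch below models a manifest as a Python dictionary and reports which parts are absent. Every key name in it (`metadata`, `tables`, `fields`, `enums`, `kind`, `context`) and every value is a hypothetical placeholder chosen for this example, not DataIO's actual manifest schema.

```python
# Hypothetical section names, one per item in the list above.
# These are illustrative placeholders, not DataIO's real manifest keys.
EXPECTED_SECTIONS = ("metadata", "tables", "fields", "enums", "kind", "context")

# Made-up manifest content, for illustration only.
example_manifest = {
    "metadata": {"title": "Livestock census"},
    "tables": {"livestock": "livestock.csv"},
    "fields": {
        "livestock": [{"name": "species", "type": "string", "enum": "species"}]
    },
    "enums": {"species": ["cattle", "goat", "sheep"]},
    "kind": "tabular",
    "context": {"temporal": "annual", "region": "district"},
}


def missing_sections(manifest):
    """Return the expected section names that the manifest does not define."""
    return [name for name in EXPECTED_SECTIONS if name not in manifest]


print(missing_sections(example_manifest))
```

The point of the sketch is only that a manifest bundles several kinds of information that a bare schema (the `fields` part here) would not carry on its own.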
Core Vocabulary
Manifest: the canonical YAML document for a dataset
Validation: checking a manifest and data files against the metadata contract and type rules
Filestore: the copy of dataset files and dataset manifests persisted in object storage
DB cache: the platform copy used for fast reads by the APIs and UI
Validation Entry Points
CLI:
dataio validate tabular --manifest manifest.yaml --table livestock=livestock.csv
dataio validate geojson --manifest geojson-manifest.yaml --data districts.geojson
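When validation is driven from a script, the tabular command above can be assembled as an argument list. This is a minimal sketch: the helper name `tabular_validate_argv` is hypothetical, and passing `--table` once per table is an assumption, since only the single-table form is shown here.

```python
import shlex


def tabular_validate_argv(manifest, tables):
    """Build the argv list for `dataio validate tabular`.

    `tables` maps a table name to the path of its data file.
    """
    argv = ["dataio", "validate", "tabular", "--manifest", manifest]
    for name, path in tables.items():
        # Assumption: --table may be repeated, once per table.
        argv += ["--table", f"{name}={path}"]
    return argv


argv = tabular_validate_argv("manifest.yaml", {"livestock": "livestock.csv"})
print(shlex.join(argv))
# To execute it, pass argv to subprocess.run (requires the dataio CLI on PATH).
```

Building a list rather than a single string avoids shell quoting problems when file paths contain spaces.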
Python SDK:
from dataio.sdk.validate import DataIOValidator
validator = DataIOValidator()
result = validator.validate_tabular(
    manifest="manifest.yaml",
    data_files={"livestock": "livestock.csv"},
)
print(result.status)
API:
POST /api/v1/validate
POST /api/v1/validate/tabular
POST /api/v1/validate/geojson
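A client might choose among these endpoints by data kind. The sketch below only constructs an unsent request with the standard library; the host `https://dataio.example` and the helper `validate_request` are hypothetical, and no body is attached because the request payload format is not described here.

```python
from urllib.parse import urljoin
from urllib.request import Request

# The three paths listed above; the None key stands for the generic endpoint.
VALIDATE_PATHS = {
    None: "/api/v1/validate",
    "tabular": "/api/v1/validate/tabular",
    "geojson": "/api/v1/validate/geojson",
}


def validate_request(base_url, kind=None):
    """Return an unsent POST request for the chosen validation endpoint."""
    url = urljoin(base_url, VALIDATE_PATHS[kind])
    # The payload format is not documented here, so no body is set.
    return Request(url, method="POST")


req = validate_request("https://dataio.example", "tabular")  # hypothetical host
print(req.get_method(), req.full_url)
```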