`dataio.scripts.sync_dataset_documentation`

Sync dataset documentation (README.md and metadata.json) from the S3 file server to the database.

This script fetches README.md and metadata.json files from the S3 filestore and caches their contents in the datasets table for faster access.

Usage:

# Sync all datasets
uv run python -m dataio.scripts.sync_dataset_documentation

# Sync specific dataset
uv run python -m dataio.scripts.sync_dataset_documentation --dataset DS_EXAMPLE01

# Dry run (show what would be synced)
uv run python -m dataio.scripts.sync_dataset_documentation --dry-run

Module Contents

Functions

`get_database_url`	Build database URL from environment variables.
`get_s3_client`	Initialize S3 client.
`fetch_file_from_s3`	Fetch a file from S3 for a dataset.
`sync_dataset_documentation`	Sync documentation for a single dataset.
`main`

Data

logger

API

dataio.scripts.sync_dataset_documentation.logger: ‘getLogger(…)’

dataio.scripts.sync_dataset_documentation.get_database_url() → str

Build the database URL from environment variables.
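The reference does not list which environment variables are read, so the following is only a minimal sketch of the general pattern. The function name `build_database_url`, the variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`), the defaults, and the `postgresql://` scheme are all assumptions made for illustration, not the module's actual configuration.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=None) -> str:
    """Hypothetical stand-in for get_database_url(); all variable names are assumed."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters such as "@" or "/" cannot break the URL.
    user = quote_plus(env.get("DB_USER", "postgres"))
    password = quote_plus(env.get("DB_PASSWORD", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "dataio")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Passing a plain dict instead of os.environ keeps the example self-contained.
example_env = {
    "DB_USER": "sync",
    "DB_PASSWORD": "p@ss/word",
    "DB_HOST": "db.internal",
    "DB_NAME": "catalog",
}
example_url = build_database_url(example_env)
print(example_url)
```

Accepting an optional mapping makes the helper easy to exercise in tests without touching the real process environment.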

dataio.scripts.sync_dataset_documentation.get_s3_client()

Initialize the S3 client.

dataio.scripts.sync_dataset_documentation.fetch_file_from_s3(bucket, dataset_id: str, filename: str) → Optional[str]

Fetch a file from S3 for a dataset.

Looks in both the STANDARDISED and PREPROCESSED versions of the dataset. Returns the file content as a string, or None if the file is not found in either version.
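The lookup described above can be illustrated with an in-memory mapping standing in for the S3 bucket. This is a sketch under stated assumptions: the helper name `fetch_file`, the key layout `<dataset_id>/<version>/<filename>`, the UTF-8 decoding, and the order in which the two versions are tried are hypothetical; only the version names come from the docstring.

```python
from typing import Mapping, Optional

# Version names are taken from the docstring; the search order is an assumption.
VERSIONS = ("STANDARDISED", "PREPROCESSED")


def fetch_file(store: Mapping[str, bytes], dataset_id: str, filename: str) -> Optional[str]:
    """Return the first matching file as text, or None if no version contains it."""
    for version in VERSIONS:
        key = f"{dataset_id}/{version}/{filename}"  # hypothetical key layout
        data = store.get(key)
        if data is not None:
            return data.decode("utf-8")
    return None


# A dict plays the role of the bucket so the example runs without network access.
fake_store = {"DS_EXAMPLE01/PREPROCESSED/README.md": b"# Example dataset\n"}
readme = fetch_file(fake_store, "DS_EXAMPLE01", "README.md")
missing = fetch_file(fake_store, "DS_EXAMPLE01", "metadata.json")
print(repr(readme), repr(missing))
```

Returning None rather than raising lets the caller treat a missing README.md or metadata.json as a normal outcome.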

dataio.scripts.sync_dataset_documentation.sync_dataset_documentation(db_session, bucket, dataset_id: str, dry_run: bool = False) → dict

Sync documentation for a single dataset.

Returns a dict with the sync results.
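The keys of the result dict are not documented here, so the sketch below only illustrates how a dry run might differ from a real sync. Everything in it is hypothetical: the function `sync_one`, the result keys, and the plain dict used in place of the datasets table and database session.

```python
import json
from typing import Callable, Optional


def sync_one(cache: dict, fetch: Callable[[str, str], Optional[str]],
             dataset_id: str, dry_run: bool = False) -> dict:
    """Hypothetical sync step: fetch both files, then cache them unless dry_run is set."""
    readme = fetch(dataset_id, "README.md")
    metadata_text = fetch(dataset_id, "metadata.json")
    result = {
        "dataset_id": dataset_id,
        "readme_found": readme is not None,
        "metadata_found": metadata_text is not None,
        "updated": False,
    }
    # A dry run reports what was found but never writes to the cache.
    if dry_run or (readme is None and metadata_text is None):
        return result
    cache[dataset_id] = {
        "readme": readme,
        "metadata": json.loads(metadata_text) if metadata_text is not None else None,
    }
    result["updated"] = True
    return result


files = {
    ("DS_EXAMPLE01", "README.md"): "# Example",
    ("DS_EXAMPLE01", "metadata.json"): '{"title": "Example"}',
}


def fake_fetch(dataset_id: str, filename: str) -> Optional[str]:
    # Stand-in for the S3 lookup; returns None for unknown files.
    return files.get((dataset_id, filename))


cache: dict = {}
dry = sync_one(cache, fake_fetch, "DS_EXAMPLE01", dry_run=True)
cache_after_dry = dict(cache)  # snapshot to show the dry run wrote nothing
real = sync_one(cache, fake_fetch, "DS_EXAMPLE01")
print(dry["updated"], real["updated"])
```

Keeping the fetch step as an injected callable means the dry-run logic can be checked without S3 or a database.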

dataio.scripts.sync_dataset_documentation.main()
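`main` has no docstring, but the usage examples at the top of this page show two command-line flags, `--dataset` and `--dry-run`. A minimal sketch of how such flags could be parsed is shown below; the use of `argparse`, the helper name `build_parser`, and the help texts are assumptions, not the script's confirmed implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="sync_dataset_documentation",
        description="Sync README.md and metadata.json from S3 to the database.",
    )
    parser.add_argument(
        "--dataset",
        default=None,
        help="Sync only this dataset ID; omit to sync all datasets.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be synced without writing to the database.",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--dataset", "DS_EXAMPLE01", "--dry-run"])
defaults = build_parser().parse_args([])
print(args.dataset, args.dry_run)
```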
