SDK API Guide¶
Complete reference for the DataIOAPI client class.
DataIOAPI Class¶
from dataio import DataIOAPI
The main client class for interacting with the DataIO API.
Constructor¶
DataIOAPI(base_url=None, api_key=None, data_dir=None)¶
Initialize a new DataIO API client.
Parameters:
- base_url (str, optional): The base URL of the DataIO API. If not provided, uses the DATAIO_API_BASE_URL environment variable.
- api_key (str, optional): The API key for authentication. If not provided, uses the DATAIO_API_KEY environment variable.
- data_dir (str, optional): The directory to download data to. If not provided, uses the DATAIO_DATA_DIR environment variable.
Raises:
ValueError: If base_url or api_key is provided neither as a parameter nor as an environment variable.
Example:
# Using environment variables
client = DataIOAPI()
# Passing credentials directly
client = DataIOAPI(
    base_url="https://dataio.artpark.ai/api/v1",
    api_key="your_api_key",
    data_dir="data"
)
Dataset Methods¶
list_datasets(limit=None)¶
Get a list of all datasets available to the authenticated user.
Parameters:
- limit (int, optional): Maximum number of datasets to return. Defaults to 100 if not specified.
Returns:
list: List of dataset dictionaries containing metadata for each dataset.
Example:
# Get all datasets (up to 100)
datasets = client.list_datasets()
# Get first 10 datasets
datasets = client.list_datasets(limit=10)
# Each dataset contains:
# - ds_id: Unique dataset identifier
# - title: Dataset title
# - description: Dataset description
# - tags: List of tag dictionaries with 'id' and 'tag_name'
# - collection: Collection information
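For instance, a minimal sketch of filtering the returned list by tag, using the field names listed above (the "health" tag is hypothetical):
# Keep only datasets carrying a hypothetical "health" tag
health_datasets = [
    ds for ds in datasets
    if any(tag["tag_name"] == "health" for tag in ds.get("tags", []))
]
print(f"Found {len(health_datasets)} matching datasets")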
get_dataset_details(dataset_id)¶
Get detailed metadata for a specific dataset.
Parameters:
- dataset_id (str or int): The dataset ID. Can be the full ds_id or just the numeric part.
Returns:
dict: Complete dataset metadata including title, description, collection, and other fields.
Raises:
ValueError: If the dataset with the specified ID is not found.
Example:
# Using full dataset ID
details = client.get_dataset_details("TS0001DS9999")
# Using just the numeric part (will be zero-padded)
details = client.get_dataset_details("9999")
details = client.get_dataset_details(9999)
list_dataset_tables(dataset_id, bucket_type="STANDARDISED")¶
Get a list of tables within a dataset, including download links.
Parameters:
- dataset_id (str): The dataset ID to get tables for.
- bucket_type (str, optional): Type of bucket. Either "STANDARDISED" or "PREPROCESSED". Defaults to "STANDARDISED".
Note
Currently, only “STANDARDISED” datasets are available. “PREPROCESSED” datasets are not yet accessible through the API.
Returns:
list: List of table dictionaries, each containing:
- table_name: Name of the table
- download_link: Signed URL for downloading (expires in 1 hour)
- metadata: Table-level metadata
Example:
# Get tables for a dataset
tables = client.list_dataset_tables("TS0001DS9999")
# Request preprocessed tables (not yet available; see note above)
tables = client.list_dataset_tables("TS0001DS9999", bucket_type="PREPROCESSED")
for table in tables:
    print(f"Table: {table['table_name']}")
    print(f"Download: {table['download_link']}")
download_dataset(dataset_id, **kwargs)¶
Download a complete dataset with all its tables and metadata.
Parameters:
- dataset_id (str): The dataset ID to download.
- bucket_type (str, optional): Bucket type to download. Defaults to "STANDARDISED".
- root_dir (str, optional): Root directory for downloads. Defaults to "data".
- get_metadata (bool, optional): Whether to download the metadata file. Defaults to True.
- metadata_format (str, optional): Format for metadata ("yaml" or "json"). Defaults to "yaml".
- update_sync_history (bool, optional): Whether to update the sync history. Defaults to True.
- sync_history_file (str, optional): Name of the sync history file. Defaults to "sync-history.yaml".
Returns:
str: Path to the downloaded dataset directory.
Example:
# Basic download
path = client.download_dataset("TS0001DS9999")
# Download to custom directory with JSON metadata
path = client.download_dataset(
    "TS0001DS9999",
    root_dir="my_datasets",
    metadata_format="json"
)
# Download without metadata
path = client.download_dataset(
    "TS0001DS9999",
    get_metadata=False
)
Directory Structure:
root_dir/
├── sync-history.yaml (if update_sync_history=True)
└── TS0001DS9999-Dataset_Title/
    ├── table1.csv
    ├── table2.csv
    ├── table3.csv
    └── metadata.yaml (if get_metadata=True)
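The downloaded tables are plain CSV files, so they can be loaded with any CSV reader; a minimal sketch using pandas (pandas is not required by the client, this is just an illustration):
from pathlib import Path
import pandas as pd

path = client.download_dataset("TS0001DS9999")

# Load every CSV table in the downloaded dataset directory
frames = {csv.stem: pd.read_csv(csv) for csv in Path(path).glob("*.csv")}
for name, df in frames.items():
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")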
construct_dataset_metadata(dataset_details, bucket_type="STANDARDISED")¶
Build comprehensive metadata combining dataset and table-level information.
Parameters:
- dataset_details (dict): Dataset details from get_dataset_details().
- bucket_type (str, optional): Bucket type for table metadata. Defaults to "STANDARDISED".
Returns:
dict: Combined metadata with dataset and table information.
Required fields in dataset_details:
- title: Dataset title
- description: Dataset description
- collection: Collection object with category_name and collection_name
Example:
dataset_details = client.get_dataset_details("TS0001DS9999")
metadata = client.construct_dataset_metadata(dataset_details)
# Metadata structure:
# - dataset_title: Title of the dataset
# - dataset_description: Description
# - category: Category name
# - collection: Collection name
# - dataset_tables: Dict of table metadata keyed by table name
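If you want this combined metadata on disk yourself (download_dataset already writes it for you), a minimal sketch using PyYAML:
import yaml

with open("metadata.yaml", "w") as f:
    yaml.safe_dump(metadata, f, sort_keys=False)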
Region and Shapefile Methods¶
get_children_regions(region_id)¶
Get all direct children regions for a given parent region.
Parameters:
- region_id (str): The ID of the parent region to get children for.
Returns:
list: List of region dictionaries containing metadata for each child region.
Example:
# Get children of a state region
children = client.get_children_regions("state_29")
for child in children:
    print(f"Region ID: {child['region_id']}")
    print(f"Name: {child['region_name']}")
    print(f"Parent: {child['parent_region_id']}")
API Endpoint: GET /api/v1/regions/{region_id}/children
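Since each call returns only direct children, traversing a whole region hierarchy requires recursion; a minimal sketch (the starting region ID is the one from the example above):
def walk_regions(client, region_id, depth=0):
    # Recursively print the subtree rooted at region_id
    for child in client.get_children_regions(region_id):
        print("  " * depth + f"{child['region_id']}: {child['region_name']}")
        walk_regions(client, child["region_id"], depth + 1)

walk_regions(client, "state_29")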
get_shapefile_list()¶
Get a list of all available shapefiles.
Returns:
list: List of shapefile dictionaries containing metadata for each shapefile.
Example:
shapefiles = client.get_shapefile_list()
for shapefile in shapefiles:
    print(f"Region ID: {shapefile['region_id']}")
    print(f"Name: {shapefile['region_name']}")
download_shapefile(region_id, shp_folder="data/GS0012DS0051-Shapefiles_India")¶
Download a shapefile for a specific region.
Parameters:
- region_id (str): ID of the region to download the shapefile for.
- shp_folder (str, optional): Directory to save the shapefile. Defaults to "{data_dir}/GS0012DS0051-Shapefiles_India", where data_dir is taken from the API client.
Returns:
str: Path to the downloaded GeoJSON file.
Raises:
ValueError: If shapefile for the specified region is not found.
Example:
# Download shapefile for a state
path = client.download_shapefile("state_29")
# Download to custom folder
path = client.download_shapefile(
    "state_29",
    shp_folder="my_shapefiles"
)
Note: Shapefiles are downloaded in GeoJSON format, not as traditional .shp files.
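Since the file is GeoJSON, it can be opened with geopandas (geopandas is not a client dependency; this sketch assumes it is installed):
import geopandas as gpd

path = client.download_shapefile("state_29")
gdf = gpd.read_file(path)
print(f"{len(gdf)} features, CRS: {gdf.crs}")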
Weather Data Methods¶
list_weather_datasets()¶
Get a list of all available weather datasets with metadata.
Returns:
list: List of weather dataset dictionaries, each containing:
- dataset_name: Name of the weather dataset (e.g., "era5_sfc")
- variables: List of variable metadata dictionaries
- temporal_coverage_start: Start date of available data
- temporal_coverage_end: End date of available data
- spatial_bounds: Dictionary with min_lat, max_lat, min_lon, max_lon
Example:
# List all weather datasets
datasets = client.list_weather_datasets()
for dataset in datasets:
    print(f"Dataset: {dataset['dataset_name']}")
    print(f"Time range: {dataset['temporal_coverage_start']} to {dataset['temporal_coverage_end']}")
    print(f"Variables: {len(dataset['variables'])}")
    for var in dataset['variables']:
        print(f"  - {var['name']}: {var['long_name']} ({var['units']})")
        print(f"    Resolution: {var['spatial_resolution']} (spatial), {var['temporal_resolution']} (temporal)")
API Endpoint: GET /api/v1/weather/datasets
download_weather_data(dataset_name, variables, start_date, end_date, geojson, output_dir=None)¶
Download weather data with spatial and temporal filtering.
Parameters:
- dataset_name (str): Name of the weather dataset (e.g., "era5_sfc").
- variables (List[str]): List of variables to extract (e.g., ["t2m", "d2m", "tp"]).
- start_date (str): Start date in ISO format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS).
- end_date (str): End date in ISO format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS).
- geojson (Union[str, Dict]): GeoJSON for spatial filtering. Can be:
  - a Python dict with GeoJSON structure
  - a path to a .geojson file (loaded automatically)
  - a region ID string (the shapefile is fetched from the API automatically)
- output_dir (str, optional): Directory to save the NetCDF file. Defaults to {data_dir}/weather/{dataset_name}.
Returns:
- xarray.Dataset: The weather data as an xarray Dataset object (if xarray is installed).
- str: Path to the saved NetCDF file (if xarray is not installed).
Example:
# Download using a region ID (fetches shapefile automatically)
ds = client.download_weather_data(
    dataset_name="era5_sfc",
    variables=["t2m", "d2m"],
    start_date="2024-01-01",
    end_date="2024-01-31",
    geojson="state_29"  # Karnataka region ID
)
# Download using a geojson file path
ds = client.download_weather_data(
    dataset_name="era5_sfc",
    variables=["tp"],
    start_date="2024-06-01",
    end_date="2024-06-07",
    geojson="path/to/region.geojson"
)
# Download using a geojson dict
bbox_geojson = {
    "type": "Feature",
    "properties": {"region_id": "custom_bbox"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [70, 10], [80, 10], [80, 20], [70, 20], [70, 10]
        ]]
    }
}
ds = client.download_weather_data(
    dataset_name="era5_sfc",
    variables=["t2m"],
    start_date="2024-01-01",
    end_date="2024-01-02",
    geojson=bbox_geojson,
    output_dir="./my_weather_data"
)
# Work with the xarray Dataset
print(ds)
print(f"Dimensions: {dict(ds.dims)}")
print(f"Variables: {list(ds.data_vars)}")
# Access data
temperature = ds['t2m']
mean_temp = temperature.mean()
print(f"Mean temperature: {mean_temp.values} K")
Output File Naming:
Files are saved with descriptive names in the format:
{dataset_name}_{variable1}_{variable2}_{YYYYMMDD_start}_{YYYYMMDD_end}_{region_id}.nc
For example:
- era5_sfc_t2m_d2m_20240101_20240131_state_29.nc
- era5_sfc_tp_20240601_20240607.nc
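The predictable names make it easy to reopen a previously downloaded file without another API call; a minimal sketch with xarray (the path assumes the default output_dir and the first example above):
import xarray as xr

ds = xr.open_dataset(
    "data/weather/era5_sfc/era5_sfc_t2m_d2m_20240101_20240131_state_29.nc"
)
print(ds["t2m"].mean().values)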
API Endpoint: POST /api/v1/weather/datasets/{dataset_name}/download
Requirements:
- Weather data access requires DOWNLOAD permission for the WEATHER_DATA_API resource type.
- Contact your DataIO administrator to grant weather data access permissions.
Error Handling¶
The DataIO API client raises standard Python exceptions:
- ValueError: For invalid parameters or missing data
- requests.HTTPError: For HTTP-related errors (authentication, not found, etc.)
- requests.ConnectionError: For network connectivity issues
Example:
import requests

try:
    datasets = client.list_datasets()
except requests.HTTPError as e:
    if e.response.status_code == 401:
        print("Authentication failed - check your API key")
    elif e.response.status_code == 403:
        print("Access forbidden - insufficient permissions")
    else:
        print(f"HTTP error: {e}")
except ValueError as e:
    print(f"Invalid parameter: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Environment Variables¶
The client uses these environment variables:
- DATAIO_API_BASE_URL: Base URL for the DataIO API
- DATAIO_API_KEY: API key for authentication
- DATAIO_DATA_DIR: Directory to download data to
Set these in a .env file:
DATAIO_API_BASE_URL=https://dataio.artpark.ai/api/v1
DATAIO_API_KEY=your_api_key_here
DATAIO_DATA_DIR=data
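A minimal sketch of loading the .env file before constructing the client, using python-dotenv (an assumption; any mechanism that populates the environment works):
from dotenv import load_dotenv
from dataio import DataIOAPI

load_dotenv()          # reads .env into os.environ
client = DataIOAPI()   # picks up the DATAIO_* variables automatically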