SDK API Guide¶
Complete reference for the DataIOAPI client class.
DataIOAPI Class¶
from dataio import DataIOAPI
The main client class for interacting with the DataIO API.
Constructor¶
DataIOAPI(base_url=None, api_key=None, data_dir=None)¶
Initialize a new DataIO API client.
Parameters:
- base_url (str, optional): The base URL of the DataIO API. If not provided, uses the DATAIO_API_BASE_URL environment variable.
- api_key (str, optional): The API key for authentication. If not provided, uses the DATAIO_API_KEY environment variable.
- data_dir (str, optional): The directory to download data to. If not provided, uses the DATAIO_DATA_DIR environment variable.
Raises:
ValueError: If base_url or api_key is provided neither as a parameter nor as an environment variable.
Example:
# Using environment variables
client = DataIOAPI()
# Passing credentials directly
client = DataIOAPI(
    base_url="https://dataio.artpark.ai/api/v1",
    api_key="your_api_key",
    data_dir="data"
)
Dataset Methods¶
list_datasets(limit=None)¶
Get a list of all datasets available to the authenticated user.
Parameters:
- limit (int, optional): Maximum number of datasets to return. Defaults to 100 if not specified.
Returns:
list: List of dataset dictionaries containing metadata for each dataset.
Example:
# Get all datasets (up to 100)
datasets = client.list_datasets()
# Get first 10 datasets
datasets = client.list_datasets(limit=10)
# Each dataset contains:
# - ds_id: Unique dataset identifier
# - title: Dataset title
# - description: Dataset description
# - tags: List of tag dictionaries with 'id' and 'tag_name'
# - collection: Collection information
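For instance, a minimal sketch of filtering the returned list by tag, using the field names listed above (the "health" tag is hypothetical):
# Keep only datasets carrying a hypothetical "health" tag
health_datasets = [
    ds for ds in datasets
    if any(tag["tag_name"] == "health" for tag in ds.get("tags", []))
]
print(f"Found {len(health_datasets)} matching datasets")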
get_dataset_details(dataset_id)¶
Get detailed metadata for a specific dataset.
Parameters:
- dataset_id (str or int): The dataset ID. Can be the full ds_id or just the numeric part.
Returns:
dict: Complete dataset metadata including title, description, collection, and other fields.
Raises:
ValueError: If the dataset with the specified ID is not found.
Example:
# Using full dataset ID
details = client.get_dataset_details("TS0001DS9999")
# Using just the numeric part (will be zero-padded)
details = client.get_dataset_details("9999")
details = client.get_dataset_details(9999)
list_dataset_tables(dataset_id, bucket_type="STANDARDISED")¶
Get a list of tables within a dataset, including download links.
Parameters:
- dataset_id (str): The dataset ID to get tables for.
- bucket_type (str, optional): Type of bucket. Either "STANDARDISED" or "PREPROCESSED". Defaults to "STANDARDISED".
Note
Currently, only “STANDARDISED” datasets are available. “PREPROCESSED” datasets are not yet accessible through the API.
Returns:
list: List of table dictionaries, each containing:
- table_name: Name of the table
- download_link: Signed URL for downloading (expires in 1 hour)
- metadata: Table-level metadata
Example:
# Get tables for a dataset
tables = client.list_dataset_tables("TS0001DS9999")
# Request preprocessed tables (not yet available; see note above)
tables = client.list_dataset_tables("TS0001DS9999", bucket_type="PREPROCESSED")
for table in tables:
    print(f"Table: {table['table_name']}")
    print(f"Download: {table['download_link']}")
download_dataset(dataset_id, **kwargs)¶
Download a complete dataset with all its tables and metadata.
Parameters:
- dataset_id (str): The dataset ID to download.
- bucket_type (str, optional): Bucket type to download. Defaults to "STANDARDISED".
- root_dir (str, optional): Root directory for downloads. Defaults to "data".
- get_metadata (bool, optional): Whether to download the metadata file. Defaults to True.
- metadata_format (str, optional): Format for metadata ("yaml" or "json"). Defaults to "yaml".
- update_sync_history (bool, optional): Whether to update the sync history. Defaults to True.
- sync_history_file (str, optional): Name of the sync history file. Defaults to "sync-history.yaml".
Returns:
str: Path to the downloaded dataset directory.
Example:
# Basic download
path = client.download_dataset("TS0001DS9999")
# Download to custom directory with JSON metadata
path = client.download_dataset(
    "TS0001DS9999",
    root_dir="my_datasets",
    metadata_format="json"
)
# Download without metadata
path = client.download_dataset(
    "TS0001DS9999",
    get_metadata=False
)
Directory Structure:
root_dir/
├── sync-history.yaml (if update_sync_history=True)
└── TS0001DS9999-Dataset_Title/
    ├── table1.csv
    ├── table2.csv
    ├── table3.csv
    └── metadata.yaml (if get_metadata=True)
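The downloaded tables are plain CSV files, so they can be loaded with any CSV reader; a minimal sketch using pandas (pandas is not required by the client, this is just an illustration):
from pathlib import Path
import pandas as pd

path = client.download_dataset("TS0001DS9999")

# Load every CSV table in the downloaded dataset directory
frames = {csv.stem: pd.read_csv(csv) for csv in Path(path).glob("*.csv")}
for name, df in frames.items():
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")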
construct_dataset_metadata(dataset_details, bucket_type="STANDARDISED")¶
Build comprehensive metadata combining dataset and table-level information.
Parameters:
- dataset_details (dict): Dataset details from get_dataset_details().
- bucket_type (str, optional): Bucket type for table metadata. Defaults to "STANDARDISED".
Returns:
dict: Combined metadata with dataset and table information.
Required fields in dataset_details:
- title: Dataset title
- description: Dataset description
- collection: Collection object with category_name and collection_name
Example:
dataset_details = client.get_dataset_details("TS0001DS9999")
metadata = client.construct_dataset_metadata(dataset_details)
# Metadata structure:
# - dataset_title: Title of the dataset
# - dataset_description: Description
# - category: Category name
# - collection: Collection name
# - dataset_tables: Dict of table metadata keyed by table name
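If you want this combined metadata on disk yourself (download_dataset already writes it for you), a minimal sketch using PyYAML:
import yaml

with open("metadata.yaml", "w") as f:
    yaml.safe_dump(metadata, f, sort_keys=False)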
Region and Shapefile Methods¶
get_children_regions(region_id)¶
Get all direct children regions for a given parent region.
Parameters:
- region_id (str): The ID of the parent region to get children for.
Returns:
list: List of region dictionaries containing metadata for each child region.
Example:
# Get children of a state region
children = client.get_children_regions("state_29")
for child in children:
    print(f"Region ID: {child['region_id']}")
    print(f"Name: {child['region_name']}")
    print(f"Parent: {child['parent_region_id']}")
API Endpoint: GET /api/v1/regions/{region_id}/children
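Since each call returns only direct children, traversing a whole region hierarchy requires recursion; a minimal sketch (the starting region ID is the one from the example above):
def walk_regions(client, region_id, depth=0):
    # Recursively print the subtree rooted at region_id
    for child in client.get_children_regions(region_id):
        print("  " * depth + f"{child['region_id']}: {child['region_name']}")
        walk_regions(client, child["region_id"], depth + 1)

walk_regions(client, "state_29")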
get_shapefile_list()¶
Get a list of all available shapefiles.
Returns:
list: List of shapefile dictionaries containing metadata for each shapefile.
Example:
shapefiles = client.get_shapefile_list()
for shapefile in shapefiles:
    print(f"Region ID: {shapefile['region_id']}")
    print(f"Name: {shapefile['region_name']}")
download_shapefile(region_id, shp_folder="data/GS0012DS0051-Shapefiles_India")¶
Download a shapefile for a specific region.
Parameters:
- region_id (str): ID of the region to download the shapefile for.
- shp_folder (str, optional): Directory to save the shapefile. Defaults to "{data_dir}/GS0012DS0051-Shapefiles_India", where data_dir is taken from the API client.
Returns:
str: Path to the downloaded GeoJSON file.
Raises:
ValueError: If shapefile for the specified region is not found.
Example:
# Download shapefile for a state
path = client.download_shapefile("state_29")
# Download to custom folder
path = client.download_shapefile(
    "state_29",
    shp_folder="my_shapefiles"
)
Note: Shapefiles are downloaded in GeoJSON format, not as traditional .shp files.
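Since the file is GeoJSON, it can be opened with geopandas (geopandas is not a client dependency; this sketch assumes it is installed):
import geopandas as gpd

path = client.download_shapefile("state_29")
gdf = gpd.read_file(path)
print(f"{len(gdf)} features, CRS: {gdf.crs}")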
Weather Data Methods¶
list_weather_datasets()¶
Get a list of all available weather datasets with metadata.
Returns:
list: List of weather dataset dictionaries, each containing:
- dataset_name: Name of the weather dataset (e.g., "era5_sfc")
- variables: List of variable metadata dictionaries
- temporal_coverage_start: Start date of available data
- temporal_coverage_end: End date of available data
- spatial_bounds: Dictionary with min_lat, max_lat, min_lon, max_lon
Example:
# List all weather datasets
datasets = client.list_weather_datasets()
for dataset in datasets:
    print(f"Dataset: {dataset['dataset_name']}")
    print(f"Time range: {dataset['temporal_coverage_start']} to {dataset['temporal_coverage_end']}")
    print(f"Variables: {len(dataset['variables'])}")
    for var in dataset['variables']:
        print(f"  - {var['name']}: {var['long_name']} ({var['units']})")
        print(f"    Resolution: {var['spatial_resolution']} (spatial), {var['temporal_resolution']} (temporal)")
API Endpoint: GET /api/v1/weather/datasets
download_weather_data(dataset_name, variables, start_date, end_date, geojson, output_dir=None)¶
Download weather data with spatial and temporal filtering.
Parameters:
- dataset_name (str): Name of the weather dataset (e.g., "era5_sfc").
- variables (List[str]): List of variables to extract (e.g., ["t2m", "d2m", "tp"]).
- start_date (str): Start date in ISO format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS).
- end_date (str): End date in ISO format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS).
- geojson (Union[str, Dict]): GeoJSON for spatial filtering. Can be:
  - a Python dict with GeoJSON structure
  - a path to a .geojson file (loaded automatically)
  - a region ID string (the shapefile is fetched from the API automatically)
- output_dir (str, optional): Directory to save the NetCDF file. Defaults to {data_dir}/weather/{dataset_name}.
Returns:
- xarray.Dataset: The weather data as an xarray Dataset object (if xarray is installed).
- str: Path to the saved NetCDF file (if xarray is not installed).
Example:
# Download using a region ID (fetches shapefile automatically)
ds = client.download_weather_data(
    dataset_name="era5_sfc",
    variables=["t2m", "d2m"],
    start_date="2024-01-01",
    end_date="2024-01-31",
    geojson="state_29"  # Karnataka region ID
)
# Download using a geojson file path
ds = client.download_weather_data(
    dataset_name="era5_sfc",
    variables=["tp"],
    start_date="2024-06-01",
    end_date="2024-06-07",
    geojson="path/to/region.geojson"
)
# Download using a geojson dict
bbox_geojson = {
    "type": "Feature",
    "properties": {"region_id": "custom_bbox"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [70, 10], [80, 10], [80, 20], [70, 20], [70, 10]
        ]]
    }
}
ds = client.download_weather_data(
    dataset_name="era5_sfc",
    variables=["t2m"],
    start_date="2024-01-01",
    end_date="2024-01-02",
    geojson=bbox_geojson,
    output_dir="./my_weather_data"
)
# Work with the xarray Dataset
print(ds)
print(f"Dimensions: {dict(ds.dims)}")
print(f"Variables: {list(ds.data_vars)}")
# Access data
temperature = ds['t2m']
mean_temp = temperature.mean()
print(f"Mean temperature: {mean_temp.values} K")
Output File Naming:
Files are saved with descriptive names in the format:
{dataset_name}_{variable1}_{variable2}_{YYYYMMDD_start}_{YYYYMMDD_end}_{region_id}.nc
For example:
- era5_sfc_t2m_d2m_20240101_20240131_state_29.nc
- era5_sfc_tp_20240601_20240607.nc
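The predictable names make it easy to reopen a previously downloaded file without another API call; a minimal sketch with xarray (the path assumes the default output_dir and the first example above):
import xarray as xr

ds = xr.open_dataset(
    "data/weather/era5_sfc/era5_sfc_t2m_d2m_20240101_20240131_state_29.nc"
)
print(ds["t2m"].mean().values)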
API Endpoint: POST /api/v1/weather/datasets/{dataset_name}/download
Requirements:
- Weather data access requires DOWNLOAD permission for the WEATHER_DATA_API resource type.
- Contact your DataIO administrator to grant weather data access permissions.
Error Handling¶
The DataIO API client raises standard Python exceptions:
- ValueError: For invalid parameters or missing data
- requests.HTTPError: For HTTP-related errors (authentication, not found, etc.)
- requests.ConnectionError: For network connectivity issues
Example:
import requests

try:
    datasets = client.list_datasets()
except requests.HTTPError as e:
    if e.response.status_code == 401:
        print("Authentication failed - check your API key")
    elif e.response.status_code == 403:
        print("Access forbidden - insufficient permissions")
    else:
        print(f"HTTP error: {e}")
except ValueError as e:
    print(f"Invalid parameter: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Environment Variables¶
The client uses these environment variables:
- DATAIO_API_BASE_URL: Base URL for the DataIO API
- DATAIO_API_KEY: API key for authentication
- DATAIO_DATA_DIR: Directory to download data to
Set these in a .env file:
DATAIO_API_BASE_URL=https://dataio.artpark.ai/api/v1
DATAIO_API_KEY=your_api_key_here
DATAIO_DATA_DIR=data
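A minimal sketch of loading the .env file before constructing the client, using python-dotenv (an assumption; any mechanism that populates the environment works):
from dotenv import load_dotenv
from dataio import DataIOAPI

load_dotenv()          # reads .env into os.environ
client = DataIOAPI()   # picks up the DATAIO_* variables automatically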