Overview
Azure Blob Storage mirrors the S3 experience but keeps data inside your Azure subscription. BlockDB writes archives into an ADLS Gen2 container using a dedicated service principal, so you can pipe the files into Synapse, Fabric, or Databricks.
Delivery Specs
- Containers: One per environment (e.g., `blockdb-prod-archives`). BlockDB targets a specific path such as `/datasets/0101_blocks_v1/`.
- Authentication: OAuth client credentials (service principal) with `Storage Blob Data Contributor` on the container (see the access sketch after this list).
- Formats: Parquet (recommended) or CSV; naming matches `dataset_id=0101/date=2024-01-01/part-*.parquet`.
- Metadata: Manifests stored alongside the data with row counts, checksums, and `_tracing_id` ranges.
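To spot-check a drop without standing up a full pipeline, you can read a single date partition with the same client-credential flow. The sketch below is illustrative only: it assumes the azure-identity, azure-storage-file-datalake, pandas, and pyarrow packages, and the storage-account name, credentials, and partition prefix are placeholders to swap for your own values.

```python
# Illustrative spot check: list one date partition and load its Parquet parts.
# The account name, credentials, and partition prefix are placeholders.
import io

import pandas as pd
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=credential,
)
fs = service.get_file_system_client("blockdb-prod-archives")

prefix = "datasets/0101_blocks_v1/dataset_id=0101/date=2024-01-01"
frames = []
for item in fs.get_paths(path=prefix, recursive=True):
    if item.is_directory or not item.name.endswith(".parquet"):
        continue
    payload = fs.get_file_client(item.name).download_file().readall()
    frames.append(pd.read_parquet(io.BytesIO(payload)))

day = pd.concat(frames, ignore_index=True)
print(f"{len(day)} rows delivered for 2024-01-01")
```

Storage Blob Data Contributor includes read access, so the credentials BlockDB uses for delivery also work for this kind of check.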
Provisioning Checklist
- Create the storage account (enable hierarchical namespace for ADLS Gen2).
- Register a service principal and grant it access to the container path.
- Share the tenant ID, client ID, and secret plus the target container URL with [email protected].
- Specify datasets, chains, and start/end timestamps; BlockDB schedules the first export drop.
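Before sharing the credentials, it helps to confirm the role assignment actually permits writes under the delivery path, since that is what BlockDB will do. A minimal probe is sketched below, assuming the azure-identity and azure-storage-file-datalake packages; every identifier is a placeholder.

```python
# Illustrative write probe for the delivery path; all identifiers are placeholders.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=credential,
)
fs = service.get_file_system_client("blockdb-prod-archives")

# Upload and remove a scratch file; Storage Blob Data Contributor allows both.
probe = fs.get_file_client("datasets/0101_blocks_v1/_access_probe.txt")
probe.upload_data(b"access check", overwrite=True)
probe.delete_file()
print("service principal can write under the delivery path")
```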
Integrating Downstream
- Use Synapse Pipelines or Azure Data Factory to copy the data into dedicated SQL pools.
- Mount the container to Databricks and hydrate tables defined in `/BlockDb.Postgres.Tables.Public` (a notebook sketch follows this list).
- Monitor ingestion by reading the manifest blobs and reconciling counts with your warehouse.
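For the Databricks route, a notebook can authenticate to ABFS with the same client credentials via Spark configuration and then reconcile a partition against its manifest. This is a sketch only: it assumes a Databricks notebook where `spark` is predefined, and the storage-account name, manifest file name (`manifest.json`), and its `row_count` field are assumptions to check against the manifests BlockDB actually delivers.

```python
# Databricks notebook sketch (spark is predefined). The storage-account name,
# credentials, manifest file name, and row_count field are assumptions.
account = "<storage-account>"
suffix = f"{account}.dfs.core.windows.net"

# ABFS OAuth (client credentials) configuration for the delivery service principal.
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", "<client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", "<client-secret>")
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

base = (
    f"abfss://blockdb-prod-archives@{suffix}"
    "/datasets/0101_blocks_v1/dataset_id=0101/date=2024-01-01"
)

# Read the partition's Parquet parts and the manifest delivered alongside them.
blocks = spark.read.parquet(f"{base}/part-*.parquet")
manifest = spark.read.json(f"{base}/manifest.json").first()  # hypothetical name

actual, expected = blocks.count(), manifest["row_count"]  # hypothetical field
assert actual == expected, f"row count mismatch: {actual} != {expected}"
```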
If you enforce customer-managed keys, grant the service principal `get` permissions on the Key Vault secret so BlockDB can encrypt uploads.
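A quick way to confirm the `get` grant is in place is to fetch the secret with the service principal itself. The check below is a sketch, assuming the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are placeholders.

```python
# Illustrative Key Vault check; vault URL and secret name are placeholders.
from azure.identity import ClientSecretCredential
from azure.keyvault.secrets import SecretClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
vault = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=credential,
)

# Succeeds only if the principal can get secrets from this vault.
secret = vault.get_secret("<secret-name>")
print(f"retrieved secret '{secret.name}' (value not printed)")
```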