Overview

Azure Blob Storage mirrors the S3 experience but keeps data inside your Azure subscription. BlockDB writes archives into an ADLS Gen2 container using a dedicated service principal so you can pipe the files into Synapse, Fabric, or Databricks.

Delivery Specs

  • Containers: One per environment (e.g., blockdb-prod-archives). BlockDB targets a specific path such as /datasets/0101_blocks_v1/.
  • Authentication: OAuth client credentials (service principal) with Storage Blob Data Contributor on the container; a sample access sketch follows this list.
  • Formats: Parquet (recommended) or CSV; naming matches dataset_id=0101/date=2024-01-01/part-*.parquet.
  • Metadata: Manifests stored alongside the data with row counts, checksums, and _tracing_id ranges.
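
To sanity-check access once a drop lands, here is a minimal sketch using the azure-identity and azure-storage-blob Python packages. The account name, container, path, and credential values are placeholders for your own environment.

    from azure.identity import ClientSecretCredential
    from azure.storage.blob import ContainerClient

    # Authenticate as the dedicated service principal that holds
    # Storage Blob Data Contributor on the container.
    credential = ClientSecretCredential(
        tenant_id="<tenant-id>",
        client_id="<client-id>",
        client_secret="<client-secret>",
    )

    container = ContainerClient(
        account_url="https://<storage-account>.blob.core.windows.net",
        container_name="blockdb-prod-archives",
        credential=credential,
    )

    # Archives follow dataset_id=.../date=.../part-*.parquet naming under the
    # delivery path, e.g. datasets/0101_blocks_v1/.
    for blob in container.list_blobs(name_starts_with="datasets/0101_blocks_v1/"):
        print(blob.name, blob.size)

If you prefer filesystem-style access, the adlfs package exposes the same container through fsspec, so the Parquet drops can also be read directly with pandas or Polars.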

Provisioning Checklist

  1. Create the storage account (enable hierarchical namespace for ADLS Gen2); see the sketch after this checklist.
  2. Register a service principal and grant it access to the container path.
  3. Share the tenant ID, client ID, and client secret, plus the target container URL, with [email protected].
  4. Specify datasets, chains, and start/end timestamps; BlockDB schedules the first export drop.
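
Step 1 can be scripted with the azure-mgmt-storage Python package. The sketch below is an assumption-laden starting point: it presumes an existing resource group, an operator credential with rights to create storage accounts, and placeholder names you would replace with your own.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient
    from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

    subscription_id = "<subscription-id>"
    resource_group = "<resource-group>"
    account_name = "<storage-account>"  # must be globally unique

    client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

    # is_hns_enabled=True turns on the hierarchical namespace, which makes the
    # account an ADLS Gen2 target for BlockDB archive delivery.
    poller = client.storage_accounts.begin_create(
        resource_group,
        account_name,
        StorageAccountCreateParameters(
            location="westeurope",
            sku=Sku(name="Standard_LRS"),
            kind="StorageV2",
            is_hns_enabled=True,
        ),
    )
    account = poller.result()
    print(account.primary_endpoints.dfs)

The role assignment in step 2 is usually handled in the Azure portal or with the Azure CLI's az role assignment create command, scoped to the container BlockDB will write to.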

Integrating Downstream

  • Use Synapse Pipelines or Azure Data Factory to copy the data into dedicated SQL pools.
  • Mount the container to Databricks and hydrate tables defined in /BlockDb.Postgres.Tables.Public.
  • Monitor ingestion by reading the manifest blobs and reconciling counts with your warehouse; a sketch follows this list.
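
As a starting point for that reconciliation, the following minimal sketch reads one manifest blob and compares its row count against a warehouse count. The manifest file name and JSON field name (row_count) are assumptions; check the manifests delivered to your container for the actual schema.

    import json

    from azure.identity import ClientSecretCredential
    from azure.storage.blob import ContainerClient

    credential = ClientSecretCredential("<tenant-id>", "<client-id>", "<client-secret>")
    container = ContainerClient(
        account_url="https://<storage-account>.blob.core.windows.net",
        container_name="blockdb-prod-archives",
        credential=credential,
    )

    # Hypothetical manifest location alongside the Parquet files for one date partition.
    manifest_path = "datasets/0101_blocks_v1/dataset_id=0101/date=2024-01-01/manifest.json"
    manifest = json.loads(container.download_blob(manifest_path).readall())

    warehouse_rows = 12_345_678  # replace with a COUNT(*) from your warehouse table
    if manifest["row_count"] != warehouse_rows:
        print(f"Row count mismatch: manifest={manifest['row_count']} warehouse={warehouse_rows}")
    else:
        print("Row counts reconcile")
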
If you enforce customer-managed keys, grant the service principal get permissions on the Key Vault secret so BlockDB can encrypt uploads.
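
To confirm that grant before the first drop, a quick check with the azure-keyvault-secrets package could look like the sketch below; the vault URL and secret name are placeholders for your own environment.

    from azure.identity import ClientSecretCredential
    from azure.keyvault.secrets import SecretClient

    credential = ClientSecretCredential("<tenant-id>", "<client-id>", "<client-secret>")
    client = SecretClient(
        vault_url="https://<vault-name>.vault.azure.net",
        credential=credential,
    )

    # Fails with a 403 if the BlockDB service principal lacks get permission on the secret.
    secret = client.get_secret("<cmk-secret-name>")
    print(secret.name, secret.properties.version)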