
Channel Summary

Managed bucket slices extend archive drops by appending fresh partitions every 5-60 minutes. BlockDB owns the orchestration, writes directly into your bucket, and attaches manifests so you can reuse existing ETL jobs.

How It Works

  1. Provide a destination bucket + prefix (S3, Azure Blob, or GCS) and grant BlockDB write access.
  2. BlockDB appends new partitions per dataset and chain, tagged with ingestion timestamps.
  3. Every batch includes a manifest.jsonl file containing _tracing_id ranges, row counts, and sequence numbers.
  4. Duplicate protection: when BlockDB retries a batch, the same seq is reused, so loaders can dedupe safely (see the sketch after this list).
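
Step 4 makes loading idempotent. A minimal dedupe sketch in Python, assuming the manifest has already been downloaded next to the loader and that it exposes seq and row_count fields (the field names are assumptions, not a confirmed schema):

  import json

  def load_manifest(path: str) -> dict:
      # The manifest is JSON; the field names used below are assumed for illustration.
      with open(path) as fh:
          return json.load(fh)

  def should_process(manifest: dict, processed_seqs: set) -> bool:
      # BlockDB reuses the same seq when it retries a batch, so a previously seen seq can be skipped.
      return manifest["seq"] not in processed_seqs

  processed_seqs = set()
  manifest = load_manifest("seq=000001.manifest.json")
  if should_process(manifest, processed_seqs):
      print("expected rows:", manifest.get("row_count"))
      processed_seqs.add(manifest["seq"])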

Partition Layout

dataset_id=0101/chain_id=1/date=2025-01-15/hour=13/
  seq=000001.parquet
  seq=000001.manifest.json
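
Because the layout is hive-style key=value partitioning, the partition values can be recovered from an object key with a small helper. The regular expression below simply mirrors the example layout above; it is a sketch, not an official parser:

  import re

  # Mirrors the layout above: dataset_id=.../chain_id=.../date=.../hour=.../seq=NNNNNN.parquet
  KEY_PATTERN = re.compile(
      r"dataset_id=(?P<dataset_id>[^/]+)/chain_id=(?P<chain_id>[^/]+)/"
      r"date=(?P<date>\d{4}-\d{2}-\d{2})/hour=(?P<hour>\d{2})/"
      r"seq=(?P<seq>\d+)\.parquet$"
  )

  def parse_partition(key: str) -> dict:
      match = KEY_PATTERN.search(key)
      if match is None:
          raise ValueError(f"unexpected key layout: {key}")
      return match.groupdict()

  print(parse_partition("dataset_id=0101/chain_id=1/date=2025-01-15/hour=13/seq=000001.parquet"))
  # {'dataset_id': '0101', 'chain_id': '1', 'date': '2025-01-15', 'hour': '13', 'seq': '000001'}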

Consumption Pattern

  • Configure Glue, dbt, Data Factory, or Databricks jobs to watch for new seq folders.
  • Upsert into warehouse tables using dataset primary keys + _tracing_id.
  • Persist the latest processed seq to resume after outages (a loader sketch follows this list).
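
One possible shape for such a loader on S3, sketched with boto3. The bucket name, prefix, and local checkpoint file are placeholders; Glue, dbt, Data Factory, or Databricks jobs would follow the same filter-on-seq-then-checkpoint pattern:

  import boto3

  BUCKET = "my-blockdb-drops"              # placeholder bucket name
  PREFIX = "dataset_id=0101/chain_id=1/"   # placeholder prefix
  CHECKPOINT = "last_seq.txt"              # placeholder local checkpoint file

  def read_checkpoint() -> int:
      try:
          with open(CHECKPOINT) as fh:
              return int(fh.read().strip())
      except FileNotFoundError:
          return 0

  def write_checkpoint(seq: int) -> None:
      with open(CHECKPOINT, "w") as fh:
          fh.write(str(seq))

  def new_batches(last_seq: int):
      # List parquet objects under the prefix and keep those with a seq above the checkpoint.
      s3 = boto3.client("s3")
      for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
          for obj in page.get("Contents", []):
              key = obj["Key"]
              if not key.endswith(".parquet"):
                  continue
              seq = int(key.rsplit("seq=", 1)[1].split(".", 1)[0])
              if seq > last_seq:
                  yield seq, key

  for seq, key in sorted(new_batches(read_checkpoint())):
      # Upsert the file into the warehouse here, keyed on dataset primary keys + _tracing_id.
      write_checkpoint(seq)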

Monitoring & Verification

  • Compare manifest timestamps with your Data Freshness SLAs (see the freshness sketch after this list).
  • Sample rows and run Verification endpoints for periodic audits.
  • Hook bucket notifications (EventBridge, Event Grid, Pub/Sub) into your alerting stack to detect stalled feeds.
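
A rough freshness probe, assuming the manifest carries an ISO-8601 ingestion timestamp (the ingested_at field name is an assumption) and that the SLA is expressed as a maximum allowed lag:

  import json
  from datetime import datetime, timedelta, timezone

  MAX_LAG = timedelta(minutes=90)   # placeholder SLA threshold

  def is_fresh(manifest_path: str) -> bool:
      with open(manifest_path) as fh:
          manifest = json.load(fh)
      # "ingested_at" is an assumed field name for the batch ingestion timestamp.
      ingested_at = datetime.fromisoformat(manifest["ingested_at"])
      if ingested_at.tzinfo is None:
          ingested_at = ingested_at.replace(tzinfo=timezone.utc)
      return datetime.now(timezone.utc) - ingested_at <= MAX_LAG

  if not is_fresh("seq=000001.manifest.json"):
      print("latest batch exceeds the freshness SLA; check the feed")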

Pairing With Other Channels

  • Let bucket slices feed analytics stores while WebSocket streams power alerting.
  • Maintain a rolling buffer (e.g., 7 days) in hot storage and archive older partitions to Glacier/Coldline to manage costs (a lifecycle-rule sketch closes this section).

The folder layout matches archive deliveries, so no new ingestion code is required; just run the same loaders more frequently.
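
On S3, the rolling buffer above can be enforced with a bucket lifecycle rule rather than loader code. A boto3 sketch with placeholder bucket and prefix (Azure Blob and GCS offer equivalent lifecycle policies):

  import boto3

  s3 = boto3.client("s3")
  s3.put_bucket_lifecycle_configuration(
      Bucket="my-blockdb-drops",   # placeholder bucket name
      LifecycleConfiguration={
          "Rules": [
              {
                  "ID": "archive-old-partitions",
                  "Filter": {"Prefix": "dataset_id=0101/"},   # placeholder prefix
                  "Status": "Enabled",
                  # Move partitions older than the 7-day hot buffer to Glacier.
                  "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
              }
          ]
      },
  )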