Why Delta Share Streaming

  • Zero-copy access to BlockDB tables with Change Data Feed (CDF) enabled.
  • Works with Databricks SQL, Spark Structured Streaming, Unity Catalog, or any Delta Sharing-compatible client.
  • Lets you blend streaming ingestion with archive backfills using the same schema.

Prerequisites

  1. Accept the archive invitation described in Databricks Delta Share.
  2. Create a catalog/database in Unity Catalog that references the share.
  3. Grant service principals or clusters SELECT on the shared tables.
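
Step 3 can be issued in SQL once the share is mounted as a catalog. This is a sketch: the principal name below is a placeholder, and your catalog/table names may differ:

```sql
-- Grant read access on the shared table to a service principal
-- (`etl-service-principal` is a placeholder; substitute your own principal)
GRANT SELECT ON TABLE blockdb_archive.0101_blocks_v1 TO `etl-service-principal`;
```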

Reading Change Data

SELECT *
FROM table_changes('blockdb_archive.0101_blocks_v1', 'latest');
  • Provide a version or timestamp to replay from any point.
  • Returns inserted/updated rows plus _commit_version and _commit_timestamp.
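
To replay from a fixed point instead of 'latest', pass a starting commit version or timestamp as the second argument. The version number and timestamp below are illustrative, not values from a real share:

```sql
-- Replay all changes from an illustrative commit version onward
SELECT * FROM table_changes('blockdb_archive.0101_blocks_v1', 1042);

-- Or replay from an illustrative timestamp
SELECT * FROM table_changes('blockdb_archive.0101_blocks_v1', '2024-01-01 00:00:00');
```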

Streaming Example (PySpark)

# Read the shared table's change feed as a stream.
# The shareCredentialsFile option applies to open (profile-based) sharing;
# tables mounted in Unity Catalog can be read with .table() alone.
df = (spark.readStream
        .format("deltaSharing")
        .option("shareCredentialsFile", "/dbfs/FileStore/blockdb/share.json")
        .option("readChangeFeed", "true")
        .table("blockdb_archive.0101_blocks_v1"))

# toTable() starts the streaming query and writes into the target table.
# (DataStreamWriter has no .table() method; toTable is the correct call.)
(df.writeStream
    .format("delta")
    .option("checkpointLocation", "dbfs:/checkpoints/blockdb/blocks")
    .toTable("analytics.blocks_current"))

Operational Notes

  • Idempotency: Merge by _tracing_id and dataset keys to avoid duplicates when restarting streams.
  • Latency: Expect a few minutes of lag between BlockDB commit and CDF availability—monitor _commit_timestamp.
  • Schema evolution: Follow Schema Governance; refresh downstream schemas when new columns appear.
  • Security: Delta Share is read-only; persist curated gold tables inside your workspace for custom permissions.

Combine CDF ingestion with WebSocket alerts: WebSocket events trigger fast reactions, while Delta Share ensures eventual consistency in the lakehouse.
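
The idempotency rule above can be sketched outside Spark as a plain-Python reduction: for each key tuple, keep only the row with the highest _commit_version, so redelivered rows after a stream restart are dropped. In a real pipeline this logic would live in a MERGE against the sink table; the row values and the default key below are illustrative.

```python
def latest_by_key(change_rows, keys=("_tracing_id",)):
    """Deduplicate CDF rows: keep the highest _commit_version per key tuple.

    Replaying a stream after a restart may redeliver rows; selecting the
    latest commit per key makes the downstream merge idempotent.
    """
    latest = {}
    for row in change_rows:
        k = tuple(row[c] for c in keys)
        if k not in latest or row["_commit_version"] > latest[k]["_commit_version"]:
            latest[k] = row
    return list(latest.values())

# Illustrative rows, including a redelivered duplicate after a restart
rows = [
    {"_tracing_id": "a", "_commit_version": 5, "height": 100},
    {"_tracing_id": "a", "_commit_version": 7, "height": 101},
    {"_tracing_id": "b", "_commit_version": 7, "height": 200},
    {"_tracing_id": "a", "_commit_version": 5, "height": 100},  # duplicate
]
merged = latest_by_key(rows)
```

In Spark this same rule is typically expressed inside a foreachBatch MERGE, but the reduction shown is the invariant the merge must preserve.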