Why Delta Share Streaming
- Zero-copy access to BlockDB tables with Change Data Feed (CDF) enabled.
- Works with Databricks SQL, Spark Structured Streaming, Unity Catalog, or any Delta Sharing-compatible client.
- Lets you blend streaming ingestion with archive backfills using the same schema.
Prerequisites
- Accept the archive invitation described in Databricks Delta Share.
- Create a catalog/database in Unity Catalog that references the share.
- Grant service principals or clusters `SELECT` on the shared tables.
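The last prerequisite can be sketched as a small PySpark helper. This is a sketch only: `share_catalog.blockdb.events` and `sp-ingest` are placeholder names, not names from this document; substitute the catalog you created for the share and your own service principal.

```python
def grant_select(spark, table="share_catalog.blockdb.events", principal="sp-ingest"):
    """Grant SELECT on a shared table to a service principal.

    Both the table and principal names are placeholders for illustration;
    the GRANT statement itself is standard Unity Catalog SQL.
    """
    spark.sql(f"GRANT SELECT ON TABLE {table} TO `{principal}`")
```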
Reading Change Data
- Provide a version or timestamp to replay from any point.
- Returns inserted/updated rows plus `_commit_version` and `_commit_timestamp`.
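A minimal batch read of the change feed might look like the following. The table name is a placeholder; the reader options (`readChangeFeed`, `startingVersion`, `startingTimestamp`) are the standard Delta CDF options for choosing a replay point.

```python
def read_change_feed(spark, table="share_catalog.blockdb.events",
                     starting_version=None, starting_timestamp=None):
    """Batch-read the Change Data Feed of a shared table.

    Pass either a version or a timestamp to replay from that point.
    `share_catalog.blockdb.events` is a placeholder for a table in the
    catalog that references the share.
    """
    reader = spark.read.option("readChangeFeed", "true")
    if starting_version is not None:
        reader = reader.option("startingVersion", starting_version)
    elif starting_timestamp is not None:
        reader = reader.option("startingTimestamp", starting_timestamp)
    # Result includes _change_type, _commit_version, _commit_timestamp columns.
    return reader.table(table)
```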
Streaming Example (PySpark)
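This heading promises an example, so here is a minimal sketch. It assumes the share is mounted in Unity Catalog (so the table is addressable with `.table()`); the table name, target table, and checkpoint path are placeholders.

```python
def start_cdf_stream(spark,
                     table="share_catalog.blockdb.events",
                     target="gold.events_bronze",
                     checkpoint="/Volumes/main/chk/events"):
    """Stream the Change Data Feed of a shared table into a local table.

    All names and paths are placeholders for illustration. Each micro-batch
    carries the CDF metadata columns (_change_type, _commit_version,
    _commit_timestamp), which downstream jobs can use for ordering.
    """
    return (spark.readStream
            .option("readChangeFeed", "true")
            .option("startingVersion", 0)   # or a timestamp, per the section above
            .table(table)
            .writeStream
            .option("checkpointLocation", checkpoint)
            .trigger(availableNow=True)     # drain available changes, then stop
            .toTable(target))
```

For a continuously running stream, drop the `availableNow` trigger or replace it with a processing-time trigger.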
Operational Notes
- Idempotency: Merge by `_tracing_id` and dataset keys to avoid duplicates when restarting streams.
- Latency: Expect a few minutes of lag between BlockDB commit and CDF availability; monitor `_commit_timestamp`.
- Schema evolution: Follow Schema Governance; refresh downstream schemas when new columns appear.
- Security: Delta Share is read-only; persist curated gold tables inside your workspace for custom permissions.
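The idempotency note above can be sketched with a `foreachBatch` handler that MERGEs on `_tracing_id`. The target table `gold.events` and the `dataset_key` column are assumptions standing in for your own dataset keys.

```python
def upsert_batch(batch_df, batch_id):
    """foreachBatch handler: MERGE each micro-batch into a gold table.

    Keys on `_tracing_id` plus a dataset key so that a replayed micro-batch
    after a stream restart updates rows instead of duplicating them.
    `gold.events` and `dataset_key` are placeholder names.
    """
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql("""
        MERGE INTO gold.events AS t
        USING updates AS s
        ON t._tracing_id = s._tracing_id AND t.dataset_key = s.dataset_key
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
```

Attach it to the stream with `.writeStream.foreachBatch(upsert_batch)` in place of a direct `.toTable()` sink.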
Combine CDF ingestion with WebSocket alerts: WebSocket events trigger fast reactions, while Delta Share ensures eventual consistency in the lakehouse.