Overview

If you consume BlockDB via the Snowflake Share, create Snowflake streams on the shared tables to capture inserts and updates within minutes. Each stream produces change rows that you can MERGE into your analytics tables for near-real-time freshness.

Create a Stream

-- Create the stream in your own database; the shared database is read-only.
CREATE OR REPLACE STREAM blocks_cdc
  ON TABLE blockdb_prod.public."0101_blocks_v1"
  APPEND_ONLY = FALSE;
  • Use one stream per dataset for clearer orchestration.
  • APPEND_ONLY = FALSE captures updates and deletes as well as inserts; the stream's METADATA$ACTION and METADATA$ISUPDATE columns tell you which kind of change each row is (previewed in the query after this list).
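
To preview the change rows before wiring up a MERGE, you can query the stream directly; columns other than the METADATA$ ones (such as _tracing_id) come from the shared table:

-- Plain SELECTs do not consume the stream; only DML advances its offset.
SELECT metadata$action, metadata$isupdate, _tracing_id
FROM blocks_cdc
LIMIT 10;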

Processing Loop

  1. Schedule a Snowflake task (every 5-15 minutes) to read from each stream.
  2. MERGE into your curated tables keyed on _tracing_id or primary columns.
  3. Track what each run processed (for example, in a small audit table) so you can re-run a window if the task fails; see the sketch after this list.
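
A stream's offset only advances when its rows are consumed by DML inside a committed transaction, so a small audit table is usually enough to reconstruct which windows ran. Below is a minimal sketch; the analytics.cdc_run_log table and its columns are illustrative, not part of the share:

-- Hypothetical audit table; adjust the name and columns to your conventions.
CREATE TABLE IF NOT EXISTS analytics.cdc_run_log (
  stream_name STRING,
  run_at      TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
  rows_seen   NUMBER
);

-- Statements in one transaction see the same stream contents; the offset
-- advances once, at commit.
BEGIN;
  -- MERGE INTO analytics.blocks ... (see the task template below)
  INSERT INTO analytics.cdc_run_log (stream_name, rows_seen)
    SELECT 'blocks_cdc', COUNT(*) FROM blocks_cdc;
COMMIT;

In practice you would wrap this in a stored procedure and call it from the task, since a task body is a single statement (or a stored procedure call).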

Task Template

CREATE OR REPLACE TASK ingest_blocks_cdc
  WAREHOUSE = ingest_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('blocks_cdc')
AS
MERGE INTO analytics.blocks AS tgt
USING (
  -- A standard stream emits each update as a DELETE/INSERT pair; dropping the
  -- DELETE half keeps a single source row per key.
  SELECT *, metadata$action AS cdc_action
  FROM blocks_cdc
  WHERE NOT (metadata$action = 'DELETE' AND metadata$isupdate)
) AS src
ON tgt._tracing_id = src._tracing_id
WHEN MATCHED AND src.cdc_action = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET
  -- Snowflake MERGE has no UPDATE SET * / INSERT *; list your columns explicitly.
  _updated_at = src._updated_at   /* , <other columns> = src.<other columns> */
WHEN NOT MATCHED AND src.cdc_action = 'INSERT' THEN
  INSERT (_tracing_id, _updated_at /* , <other columns> */)
  VALUES (src._tracing_id, src._updated_at /* , <other columns> */);
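
New tasks are created in a suspended state, so resume ingest_blocks_cdc once you have verified the MERGE with a manual run:

ALTER TASK ingest_blocks_cdc RESUME;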

Operational Tips

  • Lag monitoring: Compare the latest _updated_at in the shared source tables against your analytics schema, as in the query after this list.
  • Schema changes: Streams can break when source columns change; subscribe to Schema Governance and recreate streams after migrations.
  • Fallback: If a stream goes stale or falls too far behind, reload from the share tables (or archives), recreate the stream, then resume the task.
  • Cost: Keep ingest warehouses small; CDC batches are usually tiny.
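
A minimal lag check, assuming both the shared table and your curated copy expose _updated_at:

WITH freshness AS (
  SELECT
    (SELECT MAX(_updated_at) FROM blockdb_prod.public."0101_blocks_v1") AS source_max,
    (SELECT MAX(_updated_at) FROM analytics.blocks)                     AS target_max
)
SELECT source_max, target_max,
       DATEDIFF('minute', target_max, source_max) AS lag_minutes
FROM freshness;
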
Streams live entirely in your Snowflake account—BlockDB doesn’t need additional infrastructure once the share is active.