Why S3

  • Keep ownership of the data inside your AWS account.
  • Apply lifecycle, encryption, or replication policies that match your governance model.
  • Land both full-history archives and incremental refreshes in the same layout you use internally.

Delivery Model

  • Format: Parquet (default) or CSV; schema aligned with /BlockDb.Postgres.Tables.Public.
  • Partitioning: dataset_id=0101/chain_id=1/date=2024-01-01/part-*.parquet.
  • Access: BlockDB assumes an IAM role (sts:AssumeRole) or uses access keys scoped to a folder prefix.
  • Integrity: each object has an accompanying .manifest.json listing row counts and SHA-256 hashes.
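Once objects are downloaded, the integrity manifests can be checked client-side. A minimal sketch, assuming the manifest carries a top-level `sha256` field and a `row_count` (the field names are illustrative, not confirmed against a real BlockDB manifest):

```python
import hashlib
import json
from pathlib import Path

def verify_object(data_path: Path, manifest_path: Path) -> bool:
    """Compare a downloaded object against its .manifest.json sidecar.

    The manifest field name ``sha256`` is an assumption; check it against
    an actual BlockDB manifest before relying on this.
    """
    manifest = json.loads(manifest_path.read_text())
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return digest == manifest["sha256"]
```

Running this over every object in a delivery before loading it downstream gives an early, cheap corruption check.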

Setup Steps

  1. Create (or reuse) an S3 bucket in the region closest to your workloads; enable versioning if you want change history.
  2. Provision an IAM role with s3:PutObject, s3:AbortMultipartUpload, and s3:ListBucket on the target prefix. Share the ARN with BlockDB.
  3. Send dataset IDs, chain coverage, and initial backfill ranges to [email protected].
  4. Optionally enable cross-region replication or S3 Glacier lifecycle policies; BlockDB's writes respect your bucket policy.
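The role in step 2 can be expressed as a standard IAM policy document. A sketch, built in Python for clarity; the bucket name and prefix are placeholders, and your security team may want to tighten it further:

```python
import json

def blockdb_write_policy(bucket: str, prefix: str) -> dict:
    """Build the write-only IAM policy from step 2.

    ``bucket`` and ``prefix`` are placeholders for your own values.
    ListBucket is granted on the bucket ARN, restricted to the prefix
    via an ``s3:prefix`` condition, as IAM requires.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BlockDBWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Sid": "BlockDBListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
        ],
    }

print(json.dumps(blockdb_write_policy("example-bucket", "blockdb/"), indent=2))
```

Attach the resulting document to the role whose ARN you share with BlockDB; nothing here grants read or delete on existing objects.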

Consuming the Data

  • Use AWS Glue or dbt to catalog the partitions and hydrate warehouse tables created from the BlockDB SQL scripts.
  • Track ingestion by reading the manifest files and comparing against _tracing_id ranges.
  • If you need mutable tables or streaming CDC, pair S3 archives with the Real Time Channels strategy.
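Before wiring up Glue or dbt, a small script can confirm which partitions have actually landed. A sketch that parses the Hive-style key layout shown under Delivery Model and flags missing dates (the `_tracing_id` comparison is omitted; this only checks partition presence):

```python
from datetime import date, timedelta

def parse_partition_key(key: str) -> dict:
    """Extract Hive-style partition values from an object key such as
    'dataset_id=0101/chain_id=1/date=2024-01-01/part-0000.parquet'."""
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            parts[name] = value
    return parts

def missing_dates(seen: set[str], start: str, end: str) -> list[str]:
    """Return ISO dates in [start, end] with no delivered partition."""
    d, stop = date.fromisoformat(start), date.fromisoformat(end)
    gaps = []
    while d <= stop:
        if d.isoformat() not in seen:
            gaps.append(d.isoformat())
        d += timedelta(days=1)
    return gaps
```

Feed `parse_partition_key` the keys from an S3 listing, collect the `date` values per dataset and chain, and any output from `missing_dates` is a partition to chase before hydrating warehouse tables.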

Coordinate bucket naming and encryption (SSE-KMS, SSE-S3, or client-side) before the first delivery. If you use SSE-KMS, the key policy must grant BlockDB's IAM role encrypt/decrypt permissions.
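For SSE-KMS, the key-policy statement might look like the following sketch. The role ARN and account ID are placeholders, and the exact action list (writers to SSE-KMS buckets typically need `kms:GenerateDataKey`; multipart uploads also need `kms:Decrypt`) should be confirmed with your security team:

```python
def blockdb_kms_statement(role_arn: str) -> dict:
    """Sketch of a KMS key-policy statement letting BlockDB's delivery
    role write SSE-KMS-encrypted objects. ``role_arn`` is a placeholder."""
    return {
        "Sid": "AllowBlockDBUseOfTheKey",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:GenerateDataKey",
        ],
        "Resource": "*",
    }
```

Merge this statement into the key's existing policy rather than replacing it, so your own administrators keep access.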