## Why S3
- Keep ownership of the data inside your AWS account.
- Apply lifecycle, encryption, or replication policies that match your governance model.
- Land both full-history archives and incremental refreshes in the same layout you use internally.
## Delivery Model
| Item | Details |
|---|---|
| Format | Parquet (default) or CSV; schema aligned with /BlockDb.Postgres.Tables.Public. |
| Partitioning | Hive-style keys: `dataset_id=0101/chain_id=1/date=2024-01-01/part-*.parquet`. |
| Access | BlockDB assumes an IAM role (sts:AssumeRole) or uses access keys scoped to a folder prefix. |
| Integrity | Each object has an accompanying .manifest.json listing row counts and SHA-256 hashes. |
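
To sanity-check a delivery, you can stream each part file, hash it, and compare against its manifest. Below is a minimal sketch using boto3, assuming the manifest sits next to the data object as `<key>.manifest.json` and exposes `sha256` and `row_count` fields; the bucket name is a placeholder and the manifest field names are assumptions, so confirm the exact schema with BlockDB before relying on it.

```python
# Hedged integrity check for one delivered part file.
import hashlib
import json

import boto3

s3 = boto3.client("s3")

BUCKET = "my-blockdb-drop"  # hypothetical bucket name
KEY = "dataset_id=0101/chain_id=1/date=2024-01-01/part-00000.parquet"


def verify_part(bucket: str, key: str) -> None:
    # Fetch the manifest BlockDB writes alongside the data object
    # (assumed naming convention: "<key>.manifest.json").
    manifest = json.loads(
        s3.get_object(Bucket=bucket, Key=key + ".manifest.json")["Body"].read()
    )

    # Stream the Parquet part and hash it without buffering the whole file.
    digest = hashlib.sha256()
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    for chunk in iter(lambda: body.read(1024 * 1024), b""):
        digest.update(chunk)

    # "sha256" and "row_count" are assumed field names, not confirmed ones.
    assert digest.hexdigest() == manifest["sha256"], "hash mismatch"
    print(f"{key}: {manifest['row_count']} rows, hash OK")


verify_part(BUCKET, KEY)
```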
## Setup Steps
- Create (or reuse) an S3 bucket in the region closest to your workloads; enable versioning if you want change history.
- Provision an IAM role with `s3:PutObject`, `s3:AbortMultipartUpload`, and `s3:ListBucket` on the target prefix, then share the role ARN with BlockDB (see the policy sketch after this list).
- Send dataset IDs, chain coverage, and initial backfill ranges to [email protected].
- Optionally enable cross-region replication or Glacier lifecycle policies; BlockDB's writes respect your bucket policy.
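
For the IAM role above, here is one way to express the required permissions with boto3. The bucket name, prefix, and policy name are placeholders. Note the split: `s3:ListBucket` attaches to the bucket ARN with a prefix condition, while the object-level actions attach to the key prefix itself.

```python
# Sketch: create the scoped writer policy listed in the setup steps.
import json

import boto3

iam = boto3.client("iam")

BUCKET = "my-blockdb-drop"  # hypothetical bucket name
PREFIX = "blockdb/"         # hypothetical target prefix

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level actions on the target prefix only.
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        },
        {
            # ListBucket applies to the bucket itself, restricted by prefix.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": f"{PREFIX}*"}},
        },
    ],
}

iam.create_policy(
    PolicyName="blockdb-s3-writer",  # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)
```

Attach the resulting policy to the role BlockDB will assume, then share that role's ARN.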
## Consuming the Data
- Use AWS Glue or dbt to catalog the partitions and hydrate warehouse tables created from the BlockDB SQL scripts.
- Track ingestion by reading the manifest files and comparing against `_tracing_id` ranges (a sketch follows this list).
- If you need mutable tables or streaming CDC, pair S3 archives with the Real Time Channels strategy.
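
One way to do that tracking: list the manifests under a single date partition, total the row counts, and collect the `_tracing_id` ranges to diff against your warehouse. The manifest field names below (`row_count`, `tracing_id_min`, `tracing_id_max`) and the bucket name are assumptions; check them against the real manifest schema.

```python
# Hedged sketch: aggregate manifests for one partition.
import json

import boto3

s3 = boto3.client("s3")

BUCKET = "my-blockdb-drop"  # hypothetical bucket name
PREFIX = "dataset_id=0101/chain_id=1/date=2024-01-01/"

paginator = s3.get_paginator("list_objects_v2")
total_rows = 0
ranges = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if not obj["Key"].endswith(".manifest.json"):
            continue
        manifest = json.loads(
            s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        )
        total_rows += manifest["row_count"]  # assumed field name
        ranges.append((manifest["tracing_id_min"], manifest["tracing_id_max"]))

# Compare total_rows and the union of _tracing_id ranges against what
# your warehouse has loaded for this partition.
print(total_rows, sorted(ranges))
```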
Coordinate bucket naming and encryption (SSE-KMS, SSE-S3, or client-side) before the first drop. KMS keys must grant BlockDB’s IAM role encrypt/decrypt permissions.
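
If you choose SSE-KMS, the key policy needs a statement like the sketch below. The role ARN and key alias are placeholders (BlockDB shares the real ARN during onboarding), and the action list is an assumption based on what SSE-KMS writes typically require; confirm it against your key administrator's baseline.

```python
# Sketch: grant BlockDB's role on a customer-managed KMS key.
import json

import boto3

kms = boto3.client("kms")

statement = {
    "Sid": "AllowBlockDbSseKms",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:role/blockdb-writer"},  # placeholder ARN
    "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
    "Resource": "*",  # inside a key policy, "*" refers to this key itself
}

# Append the statement to the key's existing policy.
key_id = "alias/blockdb-drop"  # hypothetical key alias
policy = json.loads(kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"])
policy["Statement"].append(statement)
kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(policy))
```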