Loader API Reference

The load layer has two modules — load/gcs and load/bigquery — and a shared config module at load/config.py. Each asset calls both loaders in sequence: GCS first, then BigQuery.

GCS Loader

`load.gcs.load.load(table, config, client) → str`

Serializes a PyArrow table to Parquet (Snappy compression), uploads it to GCS, and returns the GCS URI.

Parameter	Type	Description
`table`	`pa.Table`	PyArrow table to upload
`config`	`GCSConfig`	Bucket, source name, partition date, and run ID
`client`	`storage.Client`	Authenticated GCS client (from Dagster `GCSResource`)
Returns	`str`	GCS URI (`gs://bucket/path`)

Blob Path Convention

Files are stored at a deterministic path built by load.gcs.partition.build_gcs_blob_path():

{source}/date={YYYY-MM-DD}/{source}-{run_id}.parquet

The run_id (a UUID) makes each upload unique. The date= prefix enables partition discovery by downstream tools.

BigQuery Loader

`load.bigquery.load.load(gcs_uri, config, client) → int`

Loads a GCS Parquet file into a BigQuery date-partitioned table and returns the row count.

Parameter	Type	Description
`gcs_uri`	`str`	GCS URI of the Parquet file
`config`	`BigQueryConfig`	Project, dataset, table, partition date, and optional clustering fields
`client`	`bigquery.Client`	Authenticated BigQuery client (from Dagster `BigQueryResource`)
Returns	`int`	Number of rows loaded

Each load targets a single partition using BigQuery's partition decorator (project.dataset.table$YYYYMMDD). The write disposition is WRITE_TRUNCATE, so rerunning a partition replaces it without duplicating data. Other partitions remain untouched.

Configuration Models

Both configs are frozen Pydantic models defined in load/config.py.

`GCSConfig`

Field	Type	Default	Description
`bucket`	`str`	—	GCS bucket name
`source`	`str`	—	Source identifier (e.g., `stripe_charges`)
`partition_date`	`date`	—	Date for the partition path
`run_id`	`str`	—	UUID identifying this run

`BigQueryConfig`

Field	Type	Default	Description
`project`	`str`	—	GCP project ID
`dataset`	`str`	—	BigQuery dataset (e.g., `raw`)
`table`	`str`	—	BigQuery table name
`partition_date`	`date`	—	Target partition date
`partition_field`	`str`	`"date"`	Column used for time partitioning
`cluster_fields`	`list[str]`	`[]`	Optional clustering columns

Usage in Assets

A typical ingestion asset calls both loaders in sequence:

# 1. Extract
table = stripe_extract.extract(client, date_str, date_str)

# 2. Load to GCS
gcs_uri = gcs_load.load(table, gcs_config, gcs.get_client())

# 3. Load to BigQuery from GCS
rows_loaded = bq_load.load(gcs_uri, bq_config, bigquery.get_client())

GCS acts as the durable landing zone. BigQuery loads from GCS rather than directly from memory, so the raw Parquet file persists even if the BigQuery load fails.