Infrastructure

All infrastructure is managed with Terraform and lives in the infra/ directory. The platform runs entirely on Google Cloud Platform.

GCP Resources

Terraform provisions the following resources in the american-beauty-institute GCP project (us-east1 region):

Resource	Name	Purpose
Compute Engine VM	`dagster-daemon`	Runs the Dagster daemon and code server (`e2-micro`, Debian 12)
GCS Bucket	`american-beauty-institute-raw`	Raw data landing zone (versioned)
GCS Bucket	`american-beauty-institute-tf-state`	Terraform remote state (versioned)
Service Account	`dagster-daemon`	Identity for the Dagster VM
Firewall Rule	`allow-iap-ssh`	Allows SSH through Identity-Aware Proxy

The VM has no external IP. All SSH access goes through IAP: the allow-iap-ssh firewall rule admits inbound traffic on port 22 only from Google's IAP range (35.235.240.0/20).
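
As an illustration of the access path described above, a session can be opened with gcloud compute ssh and its --tunnel-through-iap flag. This is a minimal sketch: the iap_ssh function and the GCLOUD override are hypothetical helpers, not part of the repo, and the zone is assumed to match the documented default of the zone variable.

```shell
# Sketch only: iap_ssh and GCLOUD are illustrative names, not repo code.
PROJECT="american-beauty-institute"
ZONE="us-east1-b"   # assumed: default value of the `zone` variable
VM="dagster-daemon"

# Open an SSH session to the VM through IAP.
# Extra arguments are passed through to gcloud unchanged.
# Setting GCLOUD=echo prints the arguments instead of running gcloud.
iap_ssh() {
  ${GCLOUD:-gcloud} compute ssh "$VM" \
    --zone="$ZONE" --project="$PROJECT" --tunnel-through-iap "$@"
}
```

Calling iap_ssh with no arguments opens an interactive shell; iap_ssh --command="uptime" runs a single command and returns.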

Service Account & IAM

The dagster-daemon service account has five IAM bindings:

Role	Scope	Purpose
`roles/bigquery.dataEditor`	Project	Read and write BigQuery tables
`roles/bigquery.jobUser`	Project	Run BigQuery jobs
`roles/storage.objectAdmin`	Raw bucket	Read, write, and delete objects in the raw data bucket
`roles/secretmanager.secretAccessor`	Project	Fetch API credentials from Secret Manager at startup
`roles/iap.tunnelResourceAccessor`	Project	Allow SSH tunneling through IAP
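
To check the project-level bindings against the table above, the project IAM policy can be filtered by member. The sketch below assumes the service account email follows the standard account-id@project.iam.gserviceaccount.com form; the function name and the GCLOUD override are hypothetical.

```shell
# Sketch only: list_project_roles and GCLOUD are illustrative names.
PROJECT="american-beauty-institute"
# Assumed email, derived from the standard service account naming scheme.
SA_EMAIL="dagster-daemon@${PROJECT}.iam.gserviceaccount.com"

# Print one role per line for every project-level binding held by the account.
# Setting GCLOUD=echo prints the arguments instead of running gcloud.
list_project_roles() {
  ${GCLOUD:-gcloud} projects get-iam-policy "$PROJECT" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${SA_EMAIL}" \
    --format="value(bindings.role)"
}
```

The bucket-scoped roles/storage.objectAdmin binding does not appear in this listing, because it is attached to the raw bucket rather than to the project.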

Terraform State

State is stored remotely in GCS:

terraform {
  # Remote state lives in the versioned GCS state bucket.
  backend "gcs" {
    bucket = "american-beauty-institute-tf-state"
    prefix = "terraform/state"
  }
}

The state bucket has versioning enabled, so previous state versions can be recovered if needed.
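
A rough sketch of how an earlier state version could be located and downloaded with gsutil. It assumes the default workspace, for which the GCS backend stores state at prefix/default.tfstate; the function names and the GSUTIL override are hypothetical.

```shell
# Sketch only: function names and GSUTIL are illustrative.
# Assumed object path: <prefix>/default.tfstate for the default workspace.
STATE_OBJECT="gs://american-beauty-institute-tf-state/terraform/state/default.tfstate"

# List every stored generation of the state object, current and noncurrent.
list_state_versions() {
  ${GSUTIL:-gsutil} ls -a "$STATE_OBJECT"
}

# Download one generation for inspection.
# $1 is a generation number taken from the listing above.
fetch_state_version() {
  ${GSUTIL:-gsutil} cp "${STATE_OBJECT}#$1" "./recovered-$1.tfstate"
}
```

Downloading to a local file first keeps the live state untouched until the recovered copy has been reviewed.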

Terraform Layout

infra/
├── main.tf           # All resource definitions
├── variables.tf      # Input variables
├── outputs.tf        # Output values
├── versions.tf       # Provider and backend configuration
└── files/
    ├── startup.sh    # VM first-boot script
    ├── dagster.yaml  # Dagster daemon configuration
    ├── dagster.service
    ├── dagster-code.service
    ├── dagster-healthcheck.service
    ├── dagster-healthcheck.timer
    └── dagster-healthcheck.sh

Variables

Variable	Default	Description
`project`	`american-beauty-institute`	GCP project ID
`region`	`us-east1`	GCP region
`zone`	`us-east1-b`	GCP zone for the VM
`credentials_file`	(required)	Path to a GCP service account JSON key
`dagster_repo_url`	`https://github.com/americanbeautyinstitute/data-platform.git`	Repo cloned onto the VM
`dagster_branch`	`main`	Branch checked out on the VM

Applying Changes

Applying changes requires Terraform >= 1.7 and the Google provider ~> 5.0.

cd infra
terraform init
terraform plan -var="credentials_file=path/to/key.json"
terraform apply -var="credentials_file=path/to/key.json"
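
One possible variation on the commands above: Terraform reads input variables from TF_VAR_name environment variables, and a plan saved with -out can be applied exactly as reviewed. The function names and the TERRAFORM override below are hypothetical, and the key path is the same placeholder used above.

```shell
# Sketch only: function names and TERRAFORM are illustrative.
# Terraform picks up TF_VAR_<name> as the value of input variable <name>.
export TF_VAR_credentials_file="path/to/key.json"

# Write the plan to a file so the reviewed changes are the ones applied.
plan_to_file() {
  ${TERRAFORM:-terraform} plan -out=tfplan
}

# Apply the saved plan; variable values are already recorded in it.
apply_saved_plan() {
  ${TERRAFORM:-terraform} apply tfplan
}
```

Both functions are meant to be run from the infra directory after terraform init.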

VM First Boot

When the VM starts for the first time, the startup script (infra/files/startup.sh) runs the following steps:

1. Installs system dependencies (git, curl).
2. Creates a dagster system user.
3. Installs uv as the dagster user.
4. Clones the repo and installs Python dependencies with uv sync --no-dev.
5. Sets up DAGSTER_HOME at /var/dagster/home and copies dagster.yaml.
6. Pulls secrets from GCP Secret Manager and writes them to /home/dagster/data-platform/.env.
7. Copies systemd unit files and enables the Dagster services and health check timer.
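
The secrets step could look roughly like the sketch below. It is not the contents of startup.sh: the write_secret helper, the GCLOUD override, and the secret name in the usage note are all hypothetical. Only the .env path and the use of Secret Manager come from the steps above.

```shell
# Sketch only: not taken from startup.sh; names are illustrative.
ENV_FILE="${ENV_FILE:-/home/dagster/data-platform/.env}"

# Fetch the latest version of a secret and append NAME=value to the .env file.
# $1 is the environment variable name, $2 is the Secret Manager secret ID.
write_secret() {
  _secret_value="$(${GCLOUD:-gcloud} secrets versions access latest --secret="$2")" || return 1
  printf '%s=%s\n' "$1" "$_secret_value" >> "$ENV_FILE"
}
```

For example, write_secret EXAMPLE_API_KEY example-api-key would append an EXAMPLE_API_KEY line, where both names are placeholders. The call fails without writing anything if the secret cannot be read.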

After the startup script completes, the service account JSON key must be copied to the VM manually:

gcloud compute scp key.json dagster-daemon:/etc/gcp/service-account.json \
  --zone=us-east1-b --tunnel-through-iap --project=american-beauty-institute

Then set ownership on the VM:

sudo chown dagster:dagster /etc/gcp/service-account.json
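
A quick check that could be run on the VM afterwards, as a sketch. The unit names are taken from the files listed under infra/files; the function names and the SYSTEMCTL override are hypothetical, and stat -c assumes GNU coreutils, as found on Debian.

```shell
# Sketch only: function names and SYSTEMCTL are illustrative.
KEY_FILE="/etc/gcp/service-account.json"

# Succeed only if the key file is owned by dagster:dagster.
key_owner_ok() {
  [ "$(stat -c '%U:%G' "${1:-$KEY_FILE}")" = "dagster:dagster" ]
}

# Print the state of each Dagster unit, one per line.
# systemctl is-active exits 0 if at least one listed unit is active,
# so read the printed states rather than relying on the exit code alone.
check_dagster_units() {
  ${SYSTEMCTL:-systemctl} is-active \
    dagster.service dagster-code.service dagster-healthcheck.timer
}
```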

Outputs

After terraform apply, the following outputs are available:

Output	Description
`dagster_vm_name`	Name of the Dagster VM
`dagster_vm_zone`	Zone of the Dagster VM
`raw_bucket_name`	Name of the raw data GCS bucket
`tf_state_bucket_name`	Name of the Terraform state bucket
`dagster_service_account_email`	Email of the Dagster service account
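
Outputs can be read in scripts with terraform output -raw, which prints a single value without quoting. The sketch below uses two of the outputs listed above to build the IAP SSH command; the function names and the TERRAFORM and GCLOUD overrides are hypothetical.

```shell
# Sketch only: function names, TERRAFORM, and GCLOUD are illustrative.

# Print one output value; run from the infra directory after terraform apply.
tf_output() {
  ${TERRAFORM:-terraform} output -raw "$1"
}

# SSH to the VM using the name and zone reported by Terraform.
# No --project flag is passed, so gcloud's configured default project is used.
ssh_to_dagster_vm() {
  ${GCLOUD:-gcloud} compute ssh "$(tf_output dagster_vm_name)" \
    --zone="$(tf_output dagster_vm_zone)" --tunnel-through-iap
}
```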