Infrastructure
All infrastructure is managed with Terraform and lives in the infra/ directory. The platform runs entirely on Google Cloud Platform.
GCP Resources
Terraform provisions the following resources in the american-beauty-institute GCP project (us-east1 region):
| Resource | Name | Purpose |
|---|---|---|
| Compute Engine VM | `dagster-daemon` | Runs the Dagster daemon and code server (e2-micro, Debian 12) |
| GCS Bucket | `american-beauty-institute-raw` | Raw data landing zone (versioned) |
| GCS Bucket | `american-beauty-institute-tf-state` | Terraform remote state (versioned) |
| Service Account | `dagster-daemon` | Identity for the Dagster VM |
| Firewall Rule | `allow-iap-ssh` | Allows SSH through Identity-Aware Proxy |
The VM has no external IP. All SSH access goes through IAP: the allow-iap-ssh firewall rule admits inbound traffic on port 22 only from Google's IAP range (35.235.240.0/20).
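A firewall rule of this kind could be declared roughly as follows. This is a minimal sketch, not the contents of main.tf: the resource label `allow_iap_ssh` and the `default` network are assumptions, while the rule name, port, and source range come from the description above.

```hcl
# Sketch only: resource label and network are assumptions.
resource "google_compute_firewall" "allow_iap_ssh" {
  name      = "allow-iap-ssh"
  network   = "default" # assumption: the VM is attached to the default network
  direction = "INGRESS"

  # Permit SSH only.
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # Google's IAP TCP forwarding range; no other source can reach port 22.
  source_ranges = ["35.235.240.0/20"]
}
```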
Service Account & IAM
The dagster-daemon service account has five IAM bindings:
| Role | Scope | Purpose |
|---|---|---|
| `roles/bigquery.dataEditor` | Project | Read and write BigQuery tables |
| `roles/bigquery.jobUser` | Project | Run BigQuery jobs |
| `roles/storage.objectAdmin` | Raw bucket | Read, write, and delete objects in the raw data bucket |
| `roles/secretmanager.secretAccessor` | Project | Fetch API credentials from Secret Manager at startup |
| `roles/iap.tunnelResourceAccessor` | Project | Allow SSH tunneling through IAP |
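One way to express these five bindings in Terraform is a `for_each` over the project-level roles plus a single bucket-level binding. The sketch below is illustrative: the resource labels (`dagster_daemon`, `dagster_project`, `dagster_raw_bucket`) and the local `dagster_project_roles` are hypothetical names, and the actual main.tf may structure the bindings differently.

```hcl
# Sketch only: resource labels and the local name are hypothetical.
resource "google_service_account" "dagster_daemon" {
  project      = "american-beauty-institute"
  account_id   = "dagster-daemon"
  display_name = "Dagster daemon"
}

locals {
  # The four roles granted at project scope.
  dagster_project_roles = [
    "roles/bigquery.dataEditor",
    "roles/bigquery.jobUser",
    "roles/secretmanager.secretAccessor",
    "roles/iap.tunnelResourceAccessor",
  ]
}

# One non-authoritative binding per project-level role.
resource "google_project_iam_member" "dagster_project" {
  for_each = toset(local.dagster_project_roles)

  project = "american-beauty-institute"
  role    = each.value
  member  = "serviceAccount:${google_service_account.dagster_daemon.email}"
}

# objectAdmin is scoped to the raw bucket only, not the whole project.
resource "google_storage_bucket_iam_member" "dagster_raw_bucket" {
  bucket = "american-beauty-institute-raw"
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.dagster_daemon.email}"
}
```

Using `*_iam_member` resources adds the binding without taking ownership of the role's full member list, which keeps unrelated grants on the project untouched.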
Terraform State
State is stored remotely in GCS:
terraform {
  backend "gcs" {
    bucket = "american-beauty-institute-tf-state"
    prefix = "terraform/state"
  }
}
The state bucket has versioning enabled, so previous state versions can be recovered if needed.
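For reference, a versioned state bucket can be declared along these lines. This is a sketch under assumptions: the resource label `tf_state` and the `US-EAST1` location are not taken from the repository; only the bucket name and the versioning setting come from the text above.

```hcl
# Sketch only: resource label and location are assumptions.
resource "google_storage_bucket" "tf_state" {
  name     = "american-beauty-institute-tf-state"
  location = "US-EAST1" # assumption: co-located with the us-east1 region

  # Keep noncurrent object versions so an earlier state file can be restored.
  versioning {
    enabled = true
  }
}
```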
Terraform Layout
infra/
├── main.tf # All resource definitions
├── variables.tf # Input variables
├── outputs.tf # Output values
├── versions.tf # Provider and backend configuration
└── files/
    ├── startup.sh # VM first-boot script
    ├── dagster.yaml # Dagster daemon configuration
    ├── dagster.service
    ├── dagster-code.service
    ├── dagster-healthcheck.service
    ├── dagster-healthcheck.timer
    └── dagster-healthcheck.sh
Variables
| Variable | Default | Description |
|---|---|---|
| `project` | `american-beauty-institute` | GCP project ID |
| `region` | `us-east1` | GCP region |
| `zone` | `us-east1-b` | GCP zone for the VM |
| `credentials_file` | (required) | Path to a GCP service account JSON key |
| `dagster_repo_url` | `https://github.com/americanbeautyinstitute/data-platform.git` | Repo cloned onto the VM |
| `dagster_branch` | `main` | Branch checked out on the VM |
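The table above maps onto variable declarations such as the following. This is a sketch of what variables.tf might contain: the names, defaults, and descriptions come from the table, while the `string` types are an assumption.

```hcl
# Sketch only: types are assumed to be string throughout.
variable "project" {
  description = "GCP project ID"
  type        = string
  default     = "american-beauty-institute"
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "us-east1"
}

variable "zone" {
  description = "GCP zone for the VM"
  type        = string
  default     = "us-east1-b"
}

variable "credentials_file" {
  description = "Path to a GCP service account JSON key"
  type        = string
  # No default: Terraform asks for a value unless one is supplied,
  # for example with -var or a .tfvars file.
}

variable "dagster_repo_url" {
  description = "Repo cloned onto the VM"
  type        = string
  default     = "https://github.com/americanbeautyinstitute/data-platform.git"
}

variable "dagster_branch" {
  description = "Branch checked out on the VM"
  type        = string
  default     = "main"
}
```

Because `credentials_file` is the only variable without a default, it is the only one that has to be supplied on every `plan` and `apply`.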
Applying Changes
Requires Terraform >= 1.7 and the Google provider ~> 5.0.
cd infra
terraform init
terraform plan -var="credentials_file=path/to/key.json"
terraform apply -var="credentials_file=path/to/key.json"
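The version requirements stated above are typically pinned in a `terraform` block so that `terraform init` fails early on an incompatible toolchain. A minimal sketch of such a block follows; the `hashicorp/google` source address is the standard registry address for the Google provider, and whether versions.tf is written exactly this way is an assumption.

```hcl
# Sketch only: mirrors the stated constraints, not the actual versions.tf.
terraform {
  # Any Terraform release from 1.7 onward.
  required_version = ">= 1.7"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0" # any 5.x release, but not 6.0
    }
  }
}
```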
VM First Boot
When the VM starts for the first time, the startup script (infra/files/startup.sh) runs the following steps:
- Installs system dependencies (`git`, `curl`).
- Creates a `dagster` system user.
- Installs `uv` as the `dagster` user.
- Clones the repo and installs Python dependencies with `uv sync --no-dev`.
- Sets up `DAGSTER_HOME` at `/var/dagster/home` and copies `dagster.yaml`.
- Pulls secrets from GCP Secret Manager and writes them to `/home/dagster/data-platform/.env`.
- Copies systemd unit files and enables the Dagster services and health check timer.
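A startup script like this is usually attached to the instance through the `metadata_startup_script` argument. The sketch below shows one plausible wiring, not the actual resource in main.tf. The resource label, the `default` network, the `debian-cloud/debian-12` image family, and the `cloud-platform` scope are assumptions. The machine type, zone, VM name, and script path come from this document.

```hcl
# Sketch only: label, network, image family, and scopes are assumptions.
resource "google_compute_instance" "dagster_daemon" {
  name         = "dagster-daemon"
  machine_type = "e2-micro"
  zone         = "us-east1-b"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12" # assumed image family for Debian 12
    }
  }

  network_interface {
    network = "default"
    # No access_config block, so the VM receives no external IP.
  }

  service_account {
    email  = "dagster-daemon@american-beauty-institute.iam.gserviceaccount.com"
    scopes = ["cloud-platform"]
  }

  # Load the script from infra/files/startup.sh.
  metadata_startup_script = file("${path.module}/files/startup.sh")
}
```

Compute Engine runs a metadata startup script on each boot, so a script attached this way would need its own guard (for example, a marker file) if the steps are meant to run only once.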
After the startup script completes, the service account JSON key must be copied to the VM manually:
gcloud compute scp key.json dagster-daemon:/etc/gcp/service-account.json \
--zone=us-east1-b --tunnel-through-iap --project=american-beauty-institute
Then set ownership on the VM:
sudo chown dagster:dagster /etc/gcp/service-account.json
Outputs
After terraform apply, the following outputs are available:
| Output | Description |
|---|---|
| `dagster_vm_name` | Name of the Dagster VM |
| `dagster_vm_zone` | Zone of the Dagster VM |
| `raw_bucket_name` | Name of the raw data GCS bucket |
| `tf_state_bucket_name` | Name of the Terraform state bucket |
| `dagster_service_account_email` | Email of the Dagster service account |