Tutorial · 9 min read

Containers on Regulated HPC Clusters

Containers are the fastest way to share environments across labs, but compliance officers still expect airtight controls. Follow this workflow to ship GPU-ready images that pass security reviews on ORNL, Argonne, and pharma clusters.

Toolchain snapshot

Build + sign workflow

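# Build the image unprivileged from the definition file
apptainer build --fakeroot climate.sif climate.def
# Sign the SIF as a file: cosign sign targets registry images, sign-blob targets local files
cosign sign-blob --key k8s://hcp/signing --output-signature climate.sif.sig climate.sif
# Verify the detached signature against the public key
cosign verify-blob --key hcp/signing.pub --signature climate.sif.sig climate.sif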

Keep definition files in git for diffable review and store signing keys in an HSM-backed vault.
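
A job wrapper can refuse to launch an image whose signature does not check out. The following is a minimal sketch rather than a definitive implementation: it assumes the SIF was signed as a file with cosign sign-blob and that the detached signature sits next to the image as <image>.sig. The function name verify_sif, the .sig naming convention, and the COSIGN override variable are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# verify_sif IMAGE PUBKEY
# Succeeds only if IMAGE has a detached signature that verifies against PUBKEY.
# COSIGN may be overridden (for example in a dry run); it defaults to "cosign".
verify_sif() {
    vs_image=$1
    vs_pubkey=$2
    vs_sig="${vs_image}.sig"   # assumed location of the detached signature

    if [ ! -f "$vs_sig" ]; then
        echo "missing signature: $vs_sig" >&2
        return 2
    fi

    # Exit status of cosign becomes the exit status of the function.
    "${COSIGN:-cosign}" verify-blob \
        --key "$vs_pubkey" \
        --signature "$vs_sig" \
        "$vs_image"
}
```

A batch script could then gate the launch with `verify_sif climate.sif hcp/signing.pub || exit 1` so that an unsigned or tampered image never reaches the compute nodes.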

Granting GPU + network access

apptainer exec --nv \
  --bind /lus:/lus,/scratch:/scratch \
  climate.sif python run.py

Pass --nv on NVIDIA nodes or --rocm on AMD nodes, bind only the directories the job actually needs, and rely on Slurm's job_container plugin or burst-buffer stage-in/stage-out for inputs and outputs.
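
The flag choice and the bind list can be computed rather than hard-coded. This is a rough sketch under stated assumptions: it treats the presence of /dev/nvidiactl as a sign of an NVIDIA driver and /dev/kfd as a sign of an AMD ROCm driver, and the helper names gpu_flag and bind_spec are hypothetical. Sites with mixed nodes or non-standard device layouts would need a different check.

```shell
#!/bin/sh
# gpu_flag [DEVDIR]
# Prints the Apptainer GPU flag matching the devices found under DEVDIR
# (default /dev). Prints nothing on nodes without a recognized GPU device.
gpu_flag() {
    gf_dev=${1:-/dev}
    if [ -e "$gf_dev/nvidiactl" ]; then
        echo "--nv"        # assumption: NVIDIA control device present
    elif [ -e "$gf_dev/kfd" ]; then
        echo "--rocm"      # assumption: AMD ROCm compute device present
    fi
}

# bind_spec DIR...
# Joins the given directories into a comma-separated --bind value,
# mounting each one at the same path inside the container.
bind_spec() {
    bs_spec=""
    for bs_dir in "$@"; do
        bs_spec="${bs_spec:+$bs_spec,}${bs_dir}:${bs_dir}"
    done
    echo "$bs_spec"
}
```

In the apptainer invocation above, the hard-coded values could then be replaced with `$(gpu_flag)` and `--bind "$(bind_spec /lus /scratch)"`, which keeps the list of exposed directories explicit and easy to review.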

Policy enforcement

  1. Maintain an allow-list of registries and image digests in /etc/containers/policy.json, the trust policy read by Podman, Skopeo, and Buildah.
  2. Scan images with Trivy/Grype before promotion.
  3. Monitor runtime with Falco/eBPF for unexpected syscalls.
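
For the first item, a default-deny trust policy might look like the sketch below. It is staged to a local file for review instead of being written straight to /etc/containers/policy.json. The registry name registry.example.org/hpc and the key path are hypothetical placeholders, and the POLICY_FILE variable is an assumption made for this example.

```shell
#!/bin/sh
# Stage a candidate trust policy for review; install it as
# /etc/containers/policy.json only after sign-off.
# Hypothetical values: registry.example.org/hpc and the keyPath below.
policy_file=${POLICY_FILE:-./policy.json}

# Reject everything by default, then allow one registry namespace
# whose images carry a sigstore signature made with the given key.
cat > "$policy_file" <<'EOF'
{
  "default": [{ "type": "reject" }],
  "transports": {
    "docker": {
      "registry.example.org/hpc": [
        {
          "type": "sigstoreSigned",
          "keyPath": "/etc/pki/containers/hcp-signing.pub"
        }
      ]
    }
  }
}
EOF

echo "wrote $policy_file"
```

Starting from a reject default means a new registry has to be added deliberately, which tends to be easier to defend in a review than a permissive default with exceptions.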

Troubleshooting

Compliance talking points
