Tutorials
We prioritize quality over volume: expect living documents, diagrams, and copy/paste-ready code blocks rather than auto-generated man pages. Each tutorial is tested against current toolchains (Slurm 24.05, Apptainer 1.3, Open MPI 5.x) and against the security controls that Fortune 500 enterprises demand.
Designing Resilient Slurm Jobs
Blueprints for multi-step HPC workflows that survive busy queues, preemption, and node hiccups.
- Intent-first job headers
- Heterogeneous allocations
- Observability hooks
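Here is a rough preview of these three themes. The Python sketch below renders a batch script in which an intent comment comes first, `#SBATCH hetjob` separates heterogeneous components, and a `USR1` trap plus timestamped log lines serve as minimal observability hooks. The `Component` class, the `render_sbatch` helper, and the 300-second warning window are hypothetical choices made for this example. The `#SBATCH` options themselves (`--requeue`, `--signal`, `hetjob`) are standard Slurm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Resources for one component of a (possibly heterogeneous) job. Hypothetical helper."""
    partition: str
    nodes: int
    ntasks_per_node: int
    extra: list[str] = field(default_factory=list)  # raw options, e.g. "--gpus-per-node=4"


def render_sbatch(job_name: str, intent: str, walltime: str,
                  components: list[Component], command: str) -> str:
    """Return an sbatch script whose first lines state why the job exists."""
    if not components:
        raise ValueError("need at least one component")
    lines = [
        "#!/bin/bash",
        f"# INTENT: {intent}",             # intent-first: purpose before resources
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={walltime}",
        "#SBATCH --requeue",                # allow requeue after node failure or preemption
        "#SBATCH --signal=B:USR1@300",      # signal the batch shell 300 s before the time limit
    ]
    for i, comp in enumerate(components):
        if i > 0:
            lines.append("#SBATCH hetjob")  # separator between heterogeneous components
        lines += [
            f"#SBATCH --partition={comp.partition}",
            f"#SBATCH --nodes={comp.nodes}",
            f"#SBATCH --ntasks-per-node={comp.ntasks_per_node}",
        ]
        lines += [f"#SBATCH {opt}" for opt in comp.extra]
    lines += [
        "",
        # Observability hook: timestamped messages at start and on the USR1 warning.
        "trap 'echo \"$(date -Iseconds) USR1: time limit near, checkpoint now\" >&2' USR1",
        "echo \"$(date -Iseconds) start job=$SLURM_JOB_ID\" >&2",
        f"{command} &",
        # `wait` returns early when a trapped signal arrives, so loop until the step exits.
        # (The step's exit code is not propagated in this sketch.)
        "until wait; do :; done",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sbatch("cfd-sweep", "nightly CFD parameter sweep", "02:00:00",
                        [Component("cpu", 2, 64), Component("gpu", 1, 4, ["--gpus-per-node=4"])],
                        "srun ./solver"))
```

The application runs in the background and the script then waits on it. This lets the `USR1` trap fire while the job step is still active. A foreground command would delay the trap until the command had already finished.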
MPI Launch Patterns That Scale
Modern MPI playbook covering multi-NIC fabrics, collectives offload, and debugging across thousands of ranks.
- UCX/libfabric tuning
- Flux/Balsam orchestration
- Profiling workflow
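Here is a small preview of the UCX tuning theme. The Python helper below assembles an Open MPI 5.x `mpirun` line that uses the UCX point-to-point layer and restricts it to specific NIC ports. The helper name, the NIC names (`mlx5_0:1`, `mlx5_1:1`), and the default transport list are assumptions made for illustration. The `mpirun` options and the `UCX_NET_DEVICES`/`UCX_TLS` variables are standard, but the right values depend on your fabric.

```python
from __future__ import annotations

import shlex


def mpirun_cmd(exe: list[str], nodes: int, ranks_per_node: int,
               nics: list[str], transports: str = "rc,sm,self") -> list[str]:
    """Build an Open MPI 5.x launch line as an argv list (hypothetical helper)."""
    if nodes < 1 or ranks_per_node < 1:
        raise ValueError("nodes and ranks_per_node must be positive")
    if not nics:
        raise ValueError("list at least one NIC port")
    return [
        "mpirun",
        "-np", str(nodes * ranks_per_node),
        "--map-by", f"ppr:{ranks_per_node}:node",   # fixed number of ranks per node
        "--bind-to", "core",                        # pin each rank to a core
        "--mca", "pml", "ucx",                      # force the UCX point-to-point layer
        "-x", f"UCX_NET_DEVICES={','.join(nics)}",  # restrict UCX to these NIC ports
        "-x", f"UCX_TLS={transports}",              # transports UCX may select from
        *exe,
    ]


if __name__ == "__main__":
    cmd = mpirun_cmd(["./app", "--steps", "100"], nodes=4, ranks_per_node=8,
                     nics=["mlx5_0:1", "mlx5_1:1"])
    print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting bugs when it is handed to `subprocess`. `shlex.join` turns it into a line you can paste into a batch script.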
Containers on Regulated HPC Clusters
Run Apptainer, OCI, and NVIDIA NGC images with zero-trust controls, GPU access, and reproducible pipelines.
- Build & signing workflow
- Policy enforcement
- Troubleshooting checklist
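The build, signing, and policy steps can be pictured as a small Python gate. It records the standard Apptainer build/sign/verify sequence. It also refuses to launch any image whose SHA-256 digest is missing from an approved allowlist. The allowlist mechanism and all helper names are hypothetical. They stand in for whatever policy engine your site actually enforces.

```python
from __future__ import annotations

import hashlib
import subprocess
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large SIF images are never fully loaded."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_and_sign_cmds(definition: Path, image: Path) -> list[list[str]]:
    """Standard Apptainer build/sign/verify sequence for a trusted build host.

    Record the allowlist digest only after signing, because signing changes the file.
    """
    return [
        ["apptainer", "build", str(image), str(definition)],
        ["apptainer", "sign", str(image)],    # embed a signature (default: local PGP key)
        ["apptainer", "verify", str(image)],  # confirm the signature before publishing
    ]


def gpu_exec(image: Path, approved: set[str], argv: list[str],
             dry_run: bool = True) -> list[str]:
    """Hypothetical policy gate: only allowlisted, signature-verified images may run."""
    if sha256_of(image) not in approved:
        raise PermissionError(f"{image} is not on the approved digest allowlist")
    cmd = ["apptainer", "exec", "--nv", str(image), *argv]  # --nv exposes host NVIDIA GPUs
    if not dry_run:
        subprocess.run(["apptainer", "verify", str(image)], check=True)
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a real SIF image (dry run only).
    with tempfile.TemporaryDirectory() as tmp:
        img = Path(tmp) / "app.sif"
        img.write_bytes(b"not a real image")
        print(gpu_exec(img, {sha256_of(img)}, ["nvidia-smi"]))
```

The digest check and the signature check are deliberately separate. The digest pins an exact approved build, and `apptainer verify` confirms who signed it.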