Introduction
For performance and cost benchmarks of running GROMACS on SaladCloud, please refer to this blog post. You can also check the GitHub repository for the benchmarking Dockerfile and detailed test methodology.
Molecular dynamics simulations with GROMACS can run from several hours to multiple days. Runtime depends on factors such as CPU/GPU performance, system size (number of atoms or molecules), simulation length (number of steps), and the level of physical detail modeled. Large-scale simulations often produce substantial output, up to tens of gigabytes of trajectories, energy profiles, and other results.
SaladCloud operates on a distributed network of interruptible nodes. Any node running your tasks may shut down unexpectedly, and all runtime data on it is lost, so data persistence must be managed through external cloud storage. Even so, most Salad nodes remain stable for more than 10 hours at a time.
To run GROMACS effectively on SaladCloud, we recommend dividing large simulation tasks into manageable chunks, such as 30-minute runs, and executing them sequentially. GROMACS supports this natively through checkpointing, and it takes only a few lines of code to implement. In this workflow, each chunk generates its own small output files along with an updated checkpoint, and all of them can be uploaded to cloud storage immediately. When a task resumes on a new node after reallocation, only the input file and the checkpoint file need to be downloaded from the cloud. This eliminates the need to upload or download large files at any point.
Salad nodes are globally distributed, so network latency, geographic distance, and throughput to a given cloud storage endpoint vary from node to node. Many nodes also have asymmetric bandwidth, with upload speeds typically lower than download speeds. Nevertheless, more than 90% of nodes can upload over 10 GB of data per hour, which is more than sufficient for GROMACS workloads.
For detailed performance metrics, refer to the cloud storage benchmarks on SaladCloud.
Single-Replica Container Group vs. Job Queue System
As a starting point, we recommend creating a dedicated Single-Replica Container Group (SRCG) for each simulation task on SaladCloud. This setup is easy to launch and works well for larger simulations spanning tens of hours or multiple days. Task-specific configurations, such as the number of steps, the chunking interval, and the input/output paths in cloud storage, can be passed to the instance through environment variables. Tasks are briefly paused while a replacement node is allocated after an interruption. However, our testing across a large number of samples shows that total downtime remains under 4% of the overall runtime.
If you are running a large number of simulation tasks, a job queue becomes essential for efficiency and scalability. Systems such as GCP Pub/Sub, AWS SQS, Salad Kelpie, or custom solutions built on Redis or Kafka can distribute jobs (task-specific configurations) across a pool of Salad nodes. If a node fails during job execution, the job queue ensures the job is retried immediately on another available node. You can also implement autoscaling by monitoring the number of pending jobs in the queue and adjusting the number of Salad nodes accordingly. This approach lets you complete a target number of tasks within a defined timeframe while keeping costs under control during periods of lower demand.
This guide focuses on the SRCG-based approach; a separate guide will cover job queue integration.
Chunking Large Simulations
To demonstrate how to split a GROMACS simulation into chunks, consider an example run based on the j1.tpr input file. It uses a single MPI rank and 8 OpenMP threads, with j1 as the default filename prefix for all output files. All major components of the simulation (non-bonded interactions, PME, bonded interactions, and integration) are offloaded to the GPU. CPU threads are pinned to specific cores to ensure consistent performance.
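The following Python sketch shows how such a run could be launched one chunk at a time. It assumes gmx is on the PATH and that the command runs in the working directory holding j1.tpr. The mdrun flags are standard GROMACS options, but this exact combination is an assumption, not the command from the benchmark repository, and mdrun_chunk_command is a hypothetical helper name.

- -maxh caps the wall-clock time of one chunk; mdrun writes a checkpoint and stops just before the limit.
- -noappend makes each chunk write its own output files. The default appending mode expects the earlier output files to be present, which a freshly allocated node would not have.
- -cpi resumes from the checkpoint when one exists.

```python
from pathlib import Path


def mdrun_chunk_command(workdir: Path, prefix: str = "j1", max_hours: float = 0.5) -> list[str]:
    """Build a gmx mdrun command for one chunk (hypothetical helper).

    Assumes the command is executed with cwd=workdir and gmx on PATH.
    """
    cmd = [
        "gmx", "mdrun",
        "-s", f"{prefix}.tpr",      # portable run input
        "-deffnm", prefix,          # default prefix for all output files
        "-ntmpi", "1",              # one (thread-)MPI rank
        "-ntomp", "8",              # 8 OpenMP threads
        "-nb", "gpu",               # non-bonded interactions on the GPU
        "-pme", "gpu",              # PME on the GPU
        "-bonded", "gpu",           # bonded interactions on the GPU
        "-update", "gpu",           # coordinate update (integration) on the GPU
        "-pin", "on",               # pin threads to CPU cores
        "-maxh", str(max_hours),    # stop and checkpoint shortly before this many hours
        "-noappend",                # new output files for every chunk
    ]
    checkpoint = workdir / f"{prefix}.cpt"
    if checkpoint.exists():
        # Resume from the latest checkpoint, e.g. one downloaded on a new node.
        cmd += ["-cpi", checkpoint.name]
    return cmd
```

Each chunk can then be run with subprocess.run(cmd, cwd=workdir, check=True). The loop repeats until the simulation reaches its final step and writes the final .gro file.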
Let’s take a closer look at the simulation's input, checkpoint, and output files.

- .tpr – The portable binary run input file. It contains everything required to run the simulation: topology, parameters, coordinates, velocities, and simulation settings.
- .cpt – The checkpoint file, updated at configurable intervals (every 15 minutes by default) during the simulation. It allows the simulation to resume from the last saved state after an interruption. While its contents change, the file size remains constant for a given input. A _prev.cpt file may be created to store the previous checkpoint when a new checkpoint is written.
- .edr, .log, .trr, .xtc – These output files are continuously and rapidly updated during the simulation, with new data appended at regular step intervals. Over long runs, they can grow significantly in size, possibly reaching tens of gigabytes, especially the .trr file.
  - .edr stores energy terms such as temperature, pressure, and kinetic/potential energy.
  - .log records detailed information about the run, including settings, performance metrics, and diagnostic messages.
  - .trr is a full-precision trajectory file containing coordinates, velocities, and forces at each saved step.
  - .xtc is a compressed trajectory file that stores only atomic coordinates; it may be generated in some simulations to save storage space.
- .gro – A coordinate file generated upon simulation completion. It represents the final state of the system and can serve as input for a continuation or new simulation.
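Given these file roles, the per-chunk bookkeeping reduces to two questions: which files to upload after a chunk, and which to download before resuming. The sketch below rests on three assumptions.

- Chunks run with -noappend, so each chunk's output files get new names and never need re-uploading.
- The checkpoint keeps a fixed name, so it is re-uploaded after every chunk.
- The _prev.cpt file is skipped.

The helper names plan_upload and resume_set are hypothetical.

```python
from pathlib import Path

# Output types from the list above; .tpr and .cpt are handled separately.
OUTPUT_SUFFIXES = {".edr", ".log", ".trr", ".xtc", ".gro"}


def plan_upload(workdir: Path, prefix: str, uploaded: set[str]) -> list[Path]:
    """Files to send to cloud storage after a chunk finishes (hypothetical helper).

    Assumes -noappend-style output: each chunk writes new, uniquely named files,
    so anything already uploaded is skipped. The checkpoint has a fixed name but
    changing contents, so it is re-uploaded after every chunk.
    """
    plan = [
        path
        for path in sorted(workdir.iterdir())
        if path.suffix in OUTPUT_SUFFIXES and path.name not in uploaded
    ]
    checkpoint = workdir / f"{prefix}.cpt"  # <prefix>_prev.cpt is deliberately ignored
    if checkpoint.exists():
        plan.append(checkpoint)
    return plan


def resume_set(prefix: str) -> list[str]:
    """Only the run input and the latest checkpoint are needed to resume."""
    return [f"{prefix}.tpr", f"{prefix}.cpt"]
```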

Running GROMACS on SaladCloud
Code Optimizations
To run GROMACS efficiently on SaladCloud, we recommend several additional code optimizations:
- Use chunked, parallel data transfers to maximize throughput when uploading or downloading large files between Salad nodes and cloud storage.
- Introduce a dedicated uploader thread with a task queue to offload uploads from the main thread, so that simulation execution is never blocked.
- Add error handling, monitoring, and logging to improve reliability, increase visibility, and simplify troubleshooting.
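The second and third recommendations can be combined in a small background uploader. This is a minimal sketch using only the Python standard library:

- The simulation loop calls submit() and continues immediately.
- A worker thread drains the queue.
- Each upload is retried a few times with logging before being recorded as failed.

upload_fn is a placeholder for your storage client's upload call and is an assumption. Chunked, parallel transfers of individual large files would live inside that function and are not shown.

```python
import logging
import queue
import threading
import time
from typing import Callable, Optional

log = logging.getLogger("uploader")


class Uploader:
    """Upload files on a background thread so the simulation never waits on I/O.

    upload_fn is a placeholder for your storage client's upload call; it must
    raise an exception on failure.
    """

    def __init__(self, upload_fn: Callable[[str], None], retries: int = 3,
                 backoff_s: float = 5.0) -> None:
        self._upload_fn = upload_fn
        self._retries = retries
        self._backoff_s = backoff_s
        self._tasks: "queue.Queue[Optional[str]]" = queue.Queue()
        self.failed: list = []  # paths that exhausted all retries
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, path: str) -> None:
        """Called from the main loop; returns immediately."""
        self._tasks.put(path)

    def close(self) -> None:
        """Finish all queued uploads, then stop the worker thread."""
        self._tasks.put(None)  # sentinel, processed after every earlier path
        self._worker.join()

    def _run(self) -> None:
        while True:
            path = self._tasks.get()
            if path is None:
                return
            for attempt in range(1, self._retries + 1):
                try:
                    self._upload_fn(path)
                    log.info("uploaded %s", path)
                    break
                except Exception:
                    log.exception("upload of %s failed (attempt %d/%d)",
                                  path, attempt, self._retries)
                    if attempt < self._retries:
                        time.sleep(self._backoff_s)
            else:  # no break: every attempt failed
                self.failed.append(path)
```

In the chunk loop, call submit() for each file once a chunk completes, and call close() before the container exits. The failed list then shows which files still need attention.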

Dockerfile Configuration
The provided Dockerfile builds the container from the official Miniconda image, then installs essential utilities (the VS Code Server CLI), GROMACS 2024.5, and the required dependencies. It also copies the required Python code into the image and sets the default command.
Environment Variables
Ensure all required environment variables are set before running a simulation. These variables are passed to the SRCG when it is created. For easy configuration and reuse, you can define them in a .env file in the project directory.
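A minimal sketch of validating the configuration at container start, so a misconfigured SRCG fails fast instead of running for hours. The variable names CLOUD_FOLDER, DEFFNM and CHUNK_MINUTES are hypothetical placeholders, not the names used by the provided code. The small parser mimics the KEY=VALUE format of a .env file for local testing.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping

# Hypothetical variable names; align them with your own .env file.
REQUIRED = ("CLOUD_FOLDER", "DEFFNM", "CHUNK_MINUTES")


@dataclass(frozen=True)
class TaskConfig:
    cloud_folder: str     # where the .tpr lives and outputs are uploaded
    deffnm: str           # GROMACS default file-name prefix, e.g. "j1"
    chunk_minutes: float  # wall-clock length of each chunk

    @property
    def max_hours(self) -> float:
        """Chunk length in hours, the unit used by mdrun -maxh."""
        return self.chunk_minutes / 60


def parse_env_file(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments, strips quotes."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")
    return env


def load_config(env: Mapping[str, str] = os.environ) -> TaskConfig:
    """Fail fast at container start if anything is missing or invalid."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    minutes = float(env["CHUNK_MINUTES"])
    if minutes <= 0:
        raise ValueError("CHUNK_MINUTES must be positive")
    return TaskConfig(env["CLOUD_FOLDER"], env["DEFFNM"], minutes)
```

On SaladCloud the variables arrive through the container environment, so load_config() reads os.environ by default. Locally you can pass it the result of parse_env_file() on the contents of your .env file instead.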
Local Run
If you have access to a local GPU environment, you can test the image locally before running it on SaladCloud. Use docker compose to start the container defined in docker-compose.yaml; the command automatically loads environment variables from the .env file in the same directory.
Deployment on SaladCloud
Run salad_quotas.py to view detailed information about your current quotas and available GPU types. Refer to salad_deploy.py for an example of deploying an SRCG with the SaladCloud Python SDK to run a simulation task. The salad_monitor.py script monitors simulation progress by checking the files uploaded to cloud storage; it can also reset the test environment by clearing the cloud folder while preserving the .tpr input file. To debug and perform manual tasks, follow this guide to connect to the SRCG using VS Code.
Test Results and Benchmarks
Single-Day Simulation Results
The input model for the test is a large viral protein system with 1,066,628 atoms. The test ran 10,000,000 steps with a chunking interval of 30 minutes. Here is a summary of the test run results on SaladCloud:
- 2 interruptions: with a 30-minute chunking interval, the interruptions may result in about 30 minutes of lost compute time (estimated as N × chunking interval / 2, where N is the number of interruptions). This lost time is not included in the effective runtime and can be reduced by using shorter chunking intervals.
- 3 node allocations/reallocations: these include image pulling and environment setup, which SaladCloud does not charge for.
Machine ID | GPU | Start | End | Duration | Chunks | Steps | Steps/Second |
---|---|---|---|---|---|---|---|
c599288a-9974-3858-b1eb-1e8b2e8aa9ab | NVIDIA GeForce RTX 4060 Ti | 2025-06-18 10:45:09 | 2025-06-19 00:11:28 | 13:26:19 | 27 | 3,717,000 | 76.83 |
e138daef-c41c-e050-ba2d-889b7b9a08fc | NVIDIA GeForce RTX 4090 | 2025-06-19 00:46:07 | 2025-06-19 01:16:05 | 0:29:58 | 1 | 385,400 | 214.35 |
e3b930f7-637d-6e57-a67b-3bb9563a2796 | NVIDIA GeForce RTX 4090 | 2025-06-19 01:49:11 | 2025-06-19 10:49:55 | 9:00:44 | 19 | 5,897,600 | 181.78 |
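As a quick cross-check of the table and the lost-time estimate, the following Python snippet does three things:

- recomputes the throughput column from Steps and Duration,
- sums the steps and durations across nodes, and
- evaluates N × chunking interval / 2 for this run.

It only re-derives numbers already shown above. expected_lost_minutes is a hypothetical helper name.

```python
from datetime import timedelta

# (GPU, Duration, Steps) copied from the single-day table above.
RUNS = [
    ("RTX 4060 Ti", timedelta(hours=13, minutes=26, seconds=19), 3_717_000),
    ("RTX 4090", timedelta(minutes=29, seconds=58), 385_400),
    ("RTX 4090", timedelta(hours=9, minutes=0, seconds=44), 5_897_600),
]


def expected_lost_minutes(interruptions: int, chunk_minutes: float) -> float:
    """Rule of thumb from the text: each interruption loses half a chunk on average."""
    return interruptions * chunk_minutes / 2


total_steps = sum(steps for _, _, steps in RUNS)
effective_runtime = sum((duration for _, duration, _ in RUNS), timedelta())
steps_per_second = [round(steps / duration.total_seconds(), 2) for _, duration, steps in RUNS]

print(f"total steps:       {total_steps:,}")
print(f"effective runtime: {effective_runtime}")
print(f"steps/second:      {steps_per_second}")
print(f"expected lost compute: {expected_lost_minutes(2, 30):.0f} min")
```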
Multi-Day Simulation Results
The test uses the same input model as above but runs 30,000,000 steps, with the simulation chunked into 10-minute intervals. Here is a summary of the test run results on SaladCloud:
- 1 interruption: with a 10-minute chunking interval, the interruption may result in about 5 minutes of lost compute time (estimated as N × chunking interval / 2). This lost time is not included in the effective runtime and can be reduced by using shorter chunking intervals.
- 2 node allocations/reallocations: these include image pulling and environment setup, which SaladCloud does not charge for.
Machine ID | GPU | Start | End | Duration | Chunks | Steps | Steps/Second |
---|---|---|---|---|---|---|---|
602e45b0-1fd8-4352-bddc-f96270e1f048 | NVIDIA GeForce RTX 4060 Ti | 2025-06-19 16:25:00 | 2025-06-23 12:39:15 | 3 days, 20:14:15 | 544 | 22,254,600 | 67.02 |
3d18a605-08d6-1958-8795-e500065b2377 | NVIDIA GeForce RTX 4070 Ti SUPER | 2025-06-23 13:02:22 | 2025-06-24 04:36:11 | 15:33:49 | 94 | 7,745,400 | 138.24 |
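The Start and End columns also show how long the task waited between nodes. This sketch computes that gap as a fraction of total wall-clock time. It uses only the recorded timestamps and assumes the node spans are listed in order without overlap. Any partial chunk lost at the interruption therefore counts as time on a node, not as gap time. downtime_summary is a hypothetical helper name.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (Start, End) per node, copied from the multi-day table above, in order.
NODES = [
    ("2025-06-19 16:25:00", "2025-06-23 12:39:15"),
    ("2025-06-23 13:02:22", "2025-06-24 04:36:11"),
]


def downtime_summary(nodes):
    """Return (wall clock, time on nodes, gap between nodes, gap fraction)."""
    spans = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in nodes]
    wall_clock = spans[-1][1] - spans[0][0]
    on_nodes = sum((end - start for start, end in spans), timedelta())
    gap = wall_clock - on_nodes
    return wall_clock, on_nodes, gap, gap / wall_clock


wall_clock, on_nodes, gap, fraction = downtime_summary(NODES)
print(f"wall clock {wall_clock}, on nodes {on_nodes}, gap {gap} ({fraction:.2%})")
```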