Overview
Migrating from Vast.ai to SaladCloud is a straightforward process that preserves your existing development workflow while automating the operational complexity you’re used to managing manually. If you’re currently SSH-ing into instances, installing dependencies, and running scripts on Vast.ai, you’ll find that SaladCloud maintains the same familiar patterns—your Python code, ML frameworks, and data processing logic remain identical.

What Stays Exactly the Same
- Your application code and algorithms work unchanged
- Same Python libraries, PyTorch/TensorFlow frameworks, and CUDA operations
- Identical API patterns, data processing workflows, and model training approaches
- Same debugging mindset—just with better tools than SSH
- Your development environment and local testing process
💡 New to containerization? Check out our comprehensive getting started guide for a step-by-step introduction to deploying on SaladCloud, or explore our architectural overview to understand how SaladCloud’s distributed GPU network works.

Think of containerization as creating a “recipe” for the manual setup you already do on Vast.ai—instead of SSH-ing in and running `pip install` commands each time, you write those same commands once in a Dockerfile, Docker builds an immutable image with everything pre-installed, and that same image runs consistently across all instances. Most developers find this eliminates their biggest frustrations with manual instance management while keeping everything else familiar.
Why Containerization Has Become the Industry Standard
Containerization has emerged as the de facto deployment standard across the technology industry for compelling reasons that directly benefit developers and organizations. Containers provide consistency across environments by packaging applications with all their dependencies, eliminating the “it works on my machine” problem that has plagued software deployment for decades. This consistency extends from development laptops to production clusters, ensuring predictable behavior regardless of the underlying infrastructure.

The portability offered by containers is transformative—applications become truly platform-agnostic, running identically on any system that supports container runtimes. This portability reduces vendor lock-in and enables organizations to migrate workloads between cloud providers, on-premises infrastructure, or hybrid environments without code changes. Additionally, containers enable efficient resource utilization through consistent packaging and deployment, while SaladCloud’s architecture ensures each container gets dedicated access to full GPU resources on individual nodes.

Perhaps most importantly, containers have revolutionized deployment velocity and reliability. Teams can package, test, and deploy applications in minutes rather than hours, while container orchestration platforms provide automatic scaling, health monitoring, and self-healing capabilities. This operational efficiency has made containerization essential for modern DevOps practices and continuous delivery pipelines.

On SaladCloud, each container runs on a dedicated GPU node, ensuring your application has exclusive access to the full GPU resources without sharing with other workloads. This dedicated approach maximizes performance while maintaining the portability and consistency benefits of containerization.

Key Platform Differences
Vast.ai Architecture
- Individual GPU instances with SSH access
- VM-like environment with direct file system access
- Instance-based pricing and management
SaladCloud Architecture
- Containerized applications with automatic orchestration
- Distributed network with built-in redundancy
- Container-based deployment with health monitoring
Migration Requirements
What You’re Already Doing (Made Easier)
The “requirements” below are actually improvements to processes you’re already handling manually on Vast.ai. Rather than learning entirely new concepts, you’re automating existing workflows with better consistency and reliability.
- Containerization replaces manual dependency installation on each instance. Instead of SSH-ing in and running the same `pip install` commands repeatedly, you write them once in a Dockerfile, Docker builds the dependencies into an immutable image, and that image runs consistently across all instances.
- Storage Strategy shifts from local file management to cloud-based storage patterns. While cloud APIs provide more reliable data persistence than manually copying files between instances, this transition requires rethinking data workflows. You’ll need to consider upload/download costs, latency impacts, and potential network reliability issues that weren’t factors with local storage on Vast.ai.
- Network Architecture replaces managing multiple ports and SSH tunneling. You get a single port with automatic load balancing instead of manually configuring port forwarding and access rules.
- AMD64 Architecture is what you’re already using on most Vast.ai instances, so this requires no change to your existing applications.
Technical Implementation (Familiar Concepts)
These constraints map directly to what you’re already working with, just more consistently managed:
- Container Images: 35GB limit (larger than most Vast.ai instance setups)
- Storage: Cloud-based (eliminates instance storage limitations) - see our storage integration guide for persistent data patterns
- Networking: IPv6 support (replace `0.0.0.0` with `::` in your bind address)
- Debugging: Web terminal and portal logs (more convenient than SSH key management)
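The `0.0.0.0` to `::` change is the only networking tweak most applications need. A minimal sketch of the difference at the socket level (any web framework's `--host` flag accomplishes the same thing):

```python
import socket

# On Vast.ai you likely bound an IPv4 socket:
#     sock.bind(("0.0.0.0", 8000))
# SaladCloud's Container Gateway reaches your container over IPv6,
# so create an IPv6 socket and bind to "::" instead:
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
sock.bind(("::", 0))  # port 0 here just asks the OS for any free port
print(sock.getsockname()[0])  # "::" — listening on all IPv6 interfaces
sock.close()
```

With uvicorn, for example, the equivalent change is `--host 0.0.0.0` becoming `--host '::'`.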
📘 Container Registry Options: SaladCloud supports all major container registries. See our guides for Docker Hub, AWS ECR, Azure ACR, and Google Artifact Registry.
Migration Process
All deployment and management tasks described in this guide can be accomplished through the intuitive SaladCloud web portal at portal.salad.com or programmatically via our REST API and SDKs (Python and TypeScript). The portal provides a visual interface perfect for getting started and one-off deployments, while the API and SDKs enable automation, CI/CD integration, and infrastructure-as-code workflows. You can seamlessly switch between approaches—deploy through the portal initially, then automate with the API as your needs grow.

Phase 1: Assessment and Containerization
Assessment and Planning
- Catalog your current Vast.ai workloads
- Identify containerization candidates
- Set up SaladCloud account and API access
- Create Dockerfiles for your applications
- Build and test containers locally
- Push images to container registry
🔧 Containerization Resources: If you’re new to Docker, check out our Docker deployment tutorial for practical examples, or see specifying container commands for advanced startup configuration.
Phase 2: Deployment and Optimization
Initial Deployment
- Deploy containers to SaladCloud (via portal or API)
- Configure Container Gateway and health probes
- Set up monitoring and logging
📊 Monitoring & Logging: For production workloads, consider setting up external logging with providers like Axiom (recommended), Datadog, or New Relic for advanced log analysis and retention.

Testing and Optimization
- Validate performance and functionality
- Optimize resource allocation (containers have CPU/memory limits, not direct hardware allocation like VMs)
- Complete migration of remaining workloads

Resource allocation on SaladCloud differs from VM-style instances in a few ways:
- CPU Limits: Your container gets guaranteed access up to the specified vCPU count, but performance characteristics may differ across node types
- Memory Limits: Hard limits enforced by the container runtime - exceeding these will terminate your container
- GPU Access: Each container gets exclusive access to the full GPU on its assigned node
- Storage: Container filesystem is ephemeral - data doesn’t persist between container restarts unless using external storage
Step-by-Step Migration Process
Step 1: Prepare Your SaladCloud Environment
Account Setup
- Create account at portal.salad.com
- Set up organization and project
- Add billing information and initial credits
- Generate API key for programmatic access
Step 2: Containerize Your Applications
Understanding Containerization for Vast.ai Users

If you’re coming from Vast.ai without container experience, think of containerization as creating a “blueprint” for your application environment. Instead of manually installing dependencies on each GPU instance, you define everything your application needs in a simple text file called a Dockerfile. The key difference is that containers provide environment consistency - when your container image is deployed across multiple SaladCloud nodes, each instance runs in an identical environment that was defined at build time.

No More Manual Environment Setup

One of the biggest advantages of containerization is that complex environments like PyTorch with CUDA are available as pre-built, officially maintained images. Remember the frustration of manually configuring CUDA drivers, PyTorch versions, and dependency conflicts on Vast.ai? That’s completely eliminated with containers. Instead of spending time on environment setup, you can start with battle-tested base images:
- ✅ CUDA drivers pre-installed and configured
- ✅ cuDNN libraries properly linked
- ✅ Framework-specific optimizations
- ✅ Compatible Python environments
- ✅ All dependencies tested together
⚡ GPU Compatibility: SaladCloud guarantees support for CUDA Toolkit 12.0 and later. For the latest RTX 5090/5080 GPUs, see our PyTorch RTX 5090 guide for CUDA 12.8 requirements. Check our high-performance applications guide for GPU optimization tips.

Basic Containerization Pattern

The Dockerfile below shows how straightforward containerization can be. Notice how it mirrors the same steps you’d typically perform on a Vast.ai instance:
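A minimal sketch of such a Dockerfile (the base image tag, file names, and CMD are illustrative — swap in your own):

```dockerfile
# Pre-built PyTorch + CUDA base image: no manual driver/toolkit setup.
# Tag is illustrative — pick one matching your CUDA/framework versions.
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime

WORKDIR /app

# The same pip install you would run over SSH on Vast.ai, written once
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Your code layout is unchanged
COPY . .

# Replaces the command you would type in the Vast.ai terminal;
# bind to "::" (IPv6) for Container Gateway compatibility
CMD ["python", "main.py", "--host", "::", "--port", "8000"]
```

Build and smoke-test locally with `docker build -t myapp .` and `docker run myapp`, then push the image to your registry.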
- Dependencies: The `pip install` line works exactly like on Vast.ai
- File Structure: Your code organization remains the same
- Startup Command: The CMD line replaces what you’d type in your Vast.ai terminal
- Environment Variables: Still work the same way in containers
- No CUDA Setup: Skip the tedious CUDA/PyTorch installation process entirely
- Consistent Environments: Your exact environment runs identically across all nodes
- Version Control: Pin specific framework versions without compatibility issues
- IPv6 Ready: Use `::` instead of `0.0.0.0` for Container Gateway compatibility
🤖 ML-Specific Examples: For machine learning workloads, explore our specialized deployment guides.

Storage Integration Example
- Latency Impact: Network calls to cloud storage are slower than local file access. Consider caching frequently accessed data locally during processing.
- Bandwidth Costs: Large model downloads/uploads can be expensive. Evaluate if you need to transfer full datasets or can work with smaller chunks.
- Error Handling: Network operations can fail. Implement retry logic and graceful degradation for storage operations.
- Concurrent Access: Multiple container instances may access the same data. Consider read/write patterns and potential conflicts.
- `open('/path/to/file.txt', 'r')` becomes `load_data_s3('bucket', 'file.txt')`
- `with open('/path/to/file.txt', 'w')` becomes `store_data_s3(data, 'bucket', 'file.txt')`
- File paths become bucket keys (still organized hierarchically)
- Your data processing logic is identical
- Error handling patterns remain familiar
- File formats and serialization work exactly as before
- Temporary processing still uses local files when needed
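A hedged sketch of what `load_data_s3` / `store_data_s3` helpers might look like. The S3 client is passed in so the pattern works with boto3 (`boto3.client('s3')`) or any compatible client; the retry loop addresses the transient network failures noted above. Helper names and the retry policy are illustrative:

```python
import io
import time

def load_data_s3(bucket, key, s3_client, retries=3):
    """Cloud-storage analogue of open(path, 'rb').read(), with simple retries."""
    for attempt in range(retries):
        try:
            buf = io.BytesIO()
            s3_client.download_fileobj(bucket, key, buf)
            return buf.getvalue()
        except Exception:
            if attempt == retries - 1:
                raise                     # out of retries: surface the error
            time.sleep(2 ** attempt)      # exponential backoff between attempts

def store_data_s3(data, bucket, key, s3_client):
    """Cloud-storage analogue of open(path, 'wb').write(data)."""
    s3_client.upload_fileobj(io.BytesIO(data), bucket, key)
```

With boto3 this would be called as `load_data_s3('my-bucket', 'model.pt', boto3.client('s3'))`; `download_fileobj` and `upload_fileobj` are standard boto3 S3 client methods.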
- Reliability: No more lost data when containers restart—your models and results persist
- Scalability: Access the same data from multiple containers simultaneously
- Collaboration: Share datasets and models across your team instantly
- Backup: Built-in redundancy and versioning (with S3) eliminates data loss concerns
- Performance: Often faster than local disk I/O, especially for large files
- Cost Efficiency: Pay only for storage you use, not for reserved disk space
💾 Storage Best Practices: For comprehensive storage strategies, see our Simple Storage Service documentation for file storage patterns, or explore environment variables management for configuration data.
Step 3: Deploy Container Groups
Portal Deployment (Recommended for first deployment)
- Navigate to your SaladCloud project
- Click “Create Container Group”
- Configure container settings:
- Image: Your container registry URL
- Replicas: Start with 2-3 for reliability
- Resources: CPU, RAM, and GPU requirements
- Container Gateway: Enable for external access
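The same container group can be created programmatically. A sketch that assembles the REST request (the endpoint path, `Salad-Api-Key` header, and payload field names are assumptions — confirm them against the SaladCloud API reference before use):

```python
API_BASE = "https://api.salad.com/api/public"  # assumed base URL

def container_group_request(org, project, name, image, replicas=3):
    """Build URL and JSON body for creating a container group (fields illustrative)."""
    url = f"{API_BASE}/organizations/{org}/projects/{project}/containers"
    body = {
        "name": name,
        "container": {
            "image": image,
            "resources": {"cpu": 2, "memory": 4096},  # vCPUs / MB, per workload
        },
        "replicas": replicas,  # start with 2-3 for reliability
    }
    return url, body

url, body = container_group_request(
    "my-org", "my-project", "inference-api", "docker.io/me/inference-api:1.0"
)
```

Send it with `requests.post(url, json=body, headers={"Salad-Api-Key": API_KEY})`, or use the Python/TypeScript SDKs instead of raw HTTP.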
Step 4: Configure Health Monitoring
Health Probe Implementation
- Startup Probe: Configure HTTP probe pointing to `/started` endpoint
- Liveness Probe: Configure HTTP probe pointing to `/health` endpoint
- Readiness Probe: Configure HTTP probe pointing to `/ready` endpoint
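A minimal sketch of the three probe endpoints using only the Python standard library (a FastAPI or Flask app would expose the same routes; the state handling here is illustrative):

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"started": False, "ready": False}  # flip these as your app initializes

class V6HTTPServer(HTTPServer):
    address_family = socket.AF_INET6  # needed to bind "::" for Container Gateway

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":        # liveness: process is responsive
            code = 200
        elif self.path == "/started":     # startup: initialization finished
            code = 200 if state["started"] else 503
        elif self.path == "/ready":       # readiness: safe to route traffic here
            code = 200 if state["ready"] else 503
        else:
            code = 404
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):         # keep probe chatter out of the logs
        pass

def serve(port=8000):
    """Start the probe server on all IPv6 interfaces in a background thread."""
    server = V6HTTPServer(("::", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real service these routes would live alongside your application endpoints; the key point is that `/started` and `/ready` report 503 until initialization (model loading, warm-up) completes.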
🏥 Health Monitoring Deep Dive: Explore specific probe types:

Health Probe Configuration
- Startup probes - verify container initialization
- Readiness probes - control traffic routing
- Liveness probes - detect and restart unhealthy containers
- Health probe in general - practical implementation patterns
Step 5: Set Up Monitoring and Logging
Application Logging

📋 Logging Solutions: Choose from multiple external logging providers:
- Axiom (SaladCloud’s preferred provider)
- Datadog for comprehensive monitoring
- Splunk for enterprise environments
- HTTP endpoints for custom solutions
- TCP logging with secure transport
Migration Scenarios
Scenario 1: Simple API Service
Before (Vast.ai): SSH into instance, run Python script
After (SaladCloud): Containerized FastAPI with health checks
Key changes: Add Dockerfile, configure IPv6, implement health endpoints

Scenario 2: ML Training Pipeline
Before (Vast.ai): Upload data to instance, run training script
After (SaladCloud): Containerized training with S3 data loading
Key changes: Implement cloud storage data loading, containerize training code

Scenario 3: Multi-Service Application
Before (Vast.ai): Multiple services on different ports
After (SaladCloud): Single container with internal routing
Key changes: Implement API gateway pattern, consolidate services

Quick Solutions for Common Challenges
Challenge: File Storage Dependencies
Quick Fix: Use cloud storage APIs for persistent data

Challenge: Multiple Port Applications
Quick Fix: Use path-based routing

Challenge: Debugging Without SSH
Quick Fix: Use SaladCloud web terminal and comprehensive logging
- Access web terminal through portal for interactive debugging
- Implement detailed logging for troubleshooting
- Use health probes to monitor application state
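The path-based routing quick fix above can be sketched as one dispatcher that owns the single gateway port and forwards requests by prefix (the service names and prefixes are illustrative):

```python
# Hypothetical consolidation of two services that previously ran on
# separate ports into one app behind a single gateway port.
def service_a(path):
    return 200, f"A handled {path}"

def service_b(path):
    return 200, f"B handled {path}"

ROUTES = {"/a": service_a, "/b": service_b}

def dispatch(path):
    """Route a request path to the service that owns its prefix."""
    for prefix, handler in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Strip the prefix so each service sees paths as before
            return handler(path[len(prefix):] or "/")
    return 404, "not found"
```

In practice you would do the same with your web framework's router or sub-application mounting; the point is that both services share one externally exposed port.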
🛠️ Advanced Debugging: Explore additional troubleshooting resources:
- Troubleshooting guide for common issues
- Performance monitoring for optimization
- Interactive terminal for live debugging
Performance Optimization Tips
Resource Allocation
- Start with 2-3 replicas for reliability
- Monitor resource usage and adjust CPU/memory as needed
- Use appropriate GPU classes for your workload
Network Performance
- Enable Container Gateway for load balancing
- Implement proper health checks for automatic failover
- Use HTTPS for all external communications
🌐 Advanced Networking: For complex networking needs, explore:
- Container Gateway load balancing for traffic distribution
- Tailscale integration for private networks
- Real-time inference patterns for high-throughput applications
Cost Optimization
- Use priority pricing tiers based on availability needs
- Monitor usage through SaladCloud portal
- Scale replicas based on actual demand
💰 Scaling Strategies: Optimize costs and performance with:
- Autoscaling configuration for dynamic replica management
- Job processing patterns for batch workloads
- Long-running task optimization for efficient resource usage
Testing Your Migration
Local Testing
SaladCloud Testing
- Deploy with 1-2 replicas initially
- Test Container Gateway connectivity
- Validate health probes are working
- Monitor logs for any issues
- Scale up once validated
Migration Checklist
Pre-Migration
- Applications containerized and tested locally
- Storage dependencies identified and addressed
- IPv6 compatibility verified
- Health endpoints implemented
- Container images pushed to registry
During Migration
- Container groups deployed successfully
- Container Gateway configured and tested
- Health probes responding correctly
- Logs flowing to portal/external service
- Performance validated
Post-Migration
- Monitoring and alerting configured
- Cost optimization reviewed
- Team trained on new deployment process
- Documentation updated
Getting Help
SaladCloud Resources
- Documentation: docs.salad.com
- Portal: portal.salad.com
- API Reference: SaladCloud API Documentation
- Support: Contact cloud@salad.com
Migration Support
- Use SaladCloud’s web terminal for debugging
- Leverage portal logs for troubleshooting
- Configure external logging for advanced analysis
- Review health probes documentation for container lifecycle management
What You’ll Gain
Migrating to SaladCloud provides immediate benefits:
- Cost Savings: Up to 90% reduction in compute costs
- Global Scale: Access to 11,000+ active GPUs across 190+ countries
- Reliability: Automatic failover and load balancing
- Simplicity: Managed container orchestration
- Flexibility: Per-second billing with no long-term commitments
Related Resources
Migration and Integration
- Other migration guides - Learn from migration patterns for transcription APIs
- Kubernetes integration - For orchestration-aware workloads
- Platform integrations - Connect with external services
Specialized Deployment Guides
- Image generation with Stable Diffusion
- Triton Inference Server for multi-model serving
- Computer vision workloads
- High-performance applications optimization guide