Prerequisites
Before you begin, ensure you have:

- ✅ SaladCloud API Key: You’ll need a valid API key with permissions to manage container groups. See our API usage guide for instructions on obtaining your API key.
- ✅ Organization and Project: An active organization and project in SaladCloud
- ✅ Container Group: An existing container group that you want to scale based on hardware metrics
- ✅ Monitoring Infrastructure: A solution to collect and aggregate metrics (self-hosted or cloud service)
- ✅ Serverless Platform Access: Account on a platform like AWS Lambda or Cloudflare Workers for autoscaling logic
GPU Optimization: Hardware utilization autoscaling is particularly effective for GPU workloads where you want to
maintain optimal GPU utilization (typically 70-90%) while avoiding thermal throttling or memory exhaustion. For GPU
workload examples, see our AI & Machine Learning guides.
Overview
Hardware utilization autoscaling works by:

- Collecting metrics from running container instances (GPU, CPU, memory usage)
- Aggregating data across all instances to determine overall utilization
- Making scaling decisions based on predefined thresholds and rules
- Calling the SaladCloud API to adjust replica counts (see managing deployments for more details)
- Managing deletion costs automatically via the SaladCloud IMDS SDK to protect busy instances during scale-down
This approach is a good fit for:

- GPU-intensive workloads: Maintain optimal GPU utilization without overheating
- Variable compute demands: Scale based on actual resource usage patterns
- Cost optimization: Right-size your deployment based on real utilization (see billing details)
- Performance optimization: Prevent resource exhaustion and maintain quality of service
Instance Deletion Cost Management
SaladCloud provides two ways to manage instance deletion costs based on current utilization. Higher deletion costs protect busy instances during scale-down operations.

Option 1: IMDS SDK (Recommended for Self-Management)
- Automatic Updates: Instances continuously update deletion costs via the SaladCloud IMDS SDK
- Local Performance: Uses local IMDS endpoint for better performance
- No Authentication: No API keys required within the instance
- Real-time: Immediate updates as utilization changes
- Error Handling: Built-in retry logic and exception handling
- Type Safety: Fully typed method calls and responses
Option 2: SaladCloud Public API (External Management)
- External Control: Autoscaler updates deletion costs via external API calls
- Centralized: Managed by the autoscaling service rather than individual instances
- API Authentication: Requires SaladCloud API key and proper permissions
- Batch Updates: Can update multiple instances in a single operation
- GET /v1/deletion-cost: Get current deletion cost
- PUT /v1/deletion-cost: Set deletion cost (integer)
- GET /v1/status: Get instance readiness/startup status
- GET /v1/token: Get JWT for external authentication
- PATCH /organizations/{org}/projects/{project}/containers/{group}/instances/{container_group_instance_id}: Update deletion cost
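As a sketch of calling this endpoint from Python with only the standard library: the base URL, header name, body field, and content type below are assumptions based on common public-API conventions, not confirmed by this page.

```python
import json
import urllib.request

API_BASE = "https://api.salad.com/api/public"  # assumed public API base URL

def build_deletion_cost_patch(org: str, project: str, group: str,
                              instance_id: str, cost: int) -> urllib.request.Request:
    """Build the PATCH request that updates one instance's deletion cost."""
    url = (f"{API_BASE}/organizations/{org}/projects/{project}"
           f"/containers/{group}/instances/{instance_id}")
    return urllib.request.Request(
        url,
        data=json.dumps({"deletion_cost": cost}).encode(),  # body field name assumed
        method="PATCH",
        headers={"Salad-Api-Key": "YOUR_API_KEY",
                 "Content-Type": "application/merge-patch+json"})

def set_deletion_cost(org, project, group, instance_id, cost):
    req = build_deletion_cost_patch(org, project, group, instance_id, cost)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Because the request is built separately from being sent, the URL and body can be inspected without touching the network.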
Using the SaladCloud IMDS SDK
The SaladCloud IMDS SDK provides a clean, typed interface for interacting with the Instance Metadata Service. Key benefits include:

- Automatic error handling: Built-in retry logic and proper exception handling
- Type safety: Fully typed responses and request parameters
- Simplified API: No need to manually handle HTTP headers or JSON parsing
- Better performance: Optimized connection pooling and timeout handling
Error Handling: The IMDS SDK automatically handles common error scenarios like network timeouts and temporary
service unavailability. Your code should still include try/catch blocks for graceful degradation, but the SDK reduces
the need for manual retry logic. For additional troubleshooting guidance, see our troubleshooting
guide.
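For orientation, here is a raw-HTTP sketch of the IMDS endpoints listed earlier, which is roughly what the SDK wraps for you. The 169.254.169.254 address and the Metadata: true header are assumptions about the standard IMDS convention; verify them against the IMDS reference before relying on them.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254/v1"  # assumed standard IMDS address

def imds_request(path: str, payload=None) -> urllib.request.Request:
    """GET an IMDS endpoint, or PUT when a JSON payload is given."""
    return urllib.request.Request(
        f"{IMDS_BASE}{path}",
        data=json.dumps(payload).encode() if payload is not None else None,
        method="PUT" if payload is not None else "GET",
        headers={"Metadata": "true", "Content-Type": "application/json"})

def get_deletion_cost() -> int:
    with urllib.request.urlopen(imds_request("/deletion-cost"), timeout=5) as resp:
        return json.load(resp)["deletion_cost"]

def set_deletion_cost(cost: int) -> None:
    body = {"deletion_cost": cost}  # body field name assumed
    with urllib.request.urlopen(imds_request("/deletion-cost", body), timeout=5) as resp:
        resp.read()
```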
Step 1: Implement Metrics Collection
Since SaladCloud containers can only expose one port (used by your main application), instances must push their metrics to an external metrics store rather than exposing a metrics endpoint. This section shows how to collect and push hardware metrics to external services.

Metrics Pusher Script
Create a script that collects hardware metrics and pushes them to an external service. The script is modular: you can use just the components you need for your specific metrics destination.

Choose Your Integration: You only need to implement the specific metrics destination functions for your use case. For example, if you’re using CloudWatch, you only need the core collection functions and the CloudWatch integration section.
Core Metrics Collection Functions
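A stdlib-only sketch of such collectors: GPU stats are read by shelling out to nvidia-smi, while CPU and memory come from the load average and /proc/meminfo (Linux-specific). Function names are illustrative.

```python
import os
import subprocess

def parse_nvidia_smi(csv_line: str) -> dict:
    """Parse one line of nvidia-smi CSV output (utilization, memory, temperature)."""
    util, mem_used, mem_total, temp = [float(v) for v in csv_line.split(",")]
    return {"gpu_percent": util,
            "vram_percent": 100.0 * mem_used / mem_total,
            "gpu_temp_c": temp}

def collect_gpu_metrics():
    """Return metrics for the first GPU, or None when no GPU/driver is present."""
    try:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True).stdout.strip().splitlines()
        return parse_nvidia_smi(out[0])
    except (OSError, subprocess.CalledProcessError):
        return None

def collect_cpu_memory() -> dict:
    """Approximate CPU percent from 1-minute load average; memory from /proc/meminfo."""
    load1, _, _ = os.getloadavg()
    cpu_percent = 100.0 * load1 / (os.cpu_count() or 1)
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, val = line.split(":", 1)
            meminfo[key] = int(val.strip().split()[0])  # values in kB
    mem_percent = 100.0 * (1 - meminfo["MemAvailable"] / meminfo["MemTotal"])
    return {"cpu_percent": round(cpu_percent, 1),
            "memory_percent": round(mem_percent, 1)}
```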
These functions collect hardware metrics from the instance:

AWS CloudWatch Integration
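A sketch of a CloudWatch pusher. The namespace and MachineId dimension are illustrative choices, and boto3 with AWS credentials is assumed to be available in the container.

```python
from datetime import datetime, timezone

def build_metric_data(machine_id: str, metrics: dict) -> list:
    """Shape a metrics dict into CloudWatch PutMetricData entries."""
    now = datetime.now(timezone.utc)
    return [{"MetricName": name,
             "Value": float(value),
             "Unit": "Percent",
             "Timestamp": now,
             "Dimensions": [{"Name": "MachineId", "Value": machine_id}]}
            for name, value in metrics.items()]

def push_to_cloudwatch(machine_id: str, metrics: dict,
                       namespace: str = "SaladCloud/Hardware") -> None:
    import boto3  # imported lazily; requires AWS credentials in the environment
    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=build_metric_data(machine_id, metrics))
```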
For sending metrics to AWS CloudWatch:

Webhook/HTTP Endpoint Integration
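A minimal webhook pusher using only the standard library; the JSON payload shape is illustrative, so adapt it to whatever your receiver expects.

```python
import json
import time
import urllib.request

def build_payload(machine_id: str, metrics: dict) -> dict:
    """JSON payload sent to the webhook (shape is illustrative)."""
    return {"machine_id": machine_id,
            "timestamp": int(time.time()),
            "metrics": metrics}

def push_to_webhook(url: str, machine_id: str, metrics: dict) -> int:
    """POST the payload to a custom HTTP endpoint; returns the status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(machine_id, metrics)).encode(),
        headers={"Content-Type": "application/json"})  # POST because data is set
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```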
For sending metrics to a custom webhook or HTTP endpoint:

Datadog Integration
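A sketch against Datadog's v1 series endpoint; the salad. metric prefix and tag names are illustrative, and you should confirm the endpoint and site URL against Datadog's current API docs.

```python
import json
import time
import urllib.request

def build_series(machine_id: str, metrics: dict) -> dict:
    """Datadog v1 series body: one gauge per metric, tagged with the machine id."""
    now = int(time.time())
    return {"series": [{"metric": f"salad.{name}",
                        "points": [[now, float(value)]],
                        "type": "gauge",
                        "tags": [f"machine_id:{machine_id}"]}
                       for name, value in metrics.items()]}

def push_to_datadog(api_key: str, machine_id: str, metrics: dict,
                    site: str = "https://api.datadoghq.com") -> int:
    req = urllib.request.Request(
        f"{site}/api/v1/series",
        data=json.dumps(build_series(machine_id, metrics)).encode(),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```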
For sending metrics to Datadog:

Prometheus Integration
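A sketch for a Pushgateway: metrics are rendered in the text exposition format and PUT to the job/instance grouping, which replaces that group's previous values. Metric names are illustrative.

```python
import urllib.request

def to_exposition(metrics: dict) -> str:
    """Render a metrics dict in the Prometheus text exposition format."""
    return "".join(f"{name} {value}\n" for name, value in metrics.items())

def push_to_gateway(gateway: str, job: str, instance: str, metrics: dict) -> int:
    # PUT replaces all metrics for this job/instance grouping on the Pushgateway
    url = f"{gateway}/metrics/job/{job}/instance/{instance}"
    req = urllib.request.Request(url, data=to_exposition(metrics).encode(),
                                 method="PUT")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```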
For sending metrics to Prometheus via a Pushgateway:

Main Collection Loop
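A sketch of the coordinating loop. The collector and destination functions are injected as callables so any of the integrations above can be plugged in; failures are logged and skipped so one bad push never kills the loop.

```python
import time

def run_pusher(collect, push, interval_s: float = 30.0, iterations=None) -> int:
    """Collect metrics and push them every interval_s seconds.

    collect() returns a metrics dict (or a falsy value to skip a round);
    push(metrics) sends it. iterations limits the loop for testing;
    None means run forever. Returns the number of rounds executed.
    """
    count = 0
    while iterations is None or count < iterations:
        try:
            metrics = collect()
            if metrics:
                push(metrics)
        except Exception as exc:  # keep going on transient errors
            print(f"metrics push failed: {exc}")
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_s)
    return count
```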
The main loop that coordinates metrics collection and pushing:

Integrate with Your Application
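One way to run the pusher next to the app is a small entrypoint script; the file names here are hypothetical.

```shell
#!/bin/sh
# entrypoint.sh (illustrative): run the metrics pusher in the background so the
# container's single exposed port stays dedicated to the main application.
python /app/metrics_pusher.py &
exec python /app/main.py
```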
Add the metrics pusher to your container alongside your main application:

Environment Variables Configuration
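A sketch of the variables such a script might read; these names are illustrative, not a SaladCloud convention.

```shell
# Illustrative container-group environment variables for the metrics pusher
METRICS_DESTINATION=cloudwatch              # cloudwatch | webhook | datadog | prometheus
METRICS_PUSH_INTERVAL_SECONDS=30
WEBHOOK_URL=https://metrics.example.com/ingest
DD_API_KEY=<your-datadog-api-key>
PROMETHEUS_PUSHGATEWAY=http://pushgateway.example.com:9091
```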
Configure metrics pushing through environment variables in your SaladCloud container group:

Step 2: Implement Metrics Aggregation
Next, implement a system to collect metrics from your external metrics store and make scaling decisions. This can be done using a serverless function that periodically queries the metrics store and calculates scaling actions. The autoscaler is modular: you can use just the components you need for your specific metrics destination and deployment platform.

Choose Your Integration: You only need to implement the specific metrics collection functions for your use case. For example, if you’re using CloudWatch, you only need the core autoscaler functions and the CloudWatch integration section.
Core Autoscaler Functions
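The scaling decision itself can be sketched as a pure function. The thresholds follow the 70-90% GPU band mentioned earlier; names and defaults are illustrative. Keeping a dead band between the two thresholds is what prevents thrashing.

```python
def decide_replicas(current: int, avg_util: float, *,
                    scale_up_at: float = 90.0, scale_down_at: float = 70.0,
                    step: int = 1, min_replicas: int = 1,
                    max_replicas: int = 20) -> int:
    """Pick a new replica count from average utilization.

    Above scale_up_at, add step replicas; below scale_down_at, remove step;
    in between, leave the count alone (hysteresis band). The result is
    clamped to [min_replicas, max_replicas].
    """
    if avg_util > scale_up_at:
        target = current + step
    elif avg_util < scale_down_at:
        target = current - step
    else:
        target = current
    return max(min_replicas, min(max_replicas, target))

def aggregate(instance_metrics: list, key: str = "gpu_percent") -> float:
    """Average one metric across all reporting instances (0.0 if none report)."""
    values = [m[key] for m in instance_metrics if key in m]
    return sum(values) / len(values) if values else 0.0
```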
These functions provide the foundation for the autoscaling logic:

AWS CloudWatch Integration
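One way to read the aggregate back from CloudWatch; the namespace and metric name are illustrative, and boto3 with credentials is assumed.

```python
from datetime import datetime, timedelta, timezone

def query_window(minutes: int = 5):
    """Return (start, end) covering the last few minutes, in UTC."""
    end = datetime.now(timezone.utc)
    return end - timedelta(minutes=minutes), end

def fetch_avg_gpu_utilization(namespace: str = "SaladCloud/Hardware",
                              metric: str = "gpu_percent"):
    """Average the metric across all datapoints in the window; None if empty."""
    import boto3  # imported lazily; requires AWS credentials
    start, end = query_window()
    stats = boto3.client("cloudwatch").get_metric_statistics(
        Namespace=namespace, MetricName=metric,
        StartTime=start, EndTime=end, Period=300, Statistics=["Average"])
    points = stats.get("Datapoints", [])
    if not points:
        return None
    return sum(p["Average"] for p in points) / len(points)
```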
For collecting metrics from AWS CloudWatch:

Webhook/Database Integration
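If instances POST to a webhook that stores rows in DynamoDB, the reader might look like this; the table name and attribute names are illustrative. Filtering out stale rows matters so dead instances don't vote in the average.

```python
import time

def fresh_only(rows: list, max_age_s: int = 300, now=None) -> list:
    """Drop metric rows older than max_age_s (stale instances must not vote)."""
    now = now if now is not None else time.time()
    return [r for r in rows if now - r["timestamp"] <= max_age_s]

def fetch_from_dynamodb(table_name: str = "instance-metrics") -> list:
    """Scan the metrics table and keep only recent rows."""
    import boto3  # imported lazily; assumes a table with a numeric `timestamp`
    table = boto3.resource("dynamodb").Table(table_name)
    return fresh_only(table.scan()["Items"])
```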
For collecting metrics from webhook storage (DynamoDB, PostgreSQL, etc.):

Prometheus Integration
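With Prometheus, an instant query against the HTTP API can do the aggregation server-side; the gpu_percent metric name is illustrative.

```python
import json
import urllib.parse
import urllib.request

def parse_instant_vector(body: dict):
    """Pull the scalar out of a Prometheus instant-query response, or None."""
    result = body.get("data", {}).get("result", [])
    return float(result[0]["value"][1]) if result else None

def query_prometheus(base_url: str, promql: str = "avg(gpu_percent)"):
    """Run an instant query like avg(gpu_percent) and return its value."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_vector(json.load(resp))
```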
For collecting metrics from Prometheus:

Main Lambda Handler
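Tying it together, a minimal handler sketch. The SaladCloud base URL, header name, and merge-patch content type are assumptions based on the public API conventions, and fetch_avg_utilization is a placeholder for whichever metrics-store reader you implemented.

```python
import json
import os
import urllib.request

API_BASE = "https://api.salad.com/api/public"  # assumed public API base URL

def decide(current: int, avg_util: float, low: float = 70.0, high: float = 90.0,
           min_replicas: int = 1, max_replicas: int = 20) -> int:
    """One step up above `high`, one step down below `low`, clamped."""
    target = current + 1 if avg_util > high else current - 1 if avg_util < low else current
    return max(min_replicas, min(max_replicas, target))

def group_url(org: str, project: str, group: str) -> str:
    return f"{API_BASE}/organizations/{org}/projects/{project}/containers/{group}"

def get_replicas(org, project, group, api_key) -> int:
    req = urllib.request.Request(group_url(org, project, group),
                                 headers={"Salad-Api-Key": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["replicas"]

def set_replicas(org, project, group, api_key, replicas: int) -> None:
    req = urllib.request.Request(
        group_url(org, project, group),
        data=json.dumps({"replicas": replicas}).encode(), method="PATCH",
        headers={"Salad-Api-Key": api_key,
                 "Content-Type": "application/merge-patch+json"})
    urllib.request.urlopen(req, timeout=10).close()

def fetch_avg_utilization() -> float:
    """Placeholder: replace with your CloudWatch/webhook/Prometheus reader."""
    raise NotImplementedError

def lambda_handler(event, context):
    """Read average utilization from the metrics store, then adjust replicas."""
    org = os.environ["SALAD_ORG"]
    project = os.environ["SALAD_PROJECT"]
    group = os.environ["SALAD_CONTAINER_GROUP"]
    api_key = os.environ["SALAD_API_KEY"]
    avg_util = fetch_avg_utilization()
    current = get_replicas(org, project, group, api_key)
    target = decide(current, avg_util)
    if target != current:
        set_replicas(org, project, group, api_key, target)
    return {"statusCode": 200,
            "body": json.dumps({"avg_util": avg_util,
                                "current": current, "target": target})}
```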
The main function that coordinates metrics collection and scaling decisions:

AWS Lambda Deployment
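For example, packaging and creating the function might look like this; the role ARN, account ID, and names are placeholders.

```shell
# Package the handler and create the function (names and ARN are placeholders)
zip function.zip autoscaler.py
aws lambda create-function \
  --function-name salad-autoscaler \
  --runtime python3.9 \
  --handler autoscaler.lambda_handler \
  --zip-file fileb://function.zip \
  --role arn:aws:iam::123456789012:role/salad-autoscaler-role \
  --environment "Variables={SALAD_ORG=my-org,SALAD_PROJECT=my-project,SALAD_CONTAINER_GROUP=my-group,SALAD_API_KEY=your-key}"
```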
Deploy the autoscaler as an AWS Lambda function:

Deployment Steps
- Create the Lambda function with Python 3.9+ runtime
- Set environment variables:
- Create EventBridge rule to trigger every 2 minutes:
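The EventBridge schedule from the last step might be wired up like this; the account ID, region, and names are placeholders.

```shell
# Trigger the autoscaler every 2 minutes
aws events put-rule \
  --name salad-autoscaler-schedule \
  --schedule-expression "rate(2 minutes)"
aws lambda add-permission \
  --function-name salad-autoscaler \
  --statement-id eventbridge-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:123456789012:rule/salad-autoscaler-schedule
aws events put-targets \
  --rule salad-autoscaler-schedule \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:salad-autoscaler"
```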
Step 3: Alternative Deployment Options
Cloudflare Workers Implementation
For a serverless alternative to AWS Lambda, you can implement the same decision logic on Cloudflare Workers, using a Cron Trigger in place of the EventBridge schedule.

Step 4: Validation
Validation Checklist
Use this checklist to verify your autoscaling implementation:

Metrics Collection

- Metrics are being collected from all active instances
- Data is accurate and up-to-date (< 5 minutes old)
- No missing or null values in critical metrics

Decision Logic

- Scale-up triggers activate under high load
- Scale-down occurs during low utilization periods
- Thermal protection prevents overheating
- Instance health checks prevent scaling unhealthy nodes

Integration Testing

- Lambda function executes successfully
- SaladCloud API calls complete without errors
- CloudWatch metrics are published correctly
- Alerts fire for critical conditions
- Consider setting up external logging for better observability

Performance Validation

- Scaling actions complete within expected time frames
- No thrashing between scale-up and scale-down
- Cost optimization goals are met
- Application performance maintained during scaling
- If issues arise, consult our troubleshooting guide
Conclusion
This guide provides a complete framework for implementing hardware-metrics-based autoscaling with SaladCloud. The key benefits include:

- Cost Optimization: Scale down during low utilization to reduce costs (see billing details)
- Performance Assurance: Scale up proactively to maintain application responsiveness
- Thermal Protection: Prevent GPU thermal throttling that degrades compute performance
- Operational Visibility: Comprehensive monitoring and alerting for production confidence
To get the best results:

- Start Conservatively: Begin with broader thresholds and adjust based on observed behavior
- Monitor Continuously: Use the provided dashboards and alerts to track performance
- Test Thoroughly: Validate scaling behavior under various load patterns before production deployment
- Iterate and Improve: Refine your thresholds and logic based on operational experience