Prerequisites
Before you begin, ensure you have:
- ✅ SaladCloud API Key: You’ll need a valid API key to make requests
- ✅ Organization and Project: An active organization and project in SaladCloud
- ✅ Container Image with Job Queue Worker: A containerized application that includes the SaladCloud Job Queue Worker and exposes an HTTP endpoint for job processing
- ✅ Job Queue Understanding: Familiarity with SaladCloud Job Queues
Important: Your container must include both your application AND the SaladCloud Job Queue Worker binary. The
worker handles communication with the queue service and forwards jobs to your application via HTTP requests. See the
Job Queue Worker guide for setup instructions.
Overview
Queue-based autoscaling works by monitoring the number of pending jobs in your queue and automatically adjusting the number of container instances accordingly. The system uses a simple formula: the target instance count is the current queue length divided by the desired queue length per instance, clamped to your minimum and maximum replica bounds.
Step 1: Plan Your Autoscaling Configuration
Before implementing autoscaling, decide on these key parameters:
Scaling Boundaries
- Minimum Replicas: The lowest number of instances (0 for cost optimization, or higher for faster response times)
- Maximum Replicas: The upper limit based on your workload capacity and budget
Queue Management
- Desired Queue Length: How many jobs each instance should handle (typically 1-10 depending on job complexity)
- Polling Period: How often to check the queue (15-300 seconds, shorter for more responsive scaling)
Rate Limiting
- Max Upscale/Downscale per Minute: Prevents rapid scaling fluctuations
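Putting these parameters together, the scaling rule from the overview can be sketched in Python. The exact ceiling-and-clamp behavior shown here is an assumption about how the formula is evaluated; confirm against the Autoscaling Settings documentation.

```python
import math

def target_replicas(queue_length: int, desired_queue_length: int,
                    min_replicas: int, max_replicas: int) -> int:
    """Compute the desired instance count from the current queue depth.

    The autoscaler aims for roughly `desired_queue_length` pending jobs
    per instance, clamped to the configured replica bounds.
    """
    wanted = math.ceil(queue_length / desired_queue_length)
    return max(min_replicas, min(max_replicas, wanted))

# 25 pending jobs at 5 jobs per instance -> 5 instances (within bounds 0..10)
print(target_replicas(25, 5, 0, 10))   # 5
print(target_replicas(0, 5, 0, 10))    # queue empty -> scale to zero -> 0
print(target_replicas(500, 5, 0, 10))  # capped at max_replicas -> 10
```

Note how `min_replicas = 0` enables scale-to-zero, while a deep queue is always capped by `max_replicas`.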
Step 2: Create a Job Queue
First, create a job queue for your project.
Step 3: Create Container Group with Autoscaling
Now create a container group with autoscaling enabled. Your container image must include both your application and the Job Queue Worker. Key fields in the request:
- `queue_connection.path`: HTTP endpoint where your app receives jobs (e.g., `/process`)
- `queue_connection.port`: Port your application listens on (must match `networking.port`)
- `readiness_probe`: Ensures your app is ready before the worker starts sending jobs
- `SALAD_QUEUE_WORKER_LOG_LEVEL`: Controls worker logging (`error`, `warn`, `info`, `debug`)
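The full API request is sketched below in Python. The base URL, auth header name, endpoint paths, and any payload fields beyond those discussed in this guide are assumptions made to illustrate the shape of the request; check the SaladCloud API Reference for the exact schema.

```python
import json
import urllib.request

# Assumed base URL and auth header -- confirm against the SaladCloud API reference.
API_BASE = "https://api.salad.com/api/public"
ORG, PROJECT, API_KEY = "my-org", "my-project", "your-api-key"

# Step 2: the job queue itself.
queue = {"name": "transcode-jobs", "display_name": "Transcode Jobs"}

# Step 3: a container group wired to that queue, with autoscaling enabled.
container_group = {
    "name": "transcode-workers",
    "container": {
        "image": "myregistry/transcoder:latest",  # must bundle the Job Queue Worker
        "resources": {"cpu": 2, "memory": 4096},
        "environment_variables": {"SALAD_QUEUE_WORKER_LOG_LEVEL": "info"},
    },
    "networking": {"protocol": "http", "port": 8000, "auth": False},
    "queue_connection": {"queue_name": "transcode-jobs", "path": "/process", "port": 8000},
    "queue_autoscaler": {
        "min_replicas": 0,
        "max_replicas": 10,
        "desired_queue_length": 5,
        "polling_period": 30,
    },
    "readiness_probe": {
        "http": {"path": "/ready", "port": 8000},
        "initial_delay_seconds": 5,
        "period_seconds": 10,
    },
    "replicas": 1,
}

def post_request(path: str, body: dict) -> urllib.request.Request:
    """Build an authenticated POST; pass it to urllib.request.urlopen() to send."""
    return urllib.request.Request(
        f"{API_BASE}/organizations/{ORG}/projects/{PROJECT}{path}",
        data=json.dumps(body).encode(),
        headers={"Salad-Api-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )

queue_req = post_request("/queues", queue)
group_req = post_request("/containers", container_group)
```

The important invariants are that `queue_connection.queue_name` references the queue created in Step 2 and that `queue_connection.port` equals `networking.port`.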
Step 4: Configure Autoscaling Parameters
Let’s break down each autoscaling parameter:
Core Settings
Rate Limiting (Optional)
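As a summary of the breakdown, here is a `queue_autoscaler` block with each setting annotated. The values are illustrative; the ranges in the comments are the ones quoted earlier in this guide.

```python
queue_autoscaler = {
    # Core settings
    "min_replicas": 0,           # lowest instance count; 0 enables scale-to-zero
    "max_replicas": 20,          # upper bound based on capacity and budget
    "desired_queue_length": 5,   # target pending jobs per instance (typically 1-10)
    "polling_period": 30,        # seconds between queue checks (15-300)
    # Rate limiting (optional) -- dampens rapid scaling fluctuations
    "max_upscale_per_minute": 5,    # cap on instances added per minute
    "max_downscale_per_minute": 2,  # cap on instances removed per minute
}
```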
Step 5: Verify Your Setup
After creating your container group, verify the configuration:
- ✅ `queue_autoscaler` section is present with your configured values
- ✅ `queue_connection` is properly configured
- ✅ Container group is in a `running` state
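These checks can be scripted against the container group object returned by the API. This is a sketch: it assumes the response mirrors the request fields and exposes a `current_state.status` value, which you should confirm against the API reference.

```python
def verify_autoscaling(group: dict) -> list:
    """Return a list of problems found in a container group's configuration."""
    problems = []
    if "queue_autoscaler" not in group:
        problems.append("queue_autoscaler section missing")
    conn = group.get("queue_connection")
    if not conn:
        problems.append("queue_connection missing")
    elif conn.get("port") != group.get("networking", {}).get("port"):
        problems.append("queue_connection.port does not match networking.port")
    if group.get("current_state", {}).get("status") != "running":
        problems.append("container group is not running")
    return problems

good = {
    "queue_autoscaler": {"min_replicas": 0, "max_replicas": 10},
    "queue_connection": {"path": "/process", "port": 8000},
    "networking": {"port": 8000},
    "current_state": {"status": "running"},
}
print(verify_autoscaling(good))  # []
```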
Step 6: Test Autoscaling Behavior
Test Scale Up
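Jobs can be enqueued through the Job Queue API; a minimal sketch follows. The jobs endpoint path and the `input` envelope are assumptions for illustration; check the Job Queue documentation for the exact request format.

```python
import json
import urllib.request

# Assumed endpoint for adding jobs -- confirm against the Job Queue API reference.
JOBS_URL = ("https://api.salad.com/api/public/organizations/my-org"
            "/projects/my-project/queues/transcode-jobs/jobs")

def build_job_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build a request that enqueues one job; its body is delivered to your app."""
    return urllib.request.Request(
        JOBS_URL,
        data=json.dumps({"input": payload}).encode(),
        headers={"Salad-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Enqueue a burst of jobs, then watch the instance count climb toward
# queue_length / desired_queue_length (bounded by max_replicas).
requests_to_send = [build_job_request({"file": f"clip-{i}.wav"}, "your-api-key")
                    for i in range(20)]
```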
Add jobs to your queue and observe scaling.
Test Scale Down
Once jobs are processed, the system should automatically scale down based on your configuration.
Common Configuration Patterns
Cost-Optimized (Scale to Zero)
- Use case: Batch processing where response time isn’t critical
- Benefits: Zero cost when idle
- Trade-off: Cold start delays when new jobs arrive
Performance-Optimized (Always Ready)
- Use case: Real-time processing requiring low latency
- Benefits: Immediate job processing
- Trade-off: Higher baseline costs
Balanced Approach
- Use case: Steady workload with occasional spikes
- Benefits: Good balance of cost and performance
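The three patterns above can be expressed as autoscaler settings. The values below are illustrative starting points, not SaladCloud recommendations.

```python
# Cost-optimized: scale to zero when idle, tolerate a deeper backlog.
cost_optimized = {
    "min_replicas": 0,
    "max_replicas": 10,
    "desired_queue_length": 10,
    "polling_period": 120,
}

# Performance-optimized: always-warm instances avoid cold-start delays.
performance_optimized = {
    "min_replicas": 2,
    "max_replicas": 20,
    "desired_queue_length": 1,   # roughly one job per instance for low latency
    "polling_period": 15,
}

# Balanced: a small warm baseline with moderate responsiveness.
balanced = {
    "min_replicas": 1,
    "max_replicas": 15,
    "desired_queue_length": 5,
    "polling_period": 30,
}
```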
Best Practices
Container Application Design
Your container must include both your application and the SaladCloud Job Queue Worker. The worker handles queue communication while your application processes jobs via HTTP requests.
Key Architecture Requirements:
- Job Queue Worker: Include the precompiled Go binary in your container
- HTTP Server: Your application must expose an HTTP endpoint to receive jobs
- Process Management: Use s6-overlay or shell scripts to run both processes
- Health Checks: Implement readiness probes to ensure your app is ready before receiving jobs
Example Container Setup
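The FastAPI application code referenced below is not reproduced in this excerpt. As a stand-in, here is a dependency-free sketch of the same HTTP contract using Python's standard library; the `/process` and `/ready` paths follow this guide's examples.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class JobHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Readiness probe: the worker should only send jobs once this returns 200.
        if self.path == "/ready":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        # The Job Queue Worker forwards each job as an HTTP POST to this path.
        if self.path != "/process":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            job = json.loads(raw or b"{}")
            result = {"echo": job}  # real job processing goes here
            body = json.dumps(result).encode()
        except Exception:
            # Any failure -> 500, telling the worker to retry the job.
            self.send_response(500)
            self.end_headers()
            return
        self.send_response(200)  # 200 -> the worker marks the job successful
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

# In the container, run: HTTPServer(("0.0.0.0", 8000), JobHandler).serve_forever()
```

The port (8000 here) must match both `networking.port` and `queue_connection.port` in your container group configuration.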
Application Code (FastAPI example):
Container Group Configuration
When creating your container group, ensure the `queue_connection` matches your application.
How It Works:
- The Job Queue Worker connects to the SaladCloud queue service
- The worker receives jobs and forwards them as HTTP POST requests to your app
- Your application processes the job and returns an HTTP response
- The worker interprets HTTP status codes:
  - `200`: Job successful
  - `500`: Job failed (will retry up to 3 times, meaning 4 total attempts)
Best Practices:
- Use Readiness Probes: Ensure your app is ready before the worker starts sending jobs
- Handle Graceful Shutdown: Complete current HTTP requests before termination
- Return Proper HTTP Status Codes: Use `200` for successful jobs, `500` for failed jobs that should retry
- Set Appropriate Timeouts: Configure request timeouts based on job complexity
- Implement Health Checks: Provide a `/ready` endpoint for container monitoring
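As an illustration of graceful shutdown, here is a sketch that reacts to SIGTERM by refusing new jobs while letting in-flight requests finish. Whether your runtime delivers SIGTERM before termination, and what your handler should do while draining, are assumptions to adapt to your setup.

```python
import signal
import threading

shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    # Container runtimes typically deliver SIGTERM before stopping an instance.
    # Flag shutdown so no new jobs are accepted; in-flight requests complete.
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def should_accept_job() -> bool:
    """Check before processing; once shutdown begins, decline new work."""
    return not shutting_down.is_set()
```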
Monitoring and Optimization
- Monitor Key Metrics:
  - Queue length over time
  - Instance count fluctuations
  - Job processing rate
  - Cost per job processed
- Tune Parameters Based on Workload:
  - Increase `desired_queue_length` for longer-running jobs
  - Decrease `polling_period` for more responsive scaling
  - Adjust rate limits based on scaling patterns
Cost Management
- Set Appropriate Quotas: Configure `max_replicas` based on your budget
- Use Instance Deletion Cost: Set deletion costs to prioritize which instances to terminate
- Monitor Scaling Events: Track when and why scaling occurs to optimize settings
Troubleshooting
Autoscaling Not Working
- Check Queue Connection: Ensure `queue_connection` path and port match your application’s HTTP endpoint
- Verify Job Queue Worker: Confirm the worker binary is included and running in your container
- Review Container Logs: Check if the worker is connecting to the queue and your app is receiving requests
- Test Readiness Probe: Ensure your app responds to health checks before jobs are sent
Jobs Failing or Timing Out
- Check HTTP Response Codes: Ensure your app returns `200` for successful jobs, `500` for failed jobs that should retry
- Verify Endpoint Path: Confirm `queue_connection.path` matches your application’s job processing endpoint
- Review Application Logs: Check for errors in job processing logic
- Adjust Timeouts: Optimize job processing time or configure appropriate timeouts
Excessive Scaling
- Increase Polling Period: Reduce frequency of scaling decisions
- Set Rate Limits: Use `max_upscale_per_minute` and `max_downscale_per_minute`
- Adjust Desired Queue Length: Increase to reduce sensitivity to queue fluctuations
- Monitor Job Processing Rate: Ensure jobs are completing successfully to clear the queue
Slow Response to Load
- Decrease Polling Period: Check queue more frequently
- Increase Rate Limits: Allow faster scaling when needed
- Optimize Container Startup: Reduce image size and startup time
- Use Readiness Probes: Prevent jobs from being sent to containers that aren’t ready
Worker Connection Issues
- Check Worker Logs: Set `SALAD_QUEUE_WORKER_LOG_LEVEL=debug` for detailed logging
- Verify Port Configuration: Ensure `networking.port` matches `queue_connection.port`
- Test Application Endpoint: Manually test your app’s HTTP endpoint for job processing
- Review Process Management: Ensure both worker and application processes are running
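Manually testing the endpoint can be done with a short script. This sketch POSTs a sample job directly to the app, bypassing the queue entirely; the URL and payload are placeholders for your own.

```python
import json
import urllib.request

def send_test_job(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST a sample job straight to the app's endpoint; return the status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# e.g. send_test_job("http://localhost:8000/process", {"file": "clip.wav"})
```

A `200` response confirms the application side of the pipeline; if this works but jobs still fail, the problem is likely in the worker configuration rather than your code.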
Alternative Autoscaling Approaches
While SaladCloud’s built-in Job Queue autoscaling provides the most integrated experience, you can also implement custom autoscaling solutions with external queue systems:
- AWS SQS Autoscaling: Build custom autoscaling with AWS SQS using Lambda functions to monitor queue depth and scale container groups
- RabbitMQ Autoscaling: Implement autoscaling with RabbitMQ using Cloudflare Workers to monitor queue length and adjust replicas
- Kelpie Autoscaling Service: Use Kelpie’s external autoscaling service that automatically scales SaladCloud container groups based on RabbitMQ queue depth
Next Steps
- 📖 Learn more about Autoscaling Settings
- 🔧 Explore Job Queue Management
- 📊 Set up Instance Deletion Cost for smarter scaling
- 🔗 Review the complete SaladCloud API Reference