Deploy from the SaladCloud Portal.
Overview
This guide covers deploying GPT-OSS-120B with vLLM on SaladCloud Secure (DataCenter GPUs). GPT-OSS-120B is OpenAI’s largest open-weight model (Apache 2.0 license), designed for advanced reasoning, general-purpose tasks, and large-scale experimentation. It is distributed via Hugging Face without gating, so you can deploy it directly without requiring a token.
Key Features
- Permissive Apache 2.0 license — Build freely without copyleft restrictions or patent risk. Ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort — Easily adjust the reasoning effort (`low`, `medium`, `high`) based on your specific use case and latency needs.
- Full chain-of-thought — Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs.
- Fine-tunable — Fully customize the model to your specific use case through parameter fine-tuning.
- Agentic capabilities — Use the model’s native capabilities for function calling, web browsing, Python code execution, and structured outputs.
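One common way to select a reasoning effort with GPT-OSS models served by vLLM is through the system prompt. The sketch below builds a chat-completions payload this way; the model ID is an assumption, and the `Reasoning: <effort>` convention may vary by serving setup.

```python
# Sketch only: choosing a reasoning effort for GPT-OSS-120B.
# The model ID and system-prompt convention are assumptions, not
# taken from this recipe — verify against your deployment.

def build_payload(prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completions payload with the requested reasoning effort."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "openai/gpt-oss-120b",  # assumed model ID
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_payload("Summarize the Apache 2.0 license.", effort="high")
```

Lower efforts trade answer depth for latency, so `low` suits quick interactive replies while `high` suits harder reasoning tasks.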
Configuration
This recipe comes pre-configured for GPT-OSS-120B. You don’t need to provide model IDs or parallelism settings. Only one option is exposed:
- GPU Memory Utilization — Fraction of GPU VRAM vLLM may use (default: `0.95`). Lower this if you want extra memory headroom for monitoring or sidecar processes.
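For reference, this option corresponds to vLLM’s `--gpu-memory-utilization` flag. The invocation below is a sketch of roughly what the recipe runs for you, assuming the `openai/gpt-oss-120b` model ID; it is not the recipe’s exact command line.

```shell
# Sketch only: the portal's GPU Memory Utilization option maps to
# vLLM's --gpu-memory-utilization flag (fraction of VRAM vLLM may use).
vllm serve openai/gpt-oss-120b \
  --gpu-memory-utilization 0.95
```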
Example Request
Submit chat completion requests to the `/v1/chat/completions` endpoint, and receive generated text in response.
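A minimal request sketch using only the Python standard library. The access-domain URL is a placeholder for your own container group’s URL, and the model ID is an assumption:

```python
import json
import urllib.request

# Placeholder: replace with your container group's access domain.
API_URL = "https://example.salad.cloud/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST request for the vLLM OpenAI-compatible endpoint."""
    payload = {
        "model": "openai/gpt-oss-120b",  # assumed model ID
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("What is SaladCloud?")
# To send: urllib.request.urlopen(req), then parse the JSON response body.
```

The response follows the OpenAI chat-completions schema, with generated text under `choices[0].message.content`.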
How To Use This Recipe
Authentication
If authentication is enabled, requests must include your SaladCloud API key in the `Salad-Api-Key` header. See Sending Requests.
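A sketch of attaching the header, again assuming a placeholder access domain and model ID; substitute your own URL and API key:

```python
import json
import urllib.request

# Placeholders: substitute your access domain and SaladCloud API key.
API_URL = "https://example.salad.cloud/v1/chat/completions"
API_KEY = "your-salad-api-key"

headers = {
    "Content-Type": "application/json",
    "Salad-Api-Key": API_KEY,  # required when authentication is enabled
}
body = json.dumps({
    "model": "openai/gpt-oss-120b",  # assumed model ID
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
# urllib.request.urlopen(req) would send the authenticated request.
```

Requests missing the header will be rejected when authentication is on, so include it on every call, not just the first.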