OpenAI GPT OSS 20B inference overview

Price per 1M tokens

$0.05 (input)
$0.20 (output)

Parameters

3.6B (active)
20B (total)

Context Window

131K

Release Date

Aug 2025

OpenAI GPT OSS 20B inference details

OpenAI GPT OSS 20B is an open-weight, 21B-parameter model released by OpenAI. It uses a Mixture-of-Experts (MoE) architecture that activates 3.6B parameters per forward pass, making it well suited to lower-latency inference. The model is trained on OpenAI’s Harmony response format and supports configurable reasoning levels and structured outputs (see the sketch after the quick-start example below). Access GPT OSS 20B instantly at some of the industry’s lowest token costs, running on CoreWeave’s purpose-built AI cloud, and rapidly evaluate, monitor, and iterate on your agentic AI applications with the W&B Weave tracing built into W&B Inference.

Created by: 

OpenAI

License: 

Apache 2.0

Model card: 

import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<team>/<project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<team>/<project>",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
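The same OpenAI-compatible endpoint can be used to exercise the model's reasoning level configuration and structured outputs. The snippet below is a minimal sketch rather than an official recipe: it assumes the Harmony convention of setting the reasoning level with a "Reasoning: high" line in the system message, assumes the endpoint passes through OpenAI's json_schema response_format, and the "location" schema is purely illustrative. Check the W&B Inference docs for the exact parameters supported.

import json

import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",      # or set OPENAI_API_KEY in the environment
    project="<team>/<project>",    # required for usage tracking
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        # Assumption: Harmony-style reasoning level set in the system message
        {"role": "system", "content": "You are a helpful assistant. Reasoning: high"},
        {"role": "user", "content": "The Eiffel Tower is in Paris, France. Extract the city and country."},
    ],
    # Assumption: the endpoint honors OpenAI's structured-output response_format;
    # the schema below is hypothetical and only for illustration
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
)

print(json.loads(response.choices[0].message.content))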

OpenAI GPT OSS 20B resources

Course
AI engineering course: Agents
Guide
W&B Inference powered by CoreWeave
Whitepaper
A primer on building successful AI agents