MoonshotAI Kimi K2 on W&B Inference
Price per 1M tokens: $1.35 (input) / $4.00 (output)
Parameters: 32B active / 1T total
Context window: 128K tokens
Release date: July 2025
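As a rough illustration of the per-token pricing above, the sketch below estimates the dollar cost of a single request from its prompt and completion token counts. The estimate_cost helper and the simple linear pricing assumption are illustrative only, not an official billing formula.

# Rough cost estimate from the listed per-1M-token prices.
# Assumes simple linear pricing; actual billing may differ.
INPUT_PRICE_PER_M = 1.35   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Hypothetical helper: estimate the USD cost of one Kimi K2 request."""
    return (prompt_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (completion_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 2,000 prompt tokens and 500 completion tokens
print(f"${estimate_cost(2_000, 500):.4f}")  # ≈ $0.0047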
Kimi K2 inference details
Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
Created by: MoonshotAI
License: Other
🤗 model card: Kimi-K2-Instruct
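The quickstart below calls Kimi K2 through the W&B Inference OpenAI-compatible endpoint and uses Weave to log the request and response.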
import openai
import weave

# Weave autopatches the OpenAI client so LLM calls are logged to W&B
weave.init("<your-team>/<your-project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="",

    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)

print(response.choices[0].message.content)
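Because Kimi K2 is tuned for agentic tool use, the same chat completions endpoint can be asked to propose tool calls. The sketch below reuses the client from the example above and assumes the endpoint accepts standard OpenAI-style tools definitions (not confirmed here); the get_weather function schema is purely hypothetical.

# Hypothetical tool definition; only this schema is sent to the model.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

tool_response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
for call in tool_response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)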