Llama models on W&B Inference
Meta Llama 4 Scout inference overview
Price per 1M tokens: $0.17 (input) / $0.66 (output)
Parameters: 109B total (17B active)
Context window: 64K
Release date: Apr 2025
Meta Llama 4 Scout inference details
Llama 4 Scout integrates text and image understanding, making it suitable for multimodal applications such as visual Q&A, content moderation, captioning, and analysis tasks involving images combined with textual data. It efficiently balances computational load via a mixture-of-experts architecture.
Created by: Meta
License: other
🤗 model card: Llama-4-Scout-17B-16E-Instruct
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<your-team>/<your-project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)

print(response.choices[0].message.content)
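Because Scout accepts images as well as text, a multimodal request can be sketched with OpenAI-style content parts. This is a minimal sketch that reuses the client configured above; it assumes the W&B Inference endpoint accepts image_url content parts for this model, and the image URL is a placeholder.

# Multimodal sketch: assumes the endpoint accepts OpenAI-style image_url content parts
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                # Placeholder URL -- replace with your own image
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)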
Meta Llama 3.3 70B inference overview
Price per 1M tokens: $0.71 (input) / $0.71 (output)
Parameters: 70B
Context window: 128K
Release date: Dec 2024
Meta Llama 3.3 70B inference details
Llama 3.3 70B excels in multilingual conversational interactions, providing strong capabilities in detailed instruction-following, coding tasks, and mathematical reasoning. Ideal for general-purpose chatbots, complex text-based queries, and multilingual user interactions.
Created by: Meta
License: llama3.3
🤗 model card: Llama-3.3-70B-Instruct
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<your-team>/<your-project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)

print(response.choices[0].message.content)
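For the chatbot and multilingual use cases described above, streaming keeps perceived latency low. A minimal sketch, assuming the endpoint supports the OpenAI-style stream=True flag; it reuses the client configured above.

# Streaming sketch: assumes the endpoint supports OpenAI-style streaming
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful multilingual assistant."},
        {"role": "user", "content": "In French, briefly explain what recursion is."},
    ],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()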
Meta Llama 3.1 8B inference overview
Price per 1M tokens: $0.22 (input) / $0.22 (output)
Parameters: 8B
Context window: 128K
Release date: Jul 2024
Meta Llama 3.1 8B inference details
Llama 3.1 8B provides efficient multilingual conversational support ideal for applications where responsiveness and computational efficiency are critical. Effective for building chatbots, automated customer interactions, and applications needing fast yet reliable language understanding.
Created by: Meta
License: llama3.1
🤗 model card: Llama-3.1-8B-Instruct
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<your-team>/<your-project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)

print(response.choices[0].message.content)
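Because Weave autopatches the OpenAI client, the call above is already logged; for latency-sensitive applications built on this smaller model, wrapping a chat turn in a weave.op gives each turn its own trace. A minimal sketch that reuses the client configured above; the ask helper name is illustrative.

# Illustrative helper: @weave.op traces each call to this function in W&B Weave
@weave.op()
def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": question},
        ],
        max_tokens=256,  # cap output length to keep responses fast
    )
    return response.choices[0].message.content

print(ask("Summarize what a context window is in one sentence."))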