Llama models on W&B Inference

Meta Llama 4 Scout inference overview

Price per 1M tokens: $0.17 (input) / $0.66 (output)
Parameters: 17B active (109B total)
Context window: 64K
Release date: Apr 2025
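
These list prices map directly onto the token counts reported back by the API. The sketch below estimates per-request cost from the usage fields of the OpenAI-compatible response; the rates are the figures listed above and the token counts are illustrative.

# Estimate the USD cost of one request from its usage block.
# Rates are the list prices above, in USD per 1M tokens.
INPUT_RATE = 0.17
OUTPUT_RATE = 0.66

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (
        prompt_tokens / 1_000_000 * INPUT_RATE
        + completion_tokens / 1_000_000 * OUTPUT_RATE
    )

# e.g. response.usage.prompt_tokens = 1200, response.usage.completion_tokens = 350
print(f"${estimate_cost(1200, 350):.6f}")  # ≈ $0.000435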

Meta Llama 4 Scout inference details

Llama 4 Scout integrates text and image understanding, making it suitable for multimodal applications such as visual Q&A, content moderation, captioning, and analysis tasks involving images combined with textual data. It efficiently balances computational load via a mixture-of-experts architecture.
 
Created by: Meta
License: other
🤗 model card: Llama-4-Scout-17B-16E-Instruct
 
 
				
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<team>/<project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # For security, prefer setting it via the OPENAI_API_KEY environment variable
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<team>/<project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
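
Because Llama 4 Scout also accepts images, the same client can be used for visual Q&A by sending OpenAI-style multimodal content parts. The sketch below assumes the endpoint accepts image_url parts for this model; the image URL is a placeholder.

# Reuses the client configured above. Image input for this model on
# W&B Inference is assumed here; the URL below is a placeholder.
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)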
				
			

Meta Llama 3.3 70B inference overview

Price per 1M tokens: $0.71 (input) / $0.71 (output)
Parameters: 70B
Context window: 128K
Release date: Dec 2024

Meta Llama 3.3 70B inference details

Llama 3.3 70B excels at multilingual conversational interactions, with strong capabilities in detailed instruction-following, coding tasks, and mathematical reasoning. Ideal for general-purpose chatbots and complex text-based queries.
 
Created by: Meta
License: llama3.3
🤗 model card: Llama-3.3-70B-Instruct
 
 
				
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<team>/<project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # For security, prefer setting it via the OPENAI_API_KEY environment variable
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<team>/<project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
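
For interactive chat, responses can also be consumed incrementally rather than waiting for the full completion. The sketch below reuses the client configured above and assumes the endpoint supports the OpenAI streaming protocol.

# Stream tokens as they are generated (streaming support is assumed).
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the quadratic formula step by step."}
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta of the assistant message
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()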
				
			

Meta Llama 3.1 8B inference overview

Price per 1M tokens: $0.22 (input) / $0.22 (output)
Parameters: 8B
Context window: 128K
Release date: July 2024

Meta Llama 3.1 8B inference details

Llama 3.1 8B provides efficient multilingual conversational support for applications where responsiveness and computational cost are critical. Effective for chatbots, automated customer interactions, and other workloads that need fast yet reliable language understanding.
 
Created by: Meta
License: llama3.1
🤗 model card: Llama-3.1-8B-Instruct
 
 
				
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<team>/<project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",

    # Get your API key from https://wandb.ai/authorize
    # For security, prefer setting it via the OPENAI_API_KEY environment variable
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<team>/<project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
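
Because the 8B model targets latency-sensitive workloads, several requests can be issued concurrently with the SDK's async client. The sketch below mirrors the configuration used above; the questions are illustrative.

import asyncio

# Async variant of the client above; the same base URL, key, and project apply.
async_client = openai.AsyncOpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",
    project="<team>/<project>",
)

async def ask(question: str) -> str:
    response = await async_client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

async def main():
    # Independent prompts are answered concurrently
    answers = await asyncio.gather(
        ask("Summarize the water cycle in one sentence."),
        ask("Translate 'good morning' into French."),
    )
    for answer in answers:
        print(answer)

asyncio.run(main())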
				
			

Meta Llama resources