Post-train AI agents using supervised fine-tuning (SFT) and reinforcement learning (RL) without managing infrastructure
With W&B Training, you pay only for active usage, not idle time. Pricing has three components: inference, training, and storage.
Inference
When your agent explores the environment during RL, it runs inference to generate trajectories that are later used in training. Billing for inference is based on the total input and output tokens used to generate each trajectory. Learn about credits, account tiers, and usage caps.
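As a rough illustration, the Python sketch below estimates an inference bill from per-trajectory token counts. The per-million-token rates are placeholder values chosen for this example, not W&B's actual prices.

```python
# Hypothetical sketch of token-based inference billing. The rates below are
# placeholders for illustration, not W&B Training's actual prices.

INPUT_RATE_PER_M = 0.50   # assumed $ per 1M input tokens (placeholder)
OUTPUT_RATE_PER_M = 1.50  # assumed $ per 1M output tokens (placeholder)

def estimate_trajectory_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate inference cost for one trajectory from its token counts."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
        + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# A rollout's bill is the sum over all trajectories it generated.
trajectories = [(12_000, 2_400), (9_500, 1_800)]  # (input, output) token counts
total = sum(estimate_trajectory_cost(i, o) for i, o in trajectories)
print(f"Estimated inference cost: ${total:.4f}")
```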
Training
At each step of Serverless SFT and Serverless RL, we run a distributed training job on a separate GPU cluster to update a Low-Rank Adapter (LoRA). Training is free during the public preview; pricing will be announced at general availability (GA).
Model Storage
W&B Training stores checkpoints of your trained LoRA so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on total checkpoint size and your pricing plan. Every plan includes free storage. For details, see subscription plans.
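The following sketch shows one way such a monthly bill could be computed from checkpoint sizes. The free allowance and per-GB rate are placeholder values; the actual numbers depend on your pricing plan.

```python
# Hypothetical sketch of monthly checkpoint-storage billing. The free
# allowance and per-GB rate below are placeholders that vary by plan.

FREE_STORAGE_GB = 5.0      # assumed free allowance (placeholder)
RATE_PER_GB_MONTH = 0.10   # assumed $ per GB-month (placeholder)

def estimate_storage_bill(checkpoint_sizes_gb: list[float]) -> float:
    """Bill only the checkpoint storage that exceeds the plan's free allowance."""
    total_gb = sum(checkpoint_sizes_gb)
    billable_gb = max(0.0, total_gb - FREE_STORAGE_GB)
    return billable_gb * RATE_PER_GB_MONTH

# 6.5 GB of checkpoints -> 1.5 GB billable above the assumed 5 GB free tier.
print(f"${estimate_storage_bill([1.2, 0.8, 4.5]):.2f} per month")
```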
How are GPU-hours calculated?
GPU-hours are calculated by aggregating the total GPU time used to train your models during the last billing cycle. Each training step uses GPU time for three actions: downloading the most recent LoRA to train from, adjusting the LoRA weights using GRPO, and saving the updated weights. Because downloading and saving each take only a few seconds, the bulk of a training step is spent actually training your model.
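A minimal sketch of that aggregation, using made-up step durations rather than measured values:

```python
# Hypothetical sketch of GPU-hour aggregation for a billing cycle. Each step's
# GPU time covers downloading the latest LoRA, training it, and saving it.
# The durations below are illustrative, not measurements.

from dataclasses import dataclass

@dataclass
class TrainingStep:
    download_s: float  # fetch the most recent LoRA (a few seconds)
    train_s: float     # adjust the LoRA weights (the bulk of the step)
    save_s: float      # persist the updated weights (a few seconds)

    @property
    def total_s(self) -> float:
        return self.download_s + self.train_s + self.save_s

steps = [TrainingStep(3.0, 95.0, 2.0), TrainingStep(2.5, 110.0, 2.0)]
gpu_hours = sum(step.total_s for step in steps) / 3600
print(f"GPU-hours this cycle: {gpu_hours:.4f}")
```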
Is there a minimum training duration?
No, jobs are billed for the GPU time they use, with no minimum training duration.
Am I charged for failed jobs?
No. GPU time for failed jobs is not charged to your account.
What is a token, and how do I track my token usage?
A token is a numeric representation of a chunk of text, typically a word or a fragment of a word, that a model reads and writes. Log in to your account to view your billing dashboard, which shows how many tokens you've used during the current and past months.
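For a concrete sense of what a token is, the snippet below encodes a sentence with the open-source tiktoken library. The actual tokenizer, and therefore the token counts you're billed for, depends on the model you train; this encoding is only an example.

```python
# Illustrative tokenization using the open-source tiktoken library
# (pip install tiktoken). The encoding used here is an example; the
# tokenizer that applies to your bill depends on the model you train.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Serverless RL trains a LoRA without managing infrastructure."
tokens = enc.encode(text)
print(f"{len(tokens)} tokens: {tokens[:8]}...")
```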