Post-train AI agents using reinforcement learning (RL) without managing infrastructure
Serverless RL splits the rollout inference and distributed training workloads of the RL loop and runs them on separate CoreWeave GPU clusters to maximize utilization and minimize cost. You pay only for active usage, not idle time. Pricing has three components: inference, training, and storage.
Inference
When your agent explores the environment, it runs inference to generate trajectories that are later used in training. Billing for inference is based on the total input and output tokens used to generate each trajectory. Learn about credits, account tiers, and usage caps.
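As a rough illustration of token-based billing, the sketch below estimates a single trajectory's inference cost from its token counts. The per-token rates are hypothetical placeholders, not published Serverless RL prices; check your billing dashboard for actual rates.

```python
# Hypothetical rates for illustration only, not published Serverless RL prices.
INPUT_RATE_PER_M = 0.50   # assumed USD per 1M input tokens
OUTPUT_RATE_PER_M = 1.50  # assumed USD per 1M output tokens

def trajectory_inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the inference cost of generating one rollout trajectory."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: a rollout that consumed 12,000 input and 3,500 output tokens.
print(f"${trajectory_inference_cost(12_000, 3_500):.4f}")  # estimated cost in USD
```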
Training
At each training step, Serverless RL groups trajectories with their rewards to update a training copy of the LoRA, running the update as distributed training on a separate GPU cluster. Training is free during the public preview; pricing will be announced at general availability (GA).
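Serverless RL does not expose its training internals, but as a minimal sketch of the group-relative idea behind GRPO-style updates, the snippet below normalizes each trajectory's reward against its group's mean and standard deviation to produce the advantages that drive the weight update. This illustrates the general technique, not CoreWeave's implementation.

```python
from statistics import mean, stdev

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize one prompt group's trajectory rewards into
    group-relative advantages, the core quantity in GRPO."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Example: four rollouts of the same prompt, scored by a reward function.
print(group_advantages([0.0, 1.0, 0.5, 1.0]))
```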
Model storage
Serverless RL stores checkpoints of your trained LoRA heads so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on your total checkpoint size and pricing plan. Every plan includes a free storage allotment. For details, see subscription plans.
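For a sense of how checkpoint storage might be billed, the sketch below subtracts a free allotment from the total checkpoint size and charges the remainder at a flat rate. Both numbers are assumptions for illustration; see your subscription plan for actual allotments and rates.

```python
def monthly_storage_cost(total_checkpoint_gb: float,
                         free_gb: float = 5.0,            # assumed free allotment
                         rate_per_gb_month: float = 0.10  # assumed USD per GB-month
                         ) -> float:
    """Estimate the monthly bill for stored LoRA checkpoints."""
    billable_gb = max(0.0, total_checkpoint_gb - free_gb)
    return billable_gb * rate_per_gb_month

print(monthly_storage_cost(12.0))  # 7 GB over the assumed free allotment
```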
How are GPU-hours calculated?
GPU-hours are calculated by summing the GPU time used to train your models during the last billing cycle. A single training step uses GPU time for three actions: downloading the most recent LoRA checkpoint to train from, updating the LoRA weights with GRPO, and saving the updated weights. Because downloading and saving take only a few seconds each, the bulk of a training step is spent actually training your model.
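As a concrete sketch of that aggregation, the snippet below sums per-step wall-clock time and scales by the number of GPUs in the training cluster. Scaling by GPU count is an assumption here; the breakdown of a step into download, update, and save follows the description above.

```python
def billed_gpu_hours(step_seconds: list[float], num_gpus: int) -> float:
    """Aggregate per-step wall-clock time (checkpoint download + GRPO
    update + checkpoint save) into GPU-hours for a billing cycle."""
    return sum(step_seconds) * num_gpus / 3600.0

# Example: 200 training steps of ~45 s each (a few seconds of which are
# the download and save) on an assumed 8-GPU training cluster.
print(billed_gpu_hours([45.0] * 200, num_gpus=8))  # 20.0 GPU-hours
```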
Is there a minimum training duration?
No. Jobs are billed for the GPU time they use, with no minimum training duration.
Am I charged for failed jobs?
No. GPU time for failed jobs is not charged to your account.
What is a token, and how do I track my usage?
A token is a small chunk of text, such as a word or piece of a word, that a model reads and writes as a single unit. Log in to your account to view your billing dashboard, which shows how many tokens you've used during the current and past months.
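As a concrete illustration of how text maps to tokens, the snippet below counts tokens with OpenAI's open-source tiktoken tokenizer. This is only an example; the model you train may split text differently, so treat the count as indicative.

```python
import tiktoken  # pip install tiktoken

# An example tokenizer, not necessarily the one your model uses.
enc = tiktoken.get_encoding("cl100k_base")

text = "Serverless RL bills inference by input and output tokens."
tokens = enc.encode(text)
print(len(tokens))  # number of tokens this tokenizer produces for the text
```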