Serverless RL Pricing

Post-train AI agents using reinforcement learning (RL) without managing infrastructure

Serverless RL splits the rollout inference and distributed training workloads of the RL loop and runs them on separate CoreWeave GPU clusters to maximize utilization and minimize cost. You pay only for active usage, not idle time. Pricing has three components: inference, training, and storage.

Inference

When your agent explores the environment, it runs inference to generate trajectories that are later used in training. Billing for inference is based on the total input and output tokens used to generate each trajectory. Learn about credits, account tiers, and usage caps.

Training

At each training step, Serverless RL groups trajectories with their rewards to update a training copy of the LoRA, running the update as distributed training on a separate GPU cluster. Training is free during the public preview; pricing will be announced at general availability (GA).
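
Conceptually, a training step resembles the sketch below: trajectories are grouped, rewards are normalized within each group (as in GRPO), and the resulting advantages drive the LoRA update. The function and method names here are illustrative assumptions, not the Serverless RL API.

```python
# Conceptual sketch of one training step (illustrative only, not the
# Serverless RL API). Rewards are normalized within each trajectory group,
# GRPO-style, and the resulting advantages drive the LoRA weight update.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Center and scale rewards within a group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against identical rewards in a group
    return [(r - mu) / sigma for r in rewards]

def training_step(trajectory_groups, lora):
    for group in trajectory_groups:  # one group of trajectories per prompt/task
        advantages = group_relative_advantages([t["reward"] for t in group])
        for traj, advantage in zip(group, advantages):
            # Hypothetical update call; the real update runs as distributed
            # training on a separate GPU cluster.
            lora.apply_policy_gradient(traj["tokens"], advantage)
    return lora
```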

Models Storage

Serverless RL stores checkpoints of your trained LoRA heads so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on total checkpoint size and your pricing plan. Every plan includes free storage. For details, see subscription plans.

Inference pricing (per 1M tokens)

Model | Input Tokens | Output Tokens
Qwen 2.5 14B | $0.06 | $0.24
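
As a rough illustration, the cost of a single trajectory at these rates can be estimated from its token counts. The token counts in the example below are made up for illustration.

```python
# Estimate the inference cost of one trajectory at the listed
# Qwen 2.5 14B rates (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.06
OUTPUT_PRICE_PER_M = 0.24

def trajectory_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a trajectory with 20,000 input tokens and 5,000 output tokens
print(f"${trajectory_cost(20_000, 5_000):.4f}")  # -> $0.0024
```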

Training pricing

Serverless RL is currently in public preview, and training is free during this period. We’ll announce pricing at general availability.

Models Storage pricing

Storage costs are calculated monthly based on the total size of your artifacts and checkpoints according to your pricing plan. All plans include free storage.
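
For a sense of how such a bill might be computed, here is a minimal sketch. The free allowance and per-GB rate are placeholders, not published prices; your subscription plan determines the actual values.

```python
# Sketch of a monthly storage bill. FREE_STORAGE_GB and PRICE_PER_GB_MONTH
# are placeholder assumptions, not published prices.
FREE_STORAGE_GB = 100.0      # hypothetical free allowance included in a plan
PRICE_PER_GB_MONTH = 0.05    # hypothetical USD per GB-month beyond the allowance

def monthly_storage_cost(checkpoint_sizes_gb):
    total_gb = sum(checkpoint_sizes_gb)
    billable_gb = max(0.0, total_gb - FREE_STORAGE_GB)
    return billable_gb * PRICE_PER_GB_MONTH

# Example: twelve ~15 GB checkpoints -> 180 GB total, 80 GB billable
print(f"${monthly_storage_cost([15.0] * 12):.2f}")  # -> $4.00
```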

Frequently asked questions

How are GPU-hours for training calculated?

GPU-hours are calculated by aggregating the total time used to train your models during the last billing cycle. Training a single step requires GPU time for three actions: downloading the most recent LoRA to train from, adjusting the LoRA weights using GRPO, and saving the updated weights. Since the downloading and saving processes only take a few seconds each, the bulk of a training step is dedicated to actually training your model.
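
As a back-of-the-envelope sketch, assuming GPU-hours are counted as the number of GPUs multiplied by wall-clock training time (the exact accounting may differ), the aggregation looks like this. The step durations and GPU counts are made-up illustration values.

```python
# Back-of-the-envelope GPU-hour aggregation for a billing cycle, assuming
# GPU-hours = GPUs used * wall-clock seconds / 3600. Values are illustrative.
def gpu_hours(step_seconds, gpus_per_step):
    return sum(s * g for s, g in zip(step_seconds, gpus_per_step)) / 3600

# Example: 200 training steps of ~45 s each on 8 GPUs
print(f"{gpu_hours([45.0] * 200, [8] * 200):.1f} GPU-hours")  # -> 20.0 GPU-hours
```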

Is there a minimum charge for training?

No, jobs are billed for the GPU time they use, with no minimum training duration.

What if the job fails? Do failed jobs get billed partially, fully, or not at all?

GPU time for failed jobs is not charged to your account.

How will I know how many tokens I’ve used each month?

Tokens are the units of text that models read and generate, and inference billing is based on how many your trajectories consume. Log in to your account to view your billing dashboard, which shows how many tokens you’ve used during the current and past months.