How to Implement Usage-Based Billing for AI Services

Direct answers to the questions AI teams ask: how to meter tokens, bill across multiple LLM providers on one invoice, sell prepaid credits, and add usage-based billing to an existing SaaS.


TL;DR

  • AI services need usage-based billing because every LLM call has a real, variable cost. Flat-rate pricing lets heavy users destroy your margin.
  • Pick one of three consumption models per plan: metered (included allowance + overage), credits (prepaid blocks), or balance (prepaid wallet). They are mutually exclusive.
  • Implementation is three steps: define a plan with metered features, report usage events from your code, and let the billing system aggregate and invoice automatically.
  • To meter tokens across multiple LLM providers, track each call with its model identifier and let a model price catalog compute the cost — everything lands on a single invoice.

How do I implement usage-based billing for AI services?

Choose a consumption model (metered, credits, or balance), create a plan with metered features and an event code for each, then report a usage event from your code every time a customer consumes the feature. Your billing platform aggregates events, applies allowances or deductions, and generates invoices automatically.

The reason AI services need this: every API call to an LLM has a real, measurable cost that scales with input and output tokens. A single request can cost anywhere from $0.001 to $0.50. A customer who sends 100 requests per day costs roughly 100 times as much to serve as one who sends a single comparable request. If you charge $49/month for unlimited access, your heaviest users erode your margins while your lightest users subsidize them.
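To make the margin problem concrete, here is a rough sketch of the arithmetic. Only the $49 flat price comes from the example above; the 2-cent average cost per request and the 30-day month are illustrative assumptions, not provider figures.

```typescript
// Illustrative only: the per-request cost and 30-day month are assumptions.
const FLAT_PRICE_CENTS = 4900; // $49/month flat plan from the example above

/** Monthly gross margin in cents for one customer on a flat-rate plan. */
function monthlyMarginCents(
  requestsPerDay: number,
  costPerRequestCents: number,
  daysInMonth: number = 30,
): number {
  const monthlyCostCents = requestsPerDay * daysInMonth * costPerRequestCents;
  return FLAT_PRICE_CENTS - monthlyCostCents;
}

// Assumed average model cost of 2 cents ($0.02) per request.
const lightUserMargin = monthlyMarginCents(1, 2);   // 4900 - 60
const heavyUserMargin = monthlyMarginCents(100, 2); // 4900 - 6000
```

Under those assumptions the light user leaves $48.40 of margin, while the heavy user costs you $11.00 more than they pay, on the same plan.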

The three consumption models cover every AI pricing strategy:

Metered (included allowance + overage)

Customers subscribe to a plan with a base allowance. Usage beyond it is charged as overage at period end. A Pro customer gets 100,000 tokens included, uses 250,000, and pays for the extra 150,000 at your overage rate.

Best for: API platforms, analytics tools, and AI assistants with steady usage where customers want uninterrupted access. Tradeoff: customers can be surprised by overage charges, and you carry credit risk until the period closes and the invoice is paid. More detail in the metered billing glossary entry.

Credits (prepaid blocks)

Customers buy blocks of credits consumed as they use the product. A simple query costs 100 credits, a complex generation 1,000. At zero, they top up.

Best for: image generation, code generation, and products where request costs vary widely — the credit abstraction hides raw cost variance. Tradeoff: friction when credits run out mid-workflow. See credits billing for how the model works.

Balance (prepaid wallet)

Customers deposit a dollar amount into a wallet. Each usage event deducts the actual cost in real time.

Best for: developer infrastructure, AI APIs, and multi-model products where customers expect direct cost visibility. Tradeoff: visible per-request costs can make customers cautious about usage.

One rule that saves you pain later: these models are mutually exclusive within a single plan. Mixing them creates confusion. Pick one per plan and commit to it.

How do I add usage-based billing to an existing SaaS that just added AI features?

Keep your existing subscription tiers and add a metered AI feature on top. Your base plan stays predictable; the AI feature gets an included allowance and an overage rate. No pricing migration, no plan rewrite — existing customers keep paying the same unless they use the new AI features heavily.

This is the most common path for SaaS products that shipped an AI assistant or generation feature after launch. The flat subscription already covers the predictable part of your costs. The AI feature is the only part with variable cost, so it is the only part that needs metering.

Concretely:

  1. Create a metered feature with an event code like ai_chat or tokens_processed.
  2. Add it to your existing plans with an included allowance sized for typical usage — for example, $29/month including 50,000 tokens with overage at $0.03 per 1,000 tokens.
  3. Track usage from the code path that calls the model.
  4. At renewal, the invoice shows the base price plus any overage as separate line items.
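The steps above can be sketched as a small invoice calculation. This is a minimal illustration of the arithmetic, not how any billing platform implements it: the type names, the `buildInvoice` helper, and the rule of rounding partial overage up to a whole cent are all assumptions. The plan numbers come from the example in step 2.

```typescript
interface MeteredPlan {
  basePriceCents: number;
  includedTokens: number;
  overageCentsPer1kTokens: number;
}

interface LineItem {
  description: string;
  amountCents: number;
}

/** Builds the renewal invoice: base price plus overage as a separate line. */
function buildInvoice(plan: MeteredPlan, tokensUsed: number): LineItem[] {
  const lines: LineItem[] = [
    { description: "Base subscription", amountCents: plan.basePriceCents },
  ];
  const overageTokens = Math.max(0, tokensUsed - plan.includedTokens);
  if (overageTokens > 0) {
    // Assumption: partial thousands are billed pro rata, rounded up to a cent.
    const amountCents = Math.ceil(
      (overageTokens * plan.overageCentsPer1kTokens) / 1000,
    );
    lines.push({ description: `Overage: ${overageTokens} tokens`, amountCents });
  }
  return lines;
}

function invoiceTotalCents(lines: LineItem[]): number {
  return lines.reduce((sum, line) => sum + line.amountCents, 0);
}

// $29/month, 50,000 tokens included, $0.03 per 1,000 tokens of overage.
const plan: MeteredPlan = {
  basePriceCents: 2900,
  includedTokens: 50000,
  overageCentsPer1kTokens: 3,
};

const typicalInvoice = buildInvoice(plan, 40000); // inside the allowance
const heavyInvoice = buildInvoice(plan, 80000);   // 30,000 tokens of overage
const typicalTotal = invoiceTotalCents(typicalInvoice);
const heavyTotal = invoiceTotalCents(heavyInvoice);
```

A customer at 40,000 tokens still pays $29.00; a customer at 80,000 tokens pays $29.90, with the $0.90 overage shown on its own line.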

Generous allowances keep most customers inside the base price, so the change feels like a feature launch instead of a price increase. Heavy users — the ones actually costing you money — pay proportionally more.

What tools should I use for usage-based billing on an AI API?

You need three things: an idempotent usage-event API (retries must never double-bill), an AI model price catalog that stays current as providers change rates, and automatic invoicing that aggregates events into line items. A billing platform built for AI usage, like Commet, covers all three without you building metering infrastructure.

What to evaluate in any tool:

  • Idempotent ingestion. Networks fail and queues redeliver. Every usage event needs an idempotency key so retries are safe.
  • Model-aware pricing. Token prices change and differ per provider. Commet maintains a catalog of 180+ AI models with current input, output, and cache token prices, synced daily — you set a margin instead of hardcoding rates.
  • All three consumption models. Your first pricing model is rarely your last. Switching from metered to credits should not require a new vendor.
  • Payments, tax, and compliance handled. Commet operates as Merchant of Record: it processes card payments, handles tax calculation, collection, and remittance, charges in local currency in 20+ markets, and pays out in local currency in 112 countries. That is the part most billing stacks leave to you — the Stripe alternative comparison breaks down the difference.
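To see why the idempotency key matters, here is a minimal in-memory sketch of how an ingestion layer might deduplicate retried events. The `UsageLedger` class and its methods are hypothetical and exist only for illustration; a real system would persist keys durably and expire them, and this is not a description of Commet's internals.

```typescript
interface UsageEvent {
  idempotencyKey: string;
  customerId: string;
  feature: string;
  value: number;
}

class UsageLedger {
  private seenKeys = new Set<string>();
  private totals = new Map<string, number>();

  /** Records the event once; returns false when a duplicate is ignored. */
  ingest(event: UsageEvent): boolean {
    if (this.seenKeys.has(event.idempotencyKey)) return false;
    this.seenKeys.add(event.idempotencyKey);
    const bucket = `${event.customerId}:${event.feature}`;
    this.totals.set(bucket, (this.totals.get(bucket) ?? 0) + event.value);
    return true;
  }

  /** Aggregated usage for one customer and feature. */
  total(customerId: string, feature: string): number {
    return this.totals.get(`${customerId}:${feature}`) ?? 0;
  }
}

const ledger = new UsageLedger();
const chatEvent: UsageEvent = {
  idempotencyKey: "req_abc123",
  customerId: "user_123",
  feature: "ai_chat",
  value: 1,
};

const firstAttempt = ledger.ingest(chatEvent); // recorded
const retryAttempt = ledger.ingest(chatEvent); // same key, ignored
const billedRequests = ledger.total("user_123", "ai_chat");
```

The retry carries the same key as the first attempt, so the customer is billed for one request even though the event arrived twice.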

Pricing for the billing layer itself matters too — check the pricing page to see what the usage-based stack costs as you scale.

How do I meter AI usage (tokens, requests, compute)?

Report a usage event every time a customer consumes the feature. For requests, track value: 1 per call. For tokens, pass the model and token counts and let the platform price them. For compute, define an event code like compute_minutes and track the quantity. Always include an idempotency key.

Per-request metering is one call:

import { Commet } from "@commet/node";

const commet = new Commet({ apiKey: process.env.COMMET_API_KEY! });

await commet.usage.track({
  customerId: "user_123",
  feature: "ai_chat",
  value: 1,
  idempotencyKey: "req_abc123",
});

For token-based billing, pass the model identifier and token counts. Commet looks up the model's current token prices, applies your margin, and deducts the cost from the customer's balance:

await commet.usage.track({
  customerId: "user_123",
  feature: "ai_chat",
  model: "gpt-4o",
  inputTokens: 1500,
  outputTokens: 300,
  idempotencyKey: "req_abc124", // unique per request, so a retry is never billed twice
});

If you use the Vercel AI SDK, @commet/ai-sdk removes the manual tracking entirely. Wrap your model with tracked() and every generateText and streamText call reports tokens automatically:

import { tracked } from "@commet/ai-sdk";
import { Commet } from "@commet/node";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";

const commet = new Commet({ apiKey: process.env.COMMET_API_KEY! });

const model = tracked(openai("gpt-4o"), {
  commet,
  feature: "ai_chat",
  customerId: "user_123",
});

const result = await generateText({ model, prompt: "Hello!" });

For real-time workloads (voice AI, video analysis, transcription), meter compute time instead — customers understand "minutes" better than "tokens". Same track() call, with a compute_minutes event code and the elapsed time as the value.
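For compute metering, the only extra work is turning elapsed time into a billable value. A minimal sketch follows. The `billableMinutes` helper, the rounding up to a hundredth of a minute, and the session-based idempotency key are assumptions; the payload fields mirror the track() calls shown earlier.

```typescript
/** Converts elapsed milliseconds to billable minutes, rounded up to 0.01 min. */
function billableMinutes(startedAtMs: number, endedAtMs: number): number {
  const elapsedMs = Math.max(0, endedAtMs - startedAtMs);
  return Math.ceil(elapsedMs / 600) / 100; // 600 ms is 0.01 of a minute
}

// Hypothetical transcription session that ran for 90 seconds.
const sessionId = "session_789";
const startedAtMs = 1_000_000;
const endedAtMs = startedAtMs + 90_000;

// Payload you would pass to the track() call.
const computeEvent = {
  customerId: "user_123",
  feature: "compute_minutes",
  value: billableMinutes(startedAtMs, endedAtMs),
  idempotencyKey: `compute_${sessionId}`, // one event per session
};
```

Rounding granularity is a pricing decision: if your plan bills whole minutes, round up to the next integer instead.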

How do I bill AI API usage across multiple LLM providers on a single invoice?

Use a balance plan and track every call with its model identifier — gpt-4o, anthropic/claude-sonnet-4.6, whatever the customer used. Each provider's tokens are priced from the model catalog at that model's rates, deducted from one shared balance, and any overage lands as line items on a single invoice at period end.

This is the standard setup for AI products that route across providers — a chat product that switches between OpenAI and Anthropic, or an agent platform on the Vercel AI Gateway. The customer does not care which provider served the request; they care about one bill they can read.

Track each call with the provider-prefixed model format:

await commet.usage.track({
  customerId: "user_123",
  feature: "ai_chat",
  model: "anthropic/claude-sonnet-4.6",
  inputTokens: 10000,
  outputTokens: 2000,
  cacheReadTokens: 7000,
  idempotencyKey: "req_abc125", // unique per request, so a retry is never billed twice
});

Cache read tokens are significantly cheaper than regular input tokens, and Commet prices each token type separately — so customers pay fair rates even with heavy prompt caching.

Because the model catalog already knows each model's prices, you set one margin percentage per feature instead of maintaining a rate table per provider. When a provider changes prices, the catalog updates and your margin stays intact. The dashboard's AI Costs view shows every request with its model, token breakdown, and the margin applied.
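The pricing logic described here can be sketched in a few lines. Everything in this example is hypothetical: the model names and per-token rates are placeholders rather than real provider prices, `priceUsage` is not a Commet function, and the sketch assumes `inputTokens` does not already include `cacheReadTokens`.

```typescript
interface ModelPrice {
  // USD per 1M tokens, which is the same number as micro-dollars per token.
  input: number;
  output: number;
  cacheRead: number;
}

// Placeholder catalog: names and rates are invented for illustration.
const catalog: Record<string, ModelPrice> = {
  "provider-a/model-x": { input: 3, output: 15, cacheRead: 0.3 },
  "provider-b/model-y": { input: 2.5, output: 10, cacheRead: 1.25 },
};

interface TokenUsage {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

/** Returns the customer-facing charge in micro-dollars (1e-6 USD). */
function priceUsage(usage: TokenUsage, marginBps: number): number {
  const price = catalog[usage.model];
  if (!price) throw new Error(`Unknown model: ${usage.model}`);
  // Assumption: inputTokens and cacheReadTokens are separate counts.
  const baseCost =
    usage.inputTokens * price.input +
    usage.outputTokens * price.output +
    (usage.cacheReadTokens ?? 0) * price.cacheRead;
  // Margin in basis points (2000 = 20%), rounded to a whole micro-dollar.
  return Math.round((baseCost * (10000 + marginBps)) / 10000);
}

const FEATURE_MARGIN_BPS = 2000; // one margin for the whole feature

const chargeA = priceUsage(
  { model: "provider-a/model-x", inputTokens: 10000, outputTokens: 2000, cacheReadTokens: 7000 },
  FEATURE_MARGIN_BPS,
);
const chargeB = priceUsage(
  { model: "provider-b/model-y", inputTokens: 1000, outputTokens: 500 },
  FEATURE_MARGIN_BPS,
);
```

Both calls use the same margin but different per-model rates, which is the point: the rate table lives in the catalog, and your code only carries the margin.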

How do I let customers buy prepaid credits and deduct from a wallet balance?

These are two different consumption models, so pick the one that matches how customers think about your product. Credits: customers buy credit packs and each action consumes a fixed credit amount. Balance: customers top up a dollar wallet and each event deducts the real cost. Both deduct in real time, and both are sold through the Customer Portal.

Choose credits when request costs vary and you want to hide that variance behind a stable unit: an image generation costs 10 credits, a text generation 2, a voice synthesis 25. Subscription credits reset each billing period; credits purchased as packs never expire, which makes them safe to buy in bulk. When customers run out, they buy another pack from the Customer Portal — no support ticket, no sales call.
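A minimal sketch of the credit bookkeeping, using the per-action costs from the paragraph above. The `CreditAccount` shape and both helper functions are hypothetical, and spending expiring subscription credits before pack credits is an assumption about ordering, not documented behavior.

```typescript
type Action = "image_generation" | "text_generation" | "voice_synthesis";

// Credit prices from the example above.
const CREDIT_COST: Record<Action, number> = {
  image_generation: 10,
  text_generation: 2,
  voice_synthesis: 25,
};

interface CreditAccount {
  subscriptionCredits: number; // reset every billing period
  packCredits: number;         // purchased packs, never expire
}

/** Deducts credits for one action; returns false if a top-up is needed. */
function consume(account: CreditAccount, action: Action): boolean {
  const cost = CREDIT_COST[action];
  if (account.subscriptionCredits + account.packCredits < cost) return false;
  // Assumption: spend expiring subscription credits before pack credits.
  const fromSubscription = Math.min(account.subscriptionCredits, cost);
  account.subscriptionCredits -= fromSubscription;
  account.packCredits -= cost - fromSubscription;
  return true;
}

/** New billing period: subscription credits reset, pack credits carry over. */
function renew(account: CreditAccount, periodAllowance: number): void {
  account.subscriptionCredits = periodAllowance;
}

const account: CreditAccount = { subscriptionCredits: 20, packCredits: 50 };
const voiceOk = consume(account, "voice_synthesis"); // 20 from plan, 5 from pack
const afterVoice = { ...account };
renew(account, 100);
const afterRenewal = { ...account };

const nearlyEmpty: CreditAccount = { subscriptionCredits: 1, packCredits: 0 };
const textOk = consume(nearlyEmpty, "text_generation"); // needs 2, has 1
```

When `consume` returns false, the product would prompt the customer to buy another pack instead of running the action.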

Choose balance when customers expect cost transparency: the plan's base price becomes a spending balance, and every event deducts the actual dollar cost. Token-based AI pricing (the model + token parameters above) works on balance plans, since the deduction is the real model cost plus your margin. Customers add funds via top-ups in the portal, and overage beyond the balance is invoiced at period end.
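The balance model can be sketched the same way. This is an illustration of the bookkeeping only: the `Wallet` shape and helpers are hypothetical, amounts are kept in micro-dollars so sub-cent request costs stay exact, and the event costs are made-up values.

```typescript
interface Wallet {
  balanceMicros: number; // remaining prepaid funds, in micro-dollars
  overageMicros: number; // usage beyond the balance, invoiced at period end
}

/** Deducts an event's cost; anything the balance cannot cover becomes overage. */
function deduct(wallet: Wallet, costMicros: number): void {
  const fromBalance = Math.min(wallet.balanceMicros, costMicros);
  wallet.balanceMicros -= fromBalance;
  wallet.overageMicros += costMicros - fromBalance;
}

/** Adds funds from a portal top-up. */
function topUp(wallet: Wallet, amountMicros: number): void {
  wallet.balanceMicros += amountMicros;
}

// Start with $0.10 of balance and run two requests costing $0.07452 each.
const wallet: Wallet = { balanceMicros: 100000, overageMicros: 0 };
deduct(wallet, 74520);
const afterFirstRequest = { ...wallet };
deduct(wallet, 74520);
const afterSecondRequest = { ...wallet };
topUp(wallet, 1000000); // customer adds $1.00
const afterTopUp = { ...wallet };
```

The second request drains the balance and pushes $0.04904 into overage. The top-up restores spending room, while the overage already incurred stays on the period-end invoice.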

In both cases the deduction happens in real time, so the remaining credits or balance a customer sees in the portal is always current — which is what kills "why was I charged this?" support tickets.

Key takeaways

Usage-based billing is not optional for AI services — the cost structure demands it. Choose one consumption model per plan: metered for steady usage, credits for variable request costs, balance for cost transparency. Track every event with an idempotency key, use model-aware pricing for tokens, and let the billing platform handle aggregation and invoicing. The goal stays the same throughout: customers pay for what they consume, without friction in the product.
