Why AI Products Need Usage-Based Billing
Traditional SaaS pricing works because resource consumption is relatively predictable. A project management tool uses roughly the same server resources whether a user creates 10 tasks or 100.
AI products break this assumption. Every API call to an LLM has a real, measurable cost that scales with input and output tokens. A single request can cost anywhere from $0.001 to $0.50 depending on context length and response size. At similar request sizes, a customer who sends 100 requests per day costs roughly 100 times as much to serve as one who sends a single request.
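To make the cost spread concrete, here is a minimal sketch of how a per-request cost follows from token counts. The rates and the `requestCostUsd` helper are assumptions for illustration only; real provider prices differ by model and change over time.

```typescript
// Hypothetical per-million-token rates in USD (not any real provider's prices).
interface ModelRates {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost of one request: input and output tokens are priced separately.
function requestCostUsd(
  inputTokens: number,
  outputTokens: number,
  rates: ModelRates,
): number {
  const totalPerMillion =
    inputTokens * rates.inputPerMillion + outputTokens * rates.outputPerMillion;
  return totalPerMillion / 1_000_000;
}

const rates: ModelRates = { inputPerMillion: 3, outputPerMillion: 15 };

// A short chat turn versus a long-document request under the same rates.
const shortChat = requestCostUsd(200, 100, rates);
const longDocument = requestCostUsd(100_000, 4_000, rates);

console.log(shortChat.toFixed(4), longDocument.toFixed(4));
```

Under these assumed rates the two requests differ in cost by more than two orders of magnitude, which is the variance a flat price has to absorb.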
This creates a fundamental mismatch with flat-rate pricing. If you charge $49/month for unlimited access, your heaviest users destroy your margins while your lightest users subsidize them. Usage-based billing solves this by aligning what customers pay with what they actually consume.
The Three Consumption Models
Metered (Overage)
Customers subscribe to a plan that includes a base allowance. Usage beyond that allowance is charged as overage at the end of the billing period.
A customer on your Pro plan gets 100,000 tokens included. During the month they use 250,000. At period end, they are charged for the 150,000 extra tokens at your overage rate.
Best for: API platforms, analytics tools, and AI assistants with steady usage patterns where customers want uninterrupted access.
Tradeoff: customers can be surprised by overage charges, and you carry credit risk during the billing period since you deliver value before collecting payment.
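The Pro plan example above can be sketched as a small function. The overage rate of 2 cents per 1,000 tokens is an assumption, since the example does not state one, and `overageChargeCents` is a hypothetical helper name.

```typescript
// Overage charge in cents for one billing period.
// Usage at or below the allowance produces no charge.
function overageChargeCents(
  usedTokens: number,
  includedTokens: number,
  overageCentsPer1k: number, // assumed rate, in cents per 1,000 tokens
): number {
  const extraTokens = Math.max(0, usedTokens - includedTokens);
  return Math.round((extraTokens / 1000) * overageCentsPer1k);
}

// 250,000 used against 100,000 included leaves 150,000 billable tokens.
const proOverage = overageChargeCents(250_000, 100_000, 2);
const lightUser = overageChargeCents(40_000, 100_000, 2);

console.log(proOverage, lightUser);
```

Working in integer cents avoids floating-point drift when charges are summed across many customers.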
Credits (Prepaid Blocks)
Customers purchase blocks of credits, which are consumed as they use your product. When credits run out, they must purchase more.
A customer buys 500,000 credits for $50. Each API call consumes credits based on complexity (100 for a simple query, 1,000 for a complex generation). At zero, the customer tops up.
Best for: image generation, code generation, and products where individual requests vary significantly in cost. The credit abstraction hides raw cost variance from customers.
Tradeoff: friction when credits run out. If a customer exhausts credits during a critical workflow, they may churn instead of topping up.
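A credit ledger for the example above might look like the following sketch. The `CreditLedger` class and the two request kinds are hypothetical; the credit costs (100 and 1,000) come from the example.

```typescript
type RequestKind = "simple" | "complex";

// Credit cost per request kind, taken from the example above.
const CREDIT_COST: Record<RequestKind, number> = {
  simple: 100,
  complex: 1_000,
};

class CreditLedger {
  private balance: number;

  constructor(initialCredits: number) {
    this.balance = initialCredits;
  }

  get remaining(): number {
    return this.balance;
  }

  // Deducts credits and returns true, or returns false without
  // deducting when the balance cannot cover the request.
  consume(kind: RequestKind): boolean {
    const cost = CREDIT_COST[kind];
    if (cost > this.balance) return false;
    this.balance -= cost;
    return true;
  }
}

const ledger = new CreditLedger(500_000);
ledger.consume("simple");
ledger.consume("complex");

const nearlyEmpty = new CreditLedger(50);
const accepted = nearlyEmpty.consume("simple");

console.log(ledger.remaining, accepted);
```

Returning `false` instead of throwing leaves the caller free to prompt for a top-up rather than fail the request outright, which is one way to soften the friction described above.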
Balance (Prepaid Wallet)
Customers deposit a dollar amount into a wallet. Each usage event deducts the actual cost from the balance in real time.
Best for: developer infrastructure, API platforms, and multi-model marketplaces where customers expect direct cost visibility.
Tradeoff: the direct cost visibility can make customers overly cautious about usage, reducing engagement.
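The wallet model can be sketched as below. This is an illustration under assumptions: the `Wallet` class is hypothetical, and amounts are stored as integer micro-dollars (millionths of a dollar) because individual requests often cost less than one cent.

```typescript
const MICROS_PER_USD = 1_000_000;

class Wallet {
  private balanceMicros: number;

  constructor(depositUsd: number) {
    this.balanceMicros = Math.round(depositUsd * MICROS_PER_USD);
  }

  // Deducts the actual cost of one usage event.
  // Throws when the remaining balance cannot cover it.
  charge(costMicros: number): void {
    if (costMicros > this.balanceMicros) {
      throw new Error("Insufficient balance");
    }
    this.balanceMicros -= costMicros;
  }

  get balanceUsd(): number {
    return this.balanceMicros / MICROS_PER_USD;
  }
}

const wallet = new Wallet(20);
wallet.charge(2_100); // a request that cost $0.0021
wallet.charge(360_000); // a request that cost $0.36

console.log(wallet.balanceUsd.toFixed(4));
```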
Choosing the Right Model
If your product has relatively uniform request costs (a chatbot with similar conversation lengths), metered billing is simplest. If request costs vary significantly (image generation where a simple edit costs a tenth as much as a full creation), credits give you pricing flexibility. If you are building developer infrastructure where customers expect transparency, balance is the right fit.
One important rule: these models are mutually exclusive within a single plan. Mixing them creates confusion. Pick one per plan and commit to it.
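One way to enforce the one-model-per-plan rule in your own code is a discriminated union, so a plan cannot carry settings for two models at once. The type and field names here are illustrative assumptions, not a real billing schema.

```typescript
// Each variant holds only the settings that its model needs.
type ConsumptionModel =
  | { kind: "metered"; includedTokens: number; overageCentsPer1k: number }
  | { kind: "credits"; creditsPerBlock: number; blockPriceCents: number }
  | { kind: "balance"; minimumDepositCents: number };

interface Plan {
  name: string;
  consumption: ConsumptionModel; // exactly one variant, never a mix
}

function describePlan(plan: Plan): string {
  const model = plan.consumption;
  switch (model.kind) {
    case "metered":
      return `${plan.name}: ${model.includedTokens} tokens included, overage billed at period end`;
    case "credits":
      return `${plan.name}: ${model.creditsPerBlock} credits per block, top up when empty`;
    case "balance":
      return `${plan.name}: prepaid wallet, costs deducted per request`;
  }
}

const pro: Plan = {
  name: "Pro",
  consumption: { kind: "metered", includedTokens: 100_000, overageCentsPer1k: 2 },
};

console.log(describePlan(pro));
```

The compiler rejects a plan object that mixes fields from different variants, which catches the confusion at build time instead of in a customer's invoice.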
Pricing Strategies for AI Products
Per-Token Pricing
Charge based on input and output tokens processed. Mirrors how model providers charge you, making margin calculation straightforward. Best for API products with technical users.
Per-Request Pricing
Charge a flat amount per API call regardless of token count. Simpler for customers but requires you to absorb variance in request complexity. Works when you can normalize request sizes through product design.
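The variance you absorb under a flat per-request price can be checked with a quick calculation. The price and the three request costs below are made-up numbers, and `flatPriceMargins` is a hypothetical helper that expects a non-empty list of costs.

```typescript
// Amounts are in micro-dollars (millionths of a dollar) so the math stays in integers.
function flatPriceMargins(
  priceMicros: number,
  costsMicros: number[],
): { average: number; worst: number } {
  const margins = costsMicros.map((cost) => priceMicros - cost);
  const total = margins.reduce((sum, margin) => sum + margin, 0);
  return {
    average: total / margins.length,
    worst: Math.min(...margins),
  };
}

// Flat price of $0.05 per request; two cheap requests and one expensive one.
const margins = flatPriceMargins(50_000, [10_000, 20_000, 120_000]);

console.log(margins.average, margins.worst);
```

In this made-up sample, a single expensive request cancels the margin earned on the two cheap ones, which is why normalizing request sizes matters for this strategy.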
Per-Minute or Per-Compute-Unit
For real-time processing (voice AI, video analysis, transcription), charging by compute time is more intuitive. Customers understand "minutes" better than "tokens."
Tiered Pricing with Usage
Combine a flat subscription tier with usage-based components. The base tier includes a generous allowance, and heavy users pay more. Predictability for customers, margin protection for you. Some teams pair this with seat-based pricing to charge per user on top of consumption.
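A tiered plan with seats and usage can be broken into three line items, as in this sketch. All prices are assumptions chosen for round numbers, and the `TierPlan` shape is hypothetical.

```typescript
interface TierPlan {
  basePriceCents: number;
  seatPriceCents: number;
  includedTokens: number;
  overageCentsPer1k: number;
}

interface InvoiceLines {
  baseCents: number;
  seatsCents: number;
  usageCents: number;
  totalCents: number;
}

// Splits a monthly charge into base, seat, and usage components.
function monthlyInvoice(plan: TierPlan, seats: number, usedTokens: number): InvoiceLines {
  const extraTokens = Math.max(0, usedTokens - plan.includedTokens);
  const baseCents = plan.basePriceCents;
  const seatsCents = seats * plan.seatPriceCents;
  const usageCents = Math.round((extraTokens / 1000) * plan.overageCentsPer1k);
  return { baseCents, seatsCents, usageCents, totalCents: baseCents + seatsCents + usageCents };
}

// Assumed plan: $49 base, $10 per seat, 500,000 tokens included, 2 cents per 1,000 after that.
const teamPlan: TierPlan = {
  basePriceCents: 4_900,
  seatPriceCents: 1_000,
  includedTokens: 500_000,
  overageCentsPer1k: 2,
};

const invoice = monthlyInvoice(teamPlan, 3, 650_000);

console.log(invoice);
```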
Implementing Usage-Based Billing with Commet
Commet supports all three consumption models natively.
Define Your Plan
Create a plan in the Commet dashboard with a base price and metered features. For example, $29/month including 50,000 tokens with overage at $0.03 per 1,000 tokens.
Report Usage Events
Every time a customer makes an API call, report the usage through the SDK.
import { Commet } from "@commet/sdk";

const commet = new Commet({ apiKey: "your_api_key" });

await commet.usage.report({
  customerId: "cus_abc123",
  featureId: "feat_tokens",
  quantity: 1500,
});

Automatic Invoicing
At the end of each billing period, Commet calculates total usage, applies the included allowance, computes overage charges, and generates an invoice. For credit and balance models, deductions happen in real time.
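The period-end calculation described above can be approximated locally to sanity-check invoices. This sketch is not Commet's implementation; the `UsageEvent` shape and `invoiceTotalCents` are assumptions, and the plan numbers reuse the $29 example with 50,000 included tokens and $0.03 per 1,000 tokens of overage.

```typescript
interface UsageEvent {
  customerId: string;
  quantity: number; // tokens reported for one call
  timestamp: Date;
}

interface MeteredPlan {
  basePriceCents: number;
  includedTokens: number;
  overageCentsPer1k: number;
}

// Sums one customer's usage inside [periodStart, periodEnd), applies the
// included allowance, and adds overage to the base price.
function invoiceTotalCents(
  events: UsageEvent[],
  customerId: string,
  periodStart: Date,
  periodEnd: Date,
  plan: MeteredPlan,
): number {
  const usedTokens = events
    .filter(
      (event) =>
        event.customerId === customerId &&
        event.timestamp.getTime() >= periodStart.getTime() &&
        event.timestamp.getTime() < periodEnd.getTime(),
    )
    .reduce((sum, event) => sum + event.quantity, 0);
  const extraTokens = Math.max(0, usedTokens - plan.includedTokens);
  return plan.basePriceCents + Math.round((extraTokens / 1000) * plan.overageCentsPer1k);
}

const plan: MeteredPlan = { basePriceCents: 2_900, includedTokens: 50_000, overageCentsPer1k: 3 };

const events: UsageEvent[] = [
  { customerId: "cus_abc123", quantity: 1_500, timestamp: new Date("2025-03-02T00:00:00Z") },
  { customerId: "cus_abc123", quantity: 60_000, timestamp: new Date("2025-03-10T00:00:00Z") },
  { customerId: "cus_abc123", quantity: 18_500, timestamp: new Date("2025-03-28T00:00:00Z") },
  { customerId: "cus_other", quantity: 5_000, timestamp: new Date("2025-03-15T00:00:00Z") },
  { customerId: "cus_abc123", quantity: 9_999, timestamp: new Date("2025-04-01T00:00:00Z") },
];

const marchTotal = invoiceTotalCents(
  events,
  "cus_abc123",
  new Date("2025-03-01T00:00:00Z"),
  new Date("2025-04-01T00:00:00Z"),
  plan,
);

console.log(marchTotal);
```

The half-open period keeps an event that lands exactly on a period boundary from being billed twice.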
Real-Time Visibility
Customers see their current usage and remaining allowance through the Commet Customer Portal, reducing billing surprises and support tickets.
Real-World Examples
AI writing assistant (metered): $19/month includes 100,000 tokens. Overage at $0.02/1,000 tokens. Most users stay within the allowance. Power users pay proportionally more.
Image generation platform (credits): packs of 100 credits for $10. Simple generations cost 1 credit, complex ones cost 5-10. Customers control spending precisely.
Developer API platform (balance): customers deposit funds and pay per-request at listed rates. Different models have different per-token rates, all deducted from the same balance.
Key Takeaways
Usage-based billing is not optional for AI products. The cost structure demands it. The three models (metered, credits, balance) each serve different product types and customer expectations. Choose one per plan, implement clean usage tracking, and let your billing system handle invoicing automatically. The goal is to align what customers pay with what they consume, without adding friction to the product experience.