Billing for AI Products

How to implement usage-based billing for AI and LLM products. Charge per token, per API call, or per generation with automatic metering.

AI products have a billing problem that traditional SaaS does not: the cost of every request is variable and unpredictable.

A user might send a 50-token prompt or a 50,000-token prompt. An image generation might take 2 seconds or 30. Charging a flat monthly rate means you either lose money on heavy users or price out light ones. Neither outcome works at scale.

This is why usage-based billing is not optional for AI products. It is the only model that aligns your revenue with your costs.

Why flat pricing breaks for AI

Traditional SaaS has relatively fixed infrastructure costs per user. AI does not. Every inference has a direct compute cost that varies based on model, input size, output length, and processing time. A user who sends 100 requests per month and a user who sends 100,000 are not comparable from a cost perspective, but on a flat plan they pay the same amount.

The result is predictable: your heaviest users generate losses, your lightest users feel overcharged, and your margins are a mystery. Usage-based billing solves this by tying revenue directly to consumption.

Three consumption models

Commet supports three consumption models. Most AI products fit one of them, and each plan uses exactly one.

Metered (overage)

The user gets a plan with included usage. Anything beyond the included amount is billed as overage at the end of the billing period.

Example: a Pro plan includes 100,000 tokens per month. Every additional token costs $0.002. The overage is calculated and charged automatically when the period closes.

This model works well when you want predictable base revenue with elastic upside. Users know their baseline cost and only pay more when they use more.
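The overage arithmetic in the Pro plan example can be sketched as follows. This is a minimal illustration, not Commet's internal calculation: `overageCharge` is a hypothetical helper, and prices are held as integers on the 10,000-units-per-dollar rate scale this article mentions, so $0.002 becomes 20 units.

```typescript
// Hypothetical helper for illustration; Commet performs this calculation for you.
// Prices are integers on a 10,000-units-per-dollar scale to avoid float drift.
const RATE_UNITS_PER_DOLLAR = 10_000;

function overageCharge(
  usedUnits: number,
  includedUnits: number,
  ratePerUnit: number,
): number {
  // Only usage beyond the included amount is billable.
  const billableUnits = Math.max(0, usedUnits - includedUnits);
  return billableUnits * ratePerUnit; // result is in rate units
}

// Pro plan: 100,000 tokens included, $0.002 (20 rate units) per extra token.
const overageInRateUnits = overageCharge(150_000, 100_000, 20);
console.log(overageInRateUnits / RATE_UNITS_PER_DOLLAR); // 100
```

A customer who uses 150,000 tokens in the period pays the plan price plus $100 of overage. A customer who stays under 100,000 pays only the plan price.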

Credits

The user purchases blocks of credits upfront. Each action consumes credits, and when the balance hits zero, access stops until they buy more.

Example: a block of 500 credits buys 500 image generations, at one credit per image. When credits run out, the user purchases another block to continue.

Credits work well for products where users expect to prepay, such as image generators or batch processing tools. The model is simple to understand and gives users direct control over their spending.
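The prepaid flow can be modeled as a small in-memory ledger. The sketch below is only an illustration of the stop-at-zero behavior. `CreditLedger` and its methods are hypothetical names, and a real product would rely on Commet to track the balance.

```typescript
// Hypothetical in-memory ledger illustrating the credits model.
class CreditLedger {
  private balance = 0;

  // Add a purchased block of credits to the balance.
  purchase(credits: number): void {
    if (!Number.isInteger(credits) || credits <= 0) {
      throw new Error("credits must be a positive integer");
    }
    this.balance += credits;
  }

  // Try to spend credits; returns false (and spends nothing) if the balance is too low.
  consume(cost: number = 1): boolean {
    if (cost > this.balance) return false;
    this.balance -= cost;
    return true;
  }

  get remaining(): number {
    return this.balance;
  }
}

const ledger = new CreditLedger();
ledger.purchase(500);
for (let i = 0; i < 500; i++) ledger.consume(); // 500 image generations
console.log(ledger.consume()); // false
ledger.purchase(500); // buying another block restores access
console.log(ledger.consume()); // true
```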

Balance

The user loads a dollar balance and spends it across any feature in the product. Different actions consume different amounts from the same balance.

Example: a user adds $50 to their account. A GPT-4 call costs $0.03, a GPT-3.5 call costs $0.003, and an embedding costs $0.0001. The balance decreases with each action.

Balance is the right model when your product offers multiple AI capabilities with different cost profiles and you want a unified spending mechanism.
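A single wallet with per-action prices might look like the sketch below. The price table reuses the figures from the example above, converted to integer units at 10,000 per dollar. `Wallet`, `PRICES`, and the action names are hypothetical and are not part of the Commet SDK.

```typescript
const RATE_UNITS_PER_DOLLAR = 10_000;

// Prices from the example, as integer rate units (hypothetical action names).
const PRICES = {
  "gpt-4": 300, // $0.03
  "gpt-3.5": 30, // $0.003
  "embedding": 1, // $0.0001
} as const;

type Action = keyof typeof PRICES;

class Wallet {
  private units: number;

  constructor(units: number) {
    this.units = units;
  }

  // Deduct the price of an action; refuse if the balance cannot cover it.
  spend(action: Action): boolean {
    const price = PRICES[action];
    if (price > this.units) return false;
    this.units -= price;
    return true;
  }

  get dollars(): string {
    return (this.units / RATE_UNITS_PER_DOLLAR).toFixed(4);
  }
}

const wallet = new Wallet(50 * RATE_UNITS_PER_DOLLAR); // user adds $50
wallet.spend("gpt-4");
wallet.spend("gpt-3.5");
wallet.spend("embedding");
console.log(wallet.dollars); // 49.9669
```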

Choosing the right model

The decision comes down to how your users think about spending:

Metered if your users expect a subscription with predictable costs and occasional overage.
Credits if your users prefer to prepay in discrete blocks and control exactly how much they spend.
Balance if your product offers multiple features at different price points and users want a single wallet.

Pick one model per plan. Mixing models within a plan creates confusion for both you and your customers.
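One way to picture the one-model-per-plan rule is a discriminated union, where a plan carries exactly one model variant. The types below are a hypothetical sketch for illustration only and do not reflect Commet's actual plan configuration schema.

```typescript
// Hypothetical plan shape; field names are assumptions, not Commet's schema.
type ConsumptionModel =
  | { kind: "metered"; includedUnits: number; overageRate: number }
  | { kind: "credits"; creditsPerBlock: number }
  | { kind: "balance"; prices: Record<string, number> };

interface Plan {
  name: string;
  model: ConsumptionModel; // exactly one variant, so models cannot be mixed
}

function describePlan(plan: Plan): string {
  const model = plan.model;
  switch (model.kind) {
    case "metered":
      return `${plan.name}: ${model.includedUnits} units included, then overage`;
    case "credits":
      return `${plan.name}: prepaid blocks of ${model.creditsPerBlock} credits`;
    case "balance":
      return `${plan.name}: shared wallet across ${Object.keys(model.prices).length} features`;
    default: {
      // Compile-time exhaustiveness check: unreachable if every kind is handled.
      const unreachable: never = model;
      throw new Error(`Unknown model: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(
  describePlan({
    name: "Pro",
    model: { kind: "metered", includedUnits: 100_000, overageRate: 20 },
  }),
); // Pro: 100000 units included, then overage
```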

Implementation with Commet

Commet treats consumption as a first-class concept. You report usage events through the SDK and Commet handles accumulation, enforcement, and invoicing.

// Record one unit of usage against this customer's "api-calls" feature.
await commet.usage.report({
  customerId: "cus_abc123",
  featureSlug: "api-calls",
  amount: 1,
});

Before executing an expensive inference, your app checks whether the user has access:

const access = await commet.entitlements.check({
  customerId: "cus_abc123",
  featureSlug: "api-calls",
});

if (!access.allowed) {
  // Do not run the model
}

No webhooks, no reconciliation jobs, no usage tables in your database. Commet is the single source of truth for who can do what and how much they have used.
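The check and report calls above can be combined into one wrapper around an inference call. The sketch below assumes only the two method shapes shown in this article. The `BillingClient` interface, the `runMetered` helper, and the return type of `report` are assumptions made for illustration, so consult the SDK for the real types.

```typescript
// Minimal shape of the two SDK calls shown in this article.
// The report() return type and any extra fields are assumptions.
interface BillingClient {
  usage: {
    report(event: {
      customerId: string;
      featureSlug: string;
      amount: number;
    }): Promise<unknown>;
  };
  entitlements: {
    check(query: {
      customerId: string;
      featureSlug: string;
    }): Promise<{ allowed: boolean }>;
  };
}

// Hypothetical helper: check access, run the work, then report one unit of usage.
async function runMetered<T>(
  client: BillingClient,
  customerId: string,
  featureSlug: string,
  run: () => Promise<T>,
): Promise<T | null> {
  const access = await client.entitlements.check({ customerId, featureSlug });
  if (!access.allowed) return null; // do not run the model

  const result = await run();
  // Reported only after run() succeeds, so a failed inference is not billed.
  await client.usage.report({ customerId, featureSlug, amount: 1 });
  return result;
}
```

Reporting after the inference succeeds is a design choice in this sketch. If your costs are incurred even when a request fails, report before or regardless of the outcome instead.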

On the rate scale, 10,000 units equal $1.00, so a single unit is worth $0.0001. That supports sub-cent pricing, which is essential for AI products where individual operations cost fractions of a cent.
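Converting between dollars and the rate scale is simple arithmetic. The helpers below are hypothetical and assume that rates are stored as whole numbers of units. They are not SDK functions.

```typescript
const RATE_UNITS_PER_DOLLAR = 10_000;

// Convert a dollar price to integer rate units, rounding away float noise.
function toRateUnits(dollars: number): number {
  return Math.round(dollars * RATE_UNITS_PER_DOLLAR);
}

// Format a rate-unit amount as dollars with four decimal places.
function formatRateUnits(units: number): string {
  return `$${(units / RATE_UNITS_PER_DOLLAR).toFixed(4)}`;
}

console.log(toRateUnits(0.0001)); // 1
console.log(toRateUnits(0.03)); // 300
// 1,000 calls at $0.003 each, summed as integers:
console.log(formatRateUnits(toRateUnits(0.003) * 1_000)); // $3.0000
```

Keeping totals as integers and converting to dollars only for display avoids the rounding drift that comes from adding many small floating-point amounts.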

Pricing

Commet operates as Merchant of Record. The fee is 4.5% + $0.40 per transaction. This includes Stripe processing, tax collection, compliance, and invoicing. There is no additional payment processing fee.
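To see what the fee means for a given charge, the formula can be worked through in integer cents. This is a rough sketch: `platformFeeCents` is a hypothetical helper, and rounding the percentage part to the nearest cent is an assumption, since the article does not state a rounding rule.

```typescript
const PERCENT_FEE_BPS = 450; // 4.5% expressed in basis points
const FIXED_FEE_CENTS = 40; // $0.40 per transaction

// Fee in cents for one transaction; the rounding rule is an assumption.
function platformFeeCents(amountCents: number): number {
  const percentPart = Math.round((amountCents * PERCENT_FEE_BPS) / 10_000);
  return percentPart + FIXED_FEE_CENTS;
}

console.log(platformFeeCents(5_000)); // 265
console.log(platformFeeCents(1_000)); // 85
```

On a $50.00 charge the fee is $2.65, or 5.3% of the total. On a $10.00 charge it is $0.85, or 8.5%, because the fixed $0.40 weighs more heavily on small transactions.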
