Token Economics

Ryan Rutan


Token economics is the discipline of understanding and modeling the per-token costs and revenue of AI applications. Tokens (sub-word units of text) are the unit of pricing for most LLM APIs and the basis on which AI application unit economics must be modeled. The model includes input tokens (prompt + context + RAG content), output tokens (model response), cache savings, and the per-query economics that determine whether AI applications are profitable. It's the financial layer beneath every AI application.

What a token is:

Token: a sub-word unit of text used by LLMs. In English, one token averages roughly 4 characters, or about 0.75 words.

Tokenization examples:

  • "The quick brown fox" → about 4 tokens (roughly one per common word).
  • "Tokenization" → split into sub-word pieces (e.g., "Token" + "ization"); the exact split depends on the tokenizer.
  • Code, math, and non-English text often produce more tokens per character.
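For quick budgeting before you have real usage data, the rules of thumb above can be turned into a rough estimator. This is a minimal sketch that assumes English prose and uses only the ~4 characters/token and ~0.75 words/token ratios; the function names are hypothetical, and exact counts should come from the provider's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count for English text using the ~4-characters-per-token rule of thumb.

    Heuristic only: code, math, and non-English text usually need more tokens
    than this predicts. Use the provider's own tokenizer for exact counts.
    """
    if not text:
        return 0
    # max(1, ...) makes any non-empty string count as at least one token.
    return max(1, round(len(text) / 4))


def estimate_tokens_from_words(word_count: int) -> int:
    """Alternative estimate using ~0.75 English words per token."""
    return round(word_count / 0.75)


print(estimate_tokens("The quick brown fox"))  # 19 characters -> 5 by this heuristic
print(estimate_tokens_from_words(750))
```

The character heuristic overshoots on this short phrase (common tokenizers produce about 4 tokens), so treat such estimates as planning numbers, not billing numbers.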

Why tokens, not words: LLMs read and generate text as tokens drawn from a learned vocabulary of common sub-word units, so compute scales with the number of tokens processed. Pricing per token tracks that compute.

The token-based pricing model:

Input tokens: charged per token of prompt sent (system prompt + user input + RAG context + conversation history).

Output tokens: charged per token of model response (typically several times the input price; 4-5x is common).

Both are priced per million tokens: e.g., "$3 per million input tokens, $15 per million output tokens."
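The pricing model reduces to one formula: tokens times price per million, summed separately for input and output. A minimal sketch, using the $3/$15 example prices quoted above; the function name is hypothetical.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call under per-million-token pricing."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example prices from the text: $3/M input, $15/M output.
cost = call_cost(1_000, 1_000, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"${cost:.4f}")
```

With equal input and output token counts, the output side accounts for five-sixths of the bill at these prices, which is why output length is a cost lever in its own right.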

The basic per-query math:

A typical query:

  • Input: 2,000 tokens (system prompt + RAG context + user question).
  • Output: 500 tokens (model response).

Cost at frontier model pricing ($5 input / $25 output per million):

  • Input cost: 2,000 × $5/M = $0.01.
  • Output cost: 500 × $25/M = $0.0125.
  • Total per query: $0.0225 (~$0.02).

At 100,000 queries per day: $2,250/day, or about $68K per 30-day month ($67,500).
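The worked example can be reproduced in a few lines, which makes it easy to rerun with your own token counts and volumes. A sketch assuming the $5/$25 per-million pricing above and a 30-day month; the names are illustrative.

```python
INPUT_PRICE_PER_M = 5.0    # $ per million input tokens (example frontier pricing from the text)
OUTPUT_PRICE_PER_M = 25.0  # $ per million output tokens


def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single query at the prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


per_query = query_cost(2_000, 500)  # $0.01 input + $0.0125 output
daily = per_query * 100_000         # 100,000 queries per day
monthly = daily * 30                # assumes a 30-day month
print(f"per query ${per_query:.4f}, daily ${daily:,.0f}, monthly ${monthly:,.0f}")
```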

Token economics for various app types:

Chat assistant (1-2 turns):

  • Per turn: ~$0.01-$0.05.
  • High volume; sensitive to cost.

RAG application (large context):

  • Per query: ~$0.05-$0.30 (RAG context inflates input).
  • Caching helps significantly.

Agent task (multi-step):

  • Per task: ~$0.50-$5+ (multiple LLM calls, tool use).
  • Price per task or outcome accordingly.

Reasoning model query (with extended thinking):

  • Per query: ~$0.50-$5+ (long internal chain-of-thought).
  • 5-10x more expensive than non-reasoning models.

Long-context query (200K+ tokens):

  • Per query: ~$1-$10+ (input tokens accumulate fast).
  • Use sparingly or with caching.

The cost reduction levers:

Smaller model for simpler tasks: GPT-4o-mini at $0.15/M input vs GPT-5 at $1.25/M input.

Prompt caching: 50-90% reduction on cached portions.
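The caching discount applies only to the cached part of the prompt, so overall savings depend on how much of each request is a stable, reusable prefix. A minimal sketch, assuming cached tokens are billed at a flat discount and ignoring any surcharge a provider may charge when first writing to the cache; the 90% discount and token counts are illustrative.

```python
def cached_input_cost(input_tokens: int, cached_tokens: int,
                      input_price_per_m: float, cache_discount: float) -> float:
    """Input cost when `cached_tokens` of the prompt are served from the prompt cache.

    cache_discount is the fraction saved on cached tokens (e.g. 0.5 or 0.9).
    Simplification: ignores any cache-write surcharge.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cached_price = input_price_per_m * (1 - cache_discount)
    return (uncached * input_price_per_m + cached_tokens * cached_price) / 1_000_000


# 20,000-token prompt where an 18,000-token system prompt + document prefix is cached.
full = cached_input_cost(20_000, 0, 5.0, 0.9)
cached = cached_input_cost(20_000, 18_000, 5.0, 0.9)
print(f"uncached ${full:.4f} vs cached ${cached:.4f} ({1 - cached / full:.0%} saved)")
```

Here a 90% discount on 90% of the prompt yields about 81% savings on input; putting static content (system prompt, reference documents) first and variable content last is what makes a long shared prefix possible.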

Batching: batch APIs offer roughly 50% off in exchange for asynchronous results (typically returned within 24 hours).

RAG with retrieval limits: retrieve only as many chunks as the answer actually needs.

Output token limits: cap and steer output length (max-token settings, instructions to answer concisely) to control cost.

Open-source self-hosted models: at sufficient scale, can be 30-50% cheaper than API pricing.

Routing: cheap model first, escalate hard cases.
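Whether "cheap model first, escalate hard cases" actually saves money depends on the escalation rate, since escalated queries pay for both calls. A sketch under that assumption; the cheap-model cost uses the $0.15/M input figure above plus an assumed $0.60/M output price, the strong-model cost is the $0.0225 worked example, and the 20% escalation rate is hypothetical.

```python
def cascade_cost(cheap_cost: float, strong_cost: float, escalation_rate: float) -> float:
    """Expected cost per query for 'cheap model first, escalate hard cases'.

    Every query pays for the cheap call; the escalated fraction also pays for the strong call.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return cheap_cost + escalation_rate * strong_cost


# 2,000 input + 500 output tokens at $0.15/M and an assumed $0.60/M.
cheap = (2_000 * 0.15 + 500 * 0.60) / 1_000_000
strong = 0.0225  # per-query cost of the frontier-model worked example
routed = cascade_cost(cheap, strong, escalation_rate=0.20)
print(f"strong only ${strong:.4f}, routed ${routed:.4f} "
      f"({1 - routed / strong:.0%} saved)")
```

Routing pays off only while `cheap_cost < (1 - escalation_rate) * strong_cost`; if the cheap model escalates most queries, the extra call makes things more expensive.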

The unit economics framework:

Cost per query: $0.01-$5+ depending on architecture.

Revenue per query (or task or month): must exceed cost meaningfully.

Gross margin: target 60-80% for a sustainable AI app (vs 70-90% for SaaS without AI).

Volume: high-volume apps need cheap-per-query economics; low-volume can tolerate higher per-query.
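Working the margin target backwards gives a concrete cost budget. A minimal sketch with a hypothetical plan: $20 per user per month, with the average user running 400 queries at the $0.0225 worked-example cost.

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost) / revenue


def max_cost_for_margin(revenue: float, target_margin: float) -> float:
    """Highest cost that still achieves target_margin at this revenue."""
    return revenue * (1 - target_margin)


revenue = 20.0           # hypothetical $ per user per month
cost = 400 * 0.0225      # hypothetical usage: 400 queries at $0.0225 each
budget = max_cost_for_margin(revenue, 0.70)
print(f"margin {gross_margin(revenue, cost):.0%}, "
      f"cost budget at 70% margin: ${budget:.2f}/user "
      f"(${budget / 400:.4f}/query)")
```

In this example the plan lands at 55%, below the 60-80% target; reaching 70% means cutting per-query cost to about $0.015, raising price, or limiting usage.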

The cost trajectory implication:

Inference cost for a given capability tier has been dropping roughly 10x every 12-18 months. Implications:

Apps unprofitable today may become profitable in 12-18 months without code changes.

Don't over-engineer for current costs: capabilities you can't afford today may be cheap soon.

But still design for current economics: don't be unprofitable indefinitely betting on cost declines.

Margin expansion: if revenue stays constant, margin improves as costs drop.
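To see what the trend implies for margins, you can project cost forward. This sketch assumes a smooth exponential decline of 10x every 15 months (the midpoint of the 12-18 month range), a constant hypothetical revenue of $0.03 per query, and that you actually capture the lower prices, for example by moving to newer models; it is a projection of a trend, not a forecast.

```python
def projected_cost(cost_today: float, months: float, months_per_10x: float = 15.0) -> float:
    """Cost after `months`, assuming a smooth 10x decline every `months_per_10x` months."""
    return cost_today * 10 ** (-months / months_per_10x)


revenue_per_query = 0.03  # hypothetical, held constant
for m in (0, 6, 12, 18):
    c = projected_cost(0.0225, m)
    margin = (revenue_per_query - c) / revenue_per_query
    print(f"month {m:2d}: cost ${c:.4f}, margin {margin:.0%}")
```

The same projection also shows the risk the text warns about: a product that starts at a thin margin depends entirely on the trend continuing.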

The token economics tools:

Cost tracking: built-in usage dashboards from OpenAI, Anthropic, and Google, plus tools such as Helicone and LangSmith for cross-provider tracking.

Cost forecasting: project costs at various scales based on token usage patterns.

Cost attribution: tracking which features and users consume which tokens.

Cache hit-rate monitoring: measuring how much input is actually served from the prompt cache.
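Attribution and cache monitoring both fall out of logging one record per API call. A minimal sketch; the `UsageRecord` fields are a hypothetical schema of our own (not any provider's format), and the three records are made-up sample data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    # Hypothetical per-call log entry; field names are illustrative.
    feature: str
    user_id: str
    input_tokens: int         # all input tokens, cached or not
    cached_input_tokens: int  # subset of input_tokens served from the cache
    output_tokens: int
    cost_usd: float


def cost_by(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Total cost grouped by a record attribute such as 'feature' or 'user_id'."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[getattr(r, key)] += r.cost_usd
    return dict(totals)


def cache_hit_rate(records: list[UsageRecord]) -> float:
    """Share of input tokens served from the prompt cache."""
    total_input = sum(r.input_tokens for r in records)
    if total_input == 0:
        return 0.0
    return sum(r.cached_input_tokens for r in records) / total_input


records = [
    UsageRecord("chat", "u1", 2_000, 1_500, 500, 0.0150),
    UsageRecord("chat", "u2", 2_000, 0, 400, 0.0200),
    UsageRecord("search", "u1", 20_000, 18_000, 300, 0.0265),
]
print(cost_by(records, "feature"))
print(f"cache hit rate: {cache_hit_rate(records):.0%}")
```

Grouping by `user_id` instead of `feature` gives per-user cost, the figure you need to check pricing tiers against actual usage.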

Ryan's Take

Token economics is the financial layer every AI founder needs to understand at a quantitative level. The discipline that works: instrument token usage from day one; model cost per query, per user, per feature; target 60-80% gross margin; use cost-reduction levers (smaller models, caching, batching, routing) intentionally; design pricing around actual unit economics. The pattern that fails: build AI features without modeling token costs; discover at scale that margins are 20%; can't fix without major architectural changes. Token economics is not optional engineering; it's product strategy.

What founders get wrong: Building AI applications without token-level cost instrumentation, then being surprised by gross margins. The right discipline: track tokens per query, per user, per feature; target gross margin in design phase; use cost-reduction techniques deliberately.

Related: Inference Cost · Large Language Model · Context Window · GPU Cost

FAQ

What are token economics?
The discipline of understanding and modeling the per-token costs and revenue of AI applications. Tokens are sub-word units (~0.75 English words each) and are the unit of pricing for most LLM APIs. Token economics is the basis for AI application unit economics.

How are LLM APIs priced?
Per million tokens, with separate prices for input (prompt + context) and output (model response). Output tokens are typically 4-5x more expensive than input. Example: $3/M input and $15/M output for a frontier model.

What's the cost per query for a typical AI app?
Chat assistant: $0.01-$0.05 per turn. RAG application: $0.05-$0.30 per query. Agent task: $0.50-$5+ per task. Reasoning query: $0.50-$5+. Long-context query: $1-$10+. Wide variance by architecture.

How do I optimize AI app gross margin?
Smaller models for simpler tasks. Prompt caching (50-90% reduction on cached content). Batching (50% off with delayed response). Output length controls. Open-source self-hosted at scale. Routing easy queries to cheap models. Track cost per query as a first-class metric.
