AI / LLM API
Cost Calculator
Before you ship an AI feature, know what it will cost to run. Enter your tokens and request volume to estimate the monthly bill on GPT-4o, Claude, and Gemini, compare models side by side, and see where the savings are.
How do you want to enter usage?
Not technical? Paste your text and we'll count the tokens for you.
Model
GPT-4o: $2.50/1M input · $10.00/1M output
What one request looks like
How much it gets used
= 2,000 AI requests per day.
Don't know your numbers?
Copy this and paste it into ChatGPT, Claude, Cursor, or any AI agent with access to your app. It'll give you the token values to plug into “Enter token counts”.
I want to estimate what my AI feature costs to run on an LLM API. Look at my app's prompts and code and give me three token numbers for a typical request:
1. System prompt tokens: the fixed instructions sent to the model on every request.
2. User input tokens: a typical user message plus any context or documents attached per request.
3. Output tokens: a typical length of the model's response.
Estimate from a realistic example if you're unsure, and assume ~4 characters per token. Reply in exactly this format and nothing else:
System prompt tokens: ___
User input tokens: ___
Output tokens: ___
Estimated monthly API cost
Same workload, by model
AI bill out of control, or hallucinations reaching users?
Book a Free Call →
Estimates only. Token counts from pasted text are approximate (~4 characters per token); exact counts depend on the model's tokenizer. Prices are approximate list rates and change often, so confirm with your provider. Real cost also depends on caching, batching, and retries. Nothing you enter leaves your browser.
The bill that surprises founders
An AI feature that costs a fraction of a cent per request feels free in a demo. Then real usage arrives, every call carries a long system prompt and a pile of injected context, and the monthly invoice has four figures on it. The token math is small until you multiply it by thousands of requests a day.
The good news is that AI cost is one of the most controllable parts of a product. Routing routine work to a cheaper model, caching repeated responses, and trimming context often cut the bill by more than half without users noticing any difference. This calculator shows you where your money is going so you can decide what to optimise first.
Frequently asked questions
How is LLM API cost calculated?
Providers bill per token, separately for input (your prompt plus any context) and output (the model's response). Cost per request is (input tokens ÷ 1M × input price) + (output tokens ÷ 1M × output price). Multiply by your request volume to get daily, monthly, and annual cost. This calculator does that across the major models so you can compare.
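As a rough illustration of that formula, here is a minimal Python sketch. The prices are the GPT-4o list rates quoted on this page; the token counts (1,500 input, 500 output), the 2,000 requests per day, and the 30-day month are assumptions chosen for the example, not values taken from the calculator.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_1m, output_price_per_1m):
    """Cost of one request in dollars, billed per token.

    Prices are given in dollars per 1 million tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# GPT-4o list rates as quoted on this page (dollars per 1M tokens).
GPT4O_INPUT_PRICE = 2.50
GPT4O_OUTPUT_PRICE = 10.00

# Hypothetical workload, for illustration only.
REQUESTS_PER_DAY = 2_000
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month

per_request = cost_per_request(1_500, 500, GPT4O_INPUT_PRICE, GPT4O_OUTPUT_PRICE)
daily = per_request * REQUESTS_PER_DAY
monthly = daily * DAYS_PER_MONTH
yearly = daily * 365

print(f"${per_request:.5f} per request")
print(f"${daily:.2f} per day")
print(f"${monthly:.2f} per month")
print(f"${yearly:.2f} per year")
```

Under these assumptions a request costs a bit under one cent (about $0.00875), which becomes roughly $17.50 a day and $525 a month once the volume is applied.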
Why is my AI bill higher than expected?
Usually one of three things: bloated prompts and context (input tokens add up fast when you stuff in documents or chat history), a top-tier model doing routine work that a cheaper model handles fine, or no caching of repeated requests. Long system prompts sent on every call are a common hidden cost.
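To see how much of a request's cost comes from each part, you can split the same per-token math into system prompt, user input, and output. The sketch below is illustrative only: the function name `cost_shares` and the example token counts are hypothetical, and the prices are the GPT-4o list rates quoted on this page.

```python
def cost_shares(system_tokens, user_tokens, output_tokens,
                input_price_per_1m, output_price_per_1m):
    """Return each component's share of one request's cost (0.0 to 1.0).

    System prompt and user input are both billed at the input rate;
    the response is billed at the output rate.
    """
    costs = {
        "system_prompt": system_tokens / 1_000_000 * input_price_per_1m,
        "user_input": user_tokens / 1_000_000 * input_price_per_1m,
        "output": output_tokens / 1_000_000 * output_price_per_1m,
    }
    total = sum(costs.values())
    if total == 0:
        # Nothing billed, so every share is zero.
        return {name: 0.0 for name in costs}
    return {name: cost / total for name, cost in costs.items()}


# Hypothetical request: a long system prompt, a short question, a short answer.
shares = cost_shares(system_tokens=2_000, user_tokens=500, output_tokens=300,
                     input_price_per_1m=2.50, output_price_per_1m=10.00)

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

In this made-up example the system prompt accounts for more than half of the cost of every request, even though output tokens are four times more expensive per token.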
What is the cheapest way to run an AI feature?
Route simple or high-volume requests to a small, cheap model (GPT-4o mini, Claude Haiku, Gemini Flash) and reserve the expensive models for genuinely hard tasks. Cache responses to identical or near-identical requests, trim context to what the model actually needs, and batch where latency allows. Together, these moves often cut a bill by more than half.
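Caching is the easiest of these to picture in code. The sketch below is a minimal in-memory cache, not a production design: `ResponseCache` and `fake_model` are hypothetical names, and `fake_model` stands in for whatever function actually calls your provider. It treats prompts as the same request when they differ only in letter case or whitespace, which is a narrow take on "near-identical" and will not suit every app.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Reuse a stored response when the same prompt is seen again."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model  # the function that hits the paid API
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalise case and whitespace so trivially different prompts match.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1  # served from cache: no tokens billed
            return self._store[key]
        self.misses += 1  # first time seen: pay for one model call
        response = self._call_model(prompt)
        self._store[key] = response
        return response


def fake_model(prompt: str) -> str:
    """Placeholder for a real API call, for illustration only."""
    return f"answer to: {prompt}"


cache = ResponseCache(fake_model)
first = cache.get("What is your refund policy?")
second = cache.get("what is your   refund policy?")  # same after normalising

print(cache.hits, cache.misses)
```

Only the first call reaches the model; the second is answered from the cache. A real deployment would also need an expiry policy and a shared store, which are left out here.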
How many tokens is a typical request?
Roughly, in English text, one token is about 0.75 words (or about 4 characters). A short chat turn might be a few hundred tokens; a retrieval-augmented request with injected documents can be several thousand input tokens. Output is whatever the model generates. Check your provider dashboard for real per-request numbers.
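The 4-characters-per-token rule of thumb can be sketched in a few lines of Python. This is an approximation only: the function name `estimate_tokens` is hypothetical, rounding up is an assumption, and a real count requires the model's own tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (about 4 chars per token)."""
    if not text:
        return 0
    # Round up so a short fragment still counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


system_prompt = "You are a helpful support assistant for an online store."
user_message = "Where is my order?"

print(estimate_tokens(system_prompt))
print(estimate_tokens(user_message))
```

Estimates like these are good enough to ballpark a budget, but non-English text and code often tokenize quite differently, so treat the result as a starting point.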
What if I do not know my token counts?
Use the "Paste my text" mode: drop in your system prompt, a typical user message, and a typical response, and the calculator counts the tokens for you. Or copy the ready-made prompt on the page and give it to ChatGPT, Claude, or an AI agent with access to your app — it will return the system, input, and output token values to plug in.
Are these prices exact?
They are approximate list prices for planning and change often, so confirm current rates with your provider. Real spend also depends on caching, retries, batching, and system-prompt overhead. Use this to ballpark a budget and compare models, not as a billing forecast.
Want your AI feature to be cheap and reliable?
I build and fix production AI features: cost control, model routing, caching, evals, and guardrails so the bill and the output are both under control.