Question 1

What is a token?

Accepted Answer

A token is the unit of text that large language models like GPT and Claude actually process. Tokens roughly correspond to chunks of words: "counting" might be 1 token, while "tokenization" might be 2–3. Most English prose averages about 4 characters per token.

Question 2

Is this an exact token count?

Accepted Answer

It's an estimate. An exact count requires running each model's own tokenizer with its BPE vocabulary, which would mean shipping a multi-megabyte tokenizer to your browser. Our heuristic is accurate to within ~10% for natural English; expect more variance for code, JSON, or non-Latin scripts.
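A character-based heuristic of this kind can be sketched as follows. The function name `estimate_tokens` and its default ratio are assumptions for illustration, based on the ~4 characters per token average quoted above; this is not the counter's actual implementation.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from character length alone.

    Hypothetical helper: it assumes a fixed average number of characters
    per token and does not run a real tokenizer.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


sample = "Tokens roughly correspond to chunks of words."
print(estimate_tokens(sample))
```

Passing a different `chars_per_token` is one way to approximate another model family, at the cost of the same variance on code, JSON, and non-Latin scripts.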

Question 3

Why do GPT and Claude show different token counts?

Accepted Answer

They use different tokenizers trained on different corpora. Claude tends to use slightly more tokens for the same English text (~3.8 chars/token vs GPT's ~4), but the gap narrows for technical content. For a 10,000-character document, that works out to roughly 2,630 tokens versus 2,500.

Question 4

How big is a 200K-token context window?

Accepted Answer

Roughly 150,000 English words (at about 0.75 words per token), or about 500 pages at around 300 words per page. Long documents and conversation history both consume this budget, so monitoring token usage matters for cost and recall.
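The arithmetic behind these figures can be checked with a few lines. The ratios of 0.75 words per token and 300 words per page are assumptions chosen to match the numbers above, not measured values, and `context_capacity` is a hypothetical helper name.

```python
def context_capacity(
    tokens: int,
    words_per_token: float = 0.75,  # assumed average for English prose
    words_per_page: int = 300,      # assumed page density
) -> tuple[int, int]:
    """Convert a token budget into approximate words and pages."""
    words = tokens * words_per_token
    pages = words / words_per_page
    return round(words), round(pages)


words, pages = context_capacity(200_000)
print(f"{words:,} words, about {pages} pages")  # 150,000 words, about 500 pages
```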

Question 5

What is BPE (Byte-Pair Encoding)?

Accepted Answer

BPE is the tokenization algorithm used by GPT models. It builds a vocabulary of subword units by starting from individual characters (or bytes) and iteratively merging the most frequent adjacent pair of symbols into a new symbol. For example, "tokenization" might become ["token", "ization"]. This allows models to handle rare words efficiently while keeping vocabulary size manageable.
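The merge loop described above can be illustrated with a toy implementation. This is a minimal sketch, not the tokenizer any production model ships: it starts from characters rather than bytes, skips the text pre-splitting that real tokenizers do, and the names `merge_pair`, `learn_bpe_merges`, and `apply_merges` are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a toy word list."""
    # Each distinct word becomes a tuple of single-character symbols.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def apply_merges(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


corpus = ["token", "tokens", "tokenize", "tokenization", "organization"]
merges = learn_bpe_merges(corpus, num_merges=10)
print(apply_merges("tokenization", merges))
```

The exact split depends on the corpus and the number of merges, so a toy corpus like this one will not necessarily reproduce the ["token", "ization"] split.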

Question 6

What is a context window?

Accepted Answer

The context window is the maximum number of tokens an LLM can process in a single request (input + output combined). GPT-5.5 offers 256K tokens, Claude Opus 4.7 provides 1M tokens, and Gemini 3.1 Pro supports up to 2M tokens. Exceeding this limit will cause truncation or errors.
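Because input and output share one budget, a request can be checked before it is sent. The sketch below assumes you already have a token count for the input; `fits_context` and `input_budget` are hypothetical helper names, and the 200K window is the example size from Question 4 rather than a claim about any specific model.

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if input plus reserved output fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def input_budget(context_window: int, max_output_tokens: int) -> int:
    """Tokens left for input after reserving room for the response."""
    return max(context_window - max_output_tokens, 0)


WINDOW = 200_000  # example window size, in tokens

print(fits_context(190_000, 8_000, WINDOW))  # True: 198,000 <= 200,000
print(fits_context(195_000, 8_000, WINDOW))  # False: 203,000 > 200,000
print(input_budget(WINDOW, 8_000))           # 192000
```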

Question 7

What is cached input pricing?

Accepted Answer

Cached input pricing offers significant discounts (up to 90% off) when you reuse the same prompt prefix across multiple API calls. This is ideal for system prompts, few-shot examples, or document analysis where the context remains constant while only the query changes.
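To see what such a discount can mean in practice, here is a simplified cost model. It assumes the first call pays full price for the shared prefix and every later call pays the discounted rate on it; real providers may also charge for cache writes or expire caches, which this sketch ignores. The function name `input_cost_usd`, the $3.00 per 1M input price, and the token counts are illustrative assumptions.

```python
def input_cost_usd(
    prefix_tokens: int,
    query_tokens: int,
    calls: int,
    price_per_million: float,
    cached_discount: float = 0.0,
) -> float:
    """Total input cost for `calls` requests that share one prompt prefix.

    Simplified model: cache writes and cache expiry are not priced in.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    rate = price_per_million / 1_000_000
    first_prefix = prefix_tokens * rate  # first call: full price
    later_prefix = (calls - 1) * prefix_tokens * rate * (1 - cached_discount)
    queries = calls * query_tokens * rate  # the changing part is never cached
    return first_prefix + later_prefix + queries


# 100,000-token shared prefix, 1,000-token query, 11 calls, $3.00 per 1M input tokens
uncached = input_cost_usd(100_000, 1_000, 11, 3.00)
cached = input_cost_usd(100_000, 1_000, 11, 3.00, cached_discount=0.9)
print(f"uncached: ${uncached:.3f}, cached: ${cached:.3f}")
```

The saving grows with the number of calls and with the share of each request taken up by the constant prefix.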

Question 8

Why are output tokens more expensive than input tokens?

Accepted Answer

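Output tokens are typically several times more expensive than input tokens (roughly 4–8x for the models in the table below) because the model generates them sequentially, running a forward pass for each new token, whereas input tokens are processed in parallel. To optimize costs, design prompts that elicit concise responses, set output length limits, and choose the right model for each task.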
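A per-request cost estimate follows directly from the per-million prices. The sketch below copies two rows from the pricing table on this page; the `PRICES` dictionary and the `request_cost_usd` function are hypothetical names, and the token counts are made-up inputs.

```python
# (input USD per 1M tokens, output USD per 1M tokens), taken from the pricing table
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, given separate input and output token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model in PRICES:
    cost = request_cost_usd(model, input_tokens=10_000, output_tokens=1_000)
    print(f"{model}: ${cost:.4f}")
```

With these prices, 1,000 output tokens on Claude Haiku 4.5 cost as much as 5,000 input tokens, which is why capping response length is often the quickest saving.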

Model	Provider	Input (USD / 1M tokens)	Output (USD / 1M tokens)	Context window (tokens)
GPT-5	OpenAI	$2.50	$10.00	400K
GPT-4o	OpenAI	$2.50	$10.00	128K
GPT-4o mini	OpenAI	$0.15	$0.60	128K
Claude Opus 4.7	Anthropic	$15.00	$75.00	200K
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	1M
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K
Gemini 2.5 Pro	Google	$1.25	$10.00	2M
Gemini 2.5 Flash	Google	$0.30	$2.50	1M

AI Token Counter

Frequently Asked Questions