What is a token?
A token is the unit of text that large language models like GPT and Claude actually process. Tokens roughly correspond to chunks of words: "counting" might be 1 token, while "tokenization" might be 2–3. Most English prose averages about 4 characters per token.
Is this an exact token count?
It's an estimate. Exact counts depend on each model's BPE vocabulary, and computing them would require shipping a multi-megabyte tokenizer to your browser. Our heuristic is accurate to within ~10% for natural English; expect more variance for code, JSON, or non-Latin scripts.
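As a rough illustration of a character-based heuristic of this kind, here is a minimal sketch. It is not this tool's actual code: the function name `estimate_tokens`, the default ratio of 4 characters per token, and the 10% band are assumptions taken from the figures quoted in this FAQ.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0, tolerance: float = 0.10):
    """Return (low, estimate, high) token counts for `text`.

    Assumes a fixed characters-per-token ratio; real tokenizers vary,
    especially on code, JSON, and non-Latin scripts.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    estimate = math.ceil(len(text) / chars_per_token)
    # Apply the assumed error band around the point estimate.
    low = round(estimate * (1 - tolerance))
    high = round(estimate * (1 + tolerance))
    return low, estimate, high
```

For a 400-character English passage this gives an estimate of 100 tokens, with a plausible range of about 90 to 110.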
Why do GPT and Claude show different token counts?
They use different tokenizers trained on different corpora. At ~3.8 characters per token versus GPT's ~4, Claude tends to use slightly more tokens for the same English text, but the gap narrows for technical content.
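To see how a small difference in characters per token shifts the count, the sketch below applies the approximate ratios quoted above to the same text. The dictionary `CHARS_PER_TOKEN` and the function `estimate_by_model` are hypothetical names, and the ratios are rough averages for English prose, not tokenizer output.

```python
import math

# Approximate characters-per-token ratios quoted in this FAQ (assumed averages).
CHARS_PER_TOKEN = {"gpt": 4.0, "claude": 3.8}

def estimate_by_model(text: str) -> dict[str, int]:
    """Estimate a token count per model family from character length alone."""
    return {
        model: math.ceil(len(text) / ratio)
        for model, ratio in CHARS_PER_TOKEN.items()
    }
```

A lower characters-per-token ratio means each token covers less text, so the same passage yields a higher estimated count.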
How big is a 200K-token context window?
Roughly 150,000 English words, or about 500 pages at around 300 words per page. Long documents and conversation history both consume this budget, so monitoring token usage matters for cost and for how well the model recalls earlier material.
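The conversion is back-of-envelope arithmetic. The sketch below assumes about 0.75 words per token (the ratio implied by 200K tokens and 150,000 words) and about 300 words per page; both defaults are assumptions, and the function name is hypothetical.

```python
def window_to_words_and_pages(
    context_tokens: int,
    words_per_token: float = 0.75,  # assumed average for English prose
    words_per_page: int = 300,      # assumed page density
):
    """Convert a token budget into approximate (words, pages)."""
    words = int(context_tokens * words_per_token)
    pages = words // words_per_page
    return words, pages
```

With these defaults, `window_to_words_and_pages(200_000)` returns `(150000, 500)`. A denser page of around 500 words would bring the same budget down to about 300 pages.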
What is BPE (Byte-Pair Encoding)?
BPE is the tokenization algorithm used by GPT models. It builds a vocabulary of subword units by starting from single bytes or characters and iteratively merging the most frequent pair of adjacent symbols; new text is then split by applying those learned merges. For example, "tokenization" might become ["token", "ization"]. This allows models to handle rare words efficiently while keeping vocabulary size manageable.
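The merge loop can be sketched on a toy corpus. This is a simplified, character-level illustration under stated assumptions: production tokenizers typically work on bytes, pre-split text with a pattern, and learn tens of thousands of merges. The function names `merge_pair`, `learn_bpe_merges`, and `apply_merges` are hypothetical.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

def learn_bpe_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the vocabulary with the most frequent pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges

def apply_merges(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize `word` by applying learned merges in the order they were learned."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols
```

Trained on `["abab", "abab"]` with one merge, the learned rule is `("a", "b")`, and `apply_merges("abab", merges)` yields `["ab", "ab"]`. Characters never seen in a merge simply stay as single symbols, which is how rare words remain representable.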
What is a context window?
The context window is the maximum number of tokens an LLM can process in a single request (input + output combined). GPT-5.5 offers 256K tokens, Claude Opus 4.7 provides 1M tokens, and Gemini 3.1 Pro supports up to 2M tokens. Exceeding this limit will cause truncation or errors.
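Because input and output share one budget, a request can be checked before it is sent. The sketch below is a minimal illustration; the function names are hypothetical, and the window size is passed in by the caller rather than looked up for any particular model.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output space fits within the window."""
    return input_tokens + max_output_tokens <= context_window

def max_input_budget(context_window: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt after reserving space for the response."""
    return max(0, context_window - max_output_tokens)
```

For example, with a 200,000-token window and 8,000 tokens reserved for output, a 190,000-token prompt fits, while a 195,000-token prompt does not.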
What is cached input pricing?
Cached input pricing offers significant discounts (up to 90% off) when you reuse the same prompt prefix across multiple API calls. This is ideal for system prompts, few-shot examples, or document analysis where the context remains constant while only the query changes.
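The saving can be estimated with simple arithmetic. In the sketch below, the function name, the price of 3.00 per million input tokens, and the flat 90% discount are illustrative assumptions, not any provider's actual rates; it also ignores any surcharge a provider may apply when the cache is first written.

```python
def input_cost(
    prefix_tokens: int,
    query_tokens: int,
    price_per_mtok: float,
    cached_discount: float = 0.90,  # assumed maximum discount from the text above
    prefix_cached: bool = False,
) -> float:
    """Input cost of one request, with the shared prefix billed at a discount if cached."""
    prefix_rate = price_per_mtok * (1 - cached_discount) if prefix_cached else price_per_mtok
    return (prefix_tokens * prefix_rate + query_tokens * price_per_mtok) / 1_000_000

# Hypothetical workload: a 100,000-token document prefix plus a 1,000-token question.
uncached = input_cost(100_000, 1_000, price_per_mtok=3.00)
cached = input_cost(100_000, 1_000, price_per_mtok=3.00, prefix_cached=True)
```

Under these assumptions the uncached request costs about 0.303 and the cached one about 0.033, so most of the bill for repeated questions over the same document comes from the first call.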
Why are output tokens more expensive than input tokens?
Output tokens are typically 2–4x more expensive than input tokens because the model generates them sequentially, one token at a time, whereas input tokens can be processed in parallel. To control costs, write prompts that ask for concise responses, set output length limits, and choose the right model for each task.
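To show why response length dominates the bill, the sketch below compares a verbose and a concise response to the same prompt. The function name, the input price of 2.00 per million tokens, and the 3x output multiplier are illustrative assumptions chosen from within the range quoted above.

```python
def total_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_multiplier: float = 3.0,  # assumed ratio of output to input price
) -> float:
    """Total request cost when output tokens are priced at a multiple of input tokens."""
    output_price = input_price_per_mtok * output_multiplier
    return (input_tokens * input_price_per_mtok + output_tokens * output_price) / 1_000_000

# Same 10,000-token prompt, two different response lengths.
verbose = total_cost(10_000, 2_000, input_price_per_mtok=2.00)
concise = total_cost(10_000, 500, input_price_per_mtok=2.00)
```

Here the verbose response costs about 0.032 and the concise one about 0.023: trimming 1,500 output tokens saves as much as trimming 4,500 input tokens would at this multiplier.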