CountCharacters

AI Token Counter

Estimate how many tokens your prompt or content uses for GPT-4o, GPT-5, and Claude. Track usage against 128K and 200K context windows in real time.

Context window usage
GPT-4o: 128,000 tokens
GPT-5 / o-series: 200,000 tokens
Claude Sonnet 4.6: 200,000 tokens
Claude Sonnet 1M: 1,000,000 tokens
Compare token costs
Models compared (prices per 1M tokens):
OpenAI: GPT-5, GPT-4o, GPT-4o mini
Anthropic: Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5
Google: Gemini 2.5 Pro, Gemini 2.5 Flash

Prices are per million tokens at standard tier (USD). Cached input, batch, and provisioned-throughput tiers cost less. Output tokens generated by the model are billed at the higher output rate.

Estimates use the ~4 chars/token rule of thumb. Real tokenization depends on each model's BPE vocabulary; expect ±10% variance on natural English, more for code, structured data, or non-Latin scripts.
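The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual implementation: the function name `estimate_tokens` and the `CHARS_PER_TOKEN` table are hypothetical, and the ratios are the approximate figures quoted on this page.

```python
import math

# Assumed average characters per token (approximate figures quoted on this page).
CHARS_PER_TOKEN = {"gpt": 4.0, "claude": 3.8}


def estimate_tokens(text: str, family: str = "gpt") -> int:
    """Estimate a token count from character length using a fixed ratio."""
    if not text:
        return 0
    # Round up: a partial token still counts as one token.
    return math.ceil(len(text) / CHARS_PER_TOKEN[family])


sample = "Estimate how many tokens your prompt uses."
print(estimate_tokens(sample, "gpt"), estimate_tokens(sample, "claude"))
```

A character-ratio estimate like this ignores vocabulary effects, which is why code, JSON, and non-Latin text can drift well beyond the stated variance.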

Frequently Asked Questions

What is a token?

A token is the unit of text that large language models like GPT and Claude actually process. Tokens roughly correspond to chunks of words: "counting" might be 1 token, while "tokenization" might be 2–3. Most English prose averages about 4 characters per token.

Is this an exact token count?

It's an estimate. Real tokenization depends on each model's BPE vocabulary, and an exact count would require shipping a multi-megabyte tokenizer to your browser. Our heuristic is accurate to within ~10% for natural English; expect more variance for code, JSON, or non-Latin scripts.

Why do GPT and Claude show different token counts?

They use different tokenizers trained on different corpora. Claude tends to use slightly more tokens for the same English text (~3.8 chars/token vs GPT's ~4), but the gap narrows for technical content.

How big is a 200K-token context window?

Roughly 150,000 English words, or about 500 single-spaced pages. Long documents and conversation history both consume this budget, so monitoring token usage matters for cost and recall.

What is BPE (Byte-Pair Encoding)?

BPE is the tokenization algorithm used by GPT models and many other LLMs. It builds a vocabulary of subword units by starting from individual characters (or bytes) and repeatedly merging the most frequent adjacent pair of symbols into a new symbol. For example, "tokenization" might become ["token", "ization"]. This lets models represent rare words as sequences of known subwords while keeping the vocabulary size manageable.
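To make the merge loop concrete, here is a toy BPE trainer in Python. It is a teaching sketch under simplifying assumptions (whitespace-split words, character-level starting symbols, no byte-level handling), not the tokenizer any production model uses; the function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, words = train_bpe("token tokens tokenize tokenization", 6)
print(merges)
print(list(words))
```

Each learned merge becomes a vocabulary entry, so frequent fragments end up as single tokens while rare words stay split into several pieces.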

What is a context window?

The context window is the maximum number of tokens an LLM can process in a single request (input + output combined). GPT-5.5 offers 256K tokens, Claude Opus 4.7 provides 1M tokens, and Gemini 3.1 Pro supports up to 2M tokens. Exceeding this limit will cause truncation or errors.
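Because input and output share one budget, a request is worth checking against the window before it is sent. The sketch below assumes you already have a token estimate for the prompt; the helper names are hypothetical, and the window sizes are the ones listed on this page rather than authoritative provider limits.

```python
# Window sizes as listed on this page; check provider docs for current values.
CONTEXT_WINDOWS = {"GPT-4o": 128_000, "Claude Sonnet 4.6": 200_000}


def fits_context(input_tokens: int, max_output_tokens: int, window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= window


def usage_percent(input_tokens: int, window: int) -> float:
    """Share of the window consumed by the input, as a percentage."""
    return 100.0 * input_tokens / window


for model, window in CONTEXT_WINDOWS.items():
    fits = fits_context(125_000, 4_000, window)
    print(model, fits, round(usage_percent(125_000, window), 1))
```

In this example a 125,000-token prompt with 4,000 tokens reserved for output exceeds a 128,000-token window but fits a 200,000-token one.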

What is cached input pricing?

Cached input pricing offers significant discounts (up to 90% off) when you reuse the same prompt prefix across multiple API calls. This is ideal for system prompts, few-shot examples, or document analysis where the context remains constant while only the query changes.
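A rough way to see the effect is to compare input cost with and without a cached prefix. The sketch below is a simplification with assumed numbers: the price is hypothetical, the 90% discount is the upper bound mentioned above, and it ignores cache-write surcharges and cache expiry, which some providers apply.

```python
def input_cost(prefix_tokens, query_tokens, calls, price_per_million,
               cached_discount=0.90):
    """Return (uncached, cached) input cost in USD for `calls` requests (calls >= 1).

    Assumes the first call pays full price for the shared prefix and every
    later call pays the discounted rate for it. The per-call query is never cached.
    """
    per_token = price_per_million / 1_000_000
    uncached = calls * (prefix_tokens + query_tokens) * per_token
    cached_prefix = prefix_tokens * per_token * (1 + (calls - 1) * (1 - cached_discount))
    cached = cached_prefix + calls * query_tokens * per_token
    return uncached, cached


# Hypothetical workload: 50K-token document, 200-token questions, 100 calls, $3.00 per 1M input tokens.
uncached, cached = input_cost(50_000, 200, 100, 3.00)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

The longer the shared prefix is relative to the changing query, the closer the savings get to the full discount.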

Why are output tokens more expensive than input tokens?

Output tokens are typically several times more expensive than input tokens (often 4x or more, depending on the model) because the model generates them one at a time, each requiring its own forward pass, whereas input tokens are processed in parallel. To optimize costs, design prompts that get concise responses, use output length limits, and choose the right model for each task.
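The impact of output length on a bill can be illustrated with a small cost function. The prices below are hypothetical placeholders chosen so that output costs 4x input; substitute the real per-million rates for your model.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request, given prices per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical rates: $2.00 per 1M input tokens, $8.00 per 1M output tokens.
IN_PRICE, OUT_PRICE = 2.00, 8.00

verbose = request_cost(1_000, 2_000, IN_PRICE, OUT_PRICE)
concise = request_cost(1_000, 500, IN_PRICE, OUT_PRICE)
print(f"verbose ${verbose:.4f}, concise ${concise:.4f}")
```

With the same prompt, cutting the response from 2,000 to 500 tokens removes most of the cost, which is why output limits tend to matter more than prompt trimming.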