# What Is Tokenisation? The Invisible Mechanism Behind AI Costs, Quality, and Bias

Tokenisation is behind most surprise AI bills: 65% of IT leaders report unexpected AI costs. Learn how tokens work, why non-English content costs 2-7x more, and how to cut API spend by 30-50%.

**Published:** April 24, 2026
**Author:** Evan Ramzipoor

---

Two out of three IT leaders have been hit with unexpected AI charges. The culprit isn't bad software or hidden fees. It's tokenisation.

If you use AI tools or APIs, you're already paying for tokens whether you realize it or not.

  
  
  
  

The Zylo AI Cost Report (2026) found that 65% of IT leaders report unexpected AI costs, with budgets running 30-50% over. The unit behind those charges is the token: the smallest piece of text an AI model reads, writes, or bills for. Tokenisation is the process by which text is broken into those pieces. It determines what you pay per API call, how much context your model can handle, and why ChatGPT can't count the R's in "strawberry."

## The Revenue Case: Why Tokenisation Is a Budget Problem

Tokenisation drives AI costs. Every prompt you send and every response you receive is billed per token. Output tokens (the text the model generates) cost 3-10x more than input tokens, which is why long responses drive surprise bills.


Token prices have dropped 90-99% since GPT-4 launched, but costs still fluctuate based on language, content type, and output length. Teams that route simple tasks to smaller models cut spend by up to 75%. None of those savings are possible without understanding what a token actually is.
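One way teams capture those routing savings is a simple size-based gate. Here's a minimal sketch; the model names and the 200-token threshold are hypothetical placeholders, not a real provider API:

```python
# Hypothetical model router: send short, simple prompts to a cheaper model.
# "small-model" / "large-model" and the threshold are illustrative only.

def estimate_tokens(text: str) -> int:
    """Rough token estimate using OpenAI's ~4-characters-per-token rule of thumb."""
    return max(1, len(text) // 4)

def route(prompt: str, threshold: int = 200) -> str:
    """Pick a model tier based on estimated prompt size."""
    return "small-model" if estimate_tokens(prompt) <= threshold else "large-model"

print(route("Summarise this sentence."))  # short prompt -> cheap tier
print(route("word " * 1000))              # long prompt -> capable tier
```

Real routers also look at task type (classification vs. open-ended generation), but even a crude size gate keeps trivial requests off your most expensive model.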

## What Is Tokenisation? The 60-Second Explainer

Tokenisation is the process by which AI breaks text into smaller pieces called tokens. A token can be a word, part of a word, or a character. Models only process tokens; they never see your original text as written.

Think of tokenisation like breaking a sentence into LEGO blocks. Some blocks are full words ("the," "cat"), while others are parts of words ("un" + "happi" + "ness"). OpenAI estimates one token is about 4 characters, or 0.75 English words. A 1,000-word English article is roughly 1,300 tokens.
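Those two rules of thumb are easy to turn into a back-of-the-envelope estimator. This is an approximation, not a real tokeniser (tools like OpenAI's tiktoken give exact counts):

```python
# Rough token estimates from OpenAI's published rules of thumb:
# ~4 characters per token, or ~1.3 tokens per English word.

def tokens_from_chars(text: str) -> int:
    """Estimate token count from character length (1 token ≈ 4 characters)."""
    return round(len(text) / 4)

def tokens_from_words(word_count: int) -> float:
    """Estimate token count from word count (1 word ≈ 1.3 tokens)."""
    return word_count * 1.3

print(tokens_from_words(1000))  # a 1,000-word article ≈ 1,300 tokens
```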


## How Tokenisation Works: From Text to Numbers

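The core idea behind most modern tokenisers is byte-pair encoding (BPE): start from individual characters and repeatedly merge the most frequent adjacent pair into a new symbol, then hand the model numeric IDs for the final symbols. Here's a toy sketch of that loop; real tokenisers learn merges from huge corpora, not a single string:

```python
# A minimal sketch of byte-pair encoding (BPE): repeatedly merge the most
# frequent adjacent symbol pair, then map the surviving symbols to IDs.

from collections import Counter

def most_frequent_pair(symbols):
    """Return the adjacent pair that occurs most often, or None."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge(symbols, pair):
    """Replace every occurrence of `pair` with a single fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

symbols = list("aaabdaaabac")   # start from individual characters
for _ in range(3):              # three merge rounds
    symbols = merge(symbols, most_frequent_pair(symbols))
print(symbols)                  # ['aaab', 'd', 'aaab', 'a', 'c']

# "From text to numbers": the model only ever sees these IDs.
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(symbols))}
print([vocab[tok] for tok in symbols])  # [0, 1, 0, 2, 3]
```

Note what happens at the end: once the text becomes `[0, 1, 0, 2, 3]`, the original spelling is gone from the model's point of view.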

### Tokenisation Methods

Modern AI models use one of three tokenisation approaches: byte-pair encoding (BPE, used by the GPT family), WordPiece (used by BERT), and Unigram (popularised by the SentencePiece library). Each makes different tradeoffs between vocabulary size, cost, and language coverage.


### The "Strawberry" Problem

Tokenisation helps explain many AI errors. Ask ChatGPT "how many R's are in strawberry?" and it'll probably get it wrong. The model doesn't see letters individually; it sees two tokens: "straw" and "berry." The R's are buried inside those chunks, so the model can't reliably count them. As AI researcher Andrej Karpathy noted, tokenisation is "probably the worst part of working with LLMs."
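You can see why letter-counting fails by looking at what the model actually receives. The vocabulary and ID numbers below are hypothetical, purely for illustration:

```python
# Why counting letters fails: the model receives token IDs, not characters.
# This vocabulary and these IDs are made up for illustration.

vocab = {"straw": 1618, "berry": 9842}

tokens = ["straw", "berry"]         # how a tokeniser might split the word
ids = [vocab[t] for t in tokens]
print(ids)                          # [1618, 9842] -- no letters in sight

# The R-count lives in the surface text, which the model never sees
# once the text has been tokenised:
print("strawberry".count("r"))      # 3
```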


## Why Tokenisation Matters for AI Performance

Tokenisation shapes three important aspects of AI systems: what you pay, how well the model performs, and how fairly it treats different languages. For some languages, poor tokenisation can raise costs by 2-5x and cut accuracy by up to 18%.

### Maths, Reasoning, and the Token Gap

Tokenisation affects numbers too. When a model sees "1234," it might split that into "123" and "4," making basic arithmetic unreliable. Singh and Strouse ("Tokenisation Counts," arXiv, 2024) found that chunking digits from the right instead of the left, so the chunks align with place value, boosted maths accuracy by more than 22%.

### The Token Tax: Language Bias Built Into AI

Research by Lundin et al. ("The Token Tax," arXiv, 2025) found that African languages need 2-5x more tokens than English to process the same content. In one example, a Telugu translation had fewer characters than the English version but needed 7x more tokens.
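One mechanism behind that inflation is visible at the byte level: scripts outside the Latin range take more bytes per character in UTF-8, and tokenisers with sparse vocabulary coverage for a script tend to fall back toward byte-sized pieces. A quick comparison (the Telugu word below is a common greeting, chosen for illustration):

```python
# Latin characters are 1 byte each in UTF-8; Telugu characters are 3 bytes
# each. Byte-fallback tokenisation therefore inflates non-Latin token counts.

english = "hello"
telugu = "నమస్కారం"   # "namaskaram"

print(len(english), len(english.encode("utf-8")))  # 5 characters, 5 bytes
print(len(telugu), len(telugu.encode("utf-8")))    # 8 characters, 24 bytes
```

Fewer characters on the page can still mean several times more billable units, which is exactly the pattern the Telugu example above shows.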


### Context Rot: When More Tokens Make Things Worse

While large context windows might seem beneficial, Chroma Research (2025) tested 18 LLMs and found that performance drops as context grows, especially for information in the middle of a prompt. This is known as the "lost in the middle" effect. More tokens don't always improve results; they can do the opposite.

## Token-Based Pricing: What You're Actually Paying For


As of early 2026, input tokens range from $0.50 to $3 per million, while output tokens cost $3 to $15 per million. To estimate your own costs, use OpenAI's tokeniser tool or this rule of thumb: multiply your English word count by 1.3. If you're managing AI spend, track the input-to-output token ratio: when output tokens exceed input tokens by more than three to one, that's a cost signal worth investigating.
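Those prices plug straight into a simple cost formula. The rates below sit at the top of the ranges quoted above and are illustrative; substitute your provider's actual pricing:

```python
# Estimate a bill from per-million-token prices. The default rates are
# illustrative values from this article's early-2026 ranges.

def monthly_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float = 3.0,
                 out_price_per_m: float = 15.0) -> float:
    """Total cost in dollars for a given token volume."""
    return (input_tokens / 1e6) * in_price_per_m + \
           (output_tokens / 1e6) * out_price_per_m

# 10M input tokens and 30M output tokens (a 1:3 input-to-output ratio):
cost = monthly_cost(10_000_000, 30_000_000)
print(f"${cost:.2f}")  # $480.00 -- output tokens dominate the bill
```

Note that the $30 of input spend is dwarfed by $450 of output spend, which is why trimming response length is usually the fastest lever.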


## The Future of Tokenisation

Tokenisation may not last forever. Meta AI Research (December 2024) published the Byte Latent Transformer (BLT), which processes raw bytes and matches LLaMA 3's performance with up to 50% fewer compute operations. HKU NLP's EvaByte (2025) is the first open-source byte-level model that matches tokenised model performance.

If these approaches mature, the tokenisation pipeline could disappear. For now, AI runs on tokens, prices in tokens, and limits in tokens.


## Wrapping Up

Tokenisation determines how much you pay, how much text a model can process, and how accurately it handles different languages. Understanding the mechanism turns unpredictable AI bills into something you can control.


---

[Back to Blog](https://www.searchable.com/blog) | [Searchable Homepage](https://www.searchable.com)
