What Are Tokens
When you type a message into ChatGPT, Copilot, or Gemini, the AI does not read your words the way you do. Before it can process a single sentence, it breaks your text into smaller pieces called tokens. This is not a minor technical detail. Tokens are the fundamental unit of how AI processes, understands, and generates language. They determine what the model can see, how much it can remember, and even how much it costs to run.
In How Large Language Models Are Trained, we mentioned that modern training datasets contain "trillions of tokens" without explaining what that actually means. This article fills that gap. Understanding tokens gives you a practical mental model for why AI tools behave the way they do: why they forget things, why they have usage limits, and why the way you write your prompts matters.
What Is a Token?
A token is a small unit of data that an AI model uses to process text. Tokens can be whole words, parts of words, or even individual characters. Think of it like eating an orange. You do not eat it whole. You break it into sections first, and then eat each piece one at a time. Tokenization does the same thing: it splits your input into manageable pieces the model can work with.
Simple words often become a single token. The sentence "I like coffee" becomes three tokens: "I", "like", and "coffee." But longer or less common words get split into smaller pieces. The word "darkness" becomes two tokens: "dark" and "ness." The word "brightness" also ends in "ness," and because the model has seen that suffix thousands of times across different words, it understands the relationship between them. Both words share a token that signals a quality or state.
This is the key insight: tokens are not words. They are fragments that help the model find patterns in language. By breaking text into these reusable pieces, the model can handle words it has never seen before by recognizing familiar parts within them.
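This "familiar parts" idea can be sketched in a few lines of Python. The toy vocabulary and greedy longest-match strategy below are illustrative assumptions, not how any production tokenizer actually works, but they show how a word the vocabulary does not contain as a whole still splits into pieces it does contain:

```python
# Toy tokenizer: greedy longest-match against a tiny hypothetical vocabulary.
# Real models use learned vocabularies of 30,000+ pieces; this sketch only
# illustrates how an unseen word splits into familiar parts.
VOCAB = {"i", "like", "coffee", "dark", "bright", "ness", "the"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest possible piece first.
        for end in range(len(word), i, -1):
            if word[i:end] in VOCAB:
                tokens.append(word[i:end])
                i = end
                break
        else:
            # No vocabulary piece matched: fall back to a one-character token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("darkness"))    # ['dark', 'ness']
print(tokenize("brightness"))  # ['bright', 'ness']
```

Neither "darkness" nor "brightness" is in the vocabulary, yet both tokenize cleanly because their parts are.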
How Tokenization Works
Every time you send a message to an AI tool, your text is automatically converted into tokens before the model processes it. You never see this happen; it takes place behind the scenes in milliseconds.
The model works from a fixed vocabulary of token pieces, typically between 30,000 and 100,000 entries. Common words like "the," "and," and "is" each get their own token. Uncommon words, technical terms, or words from other languages get split into smaller pieces that the model has seen in other contexts. The most common approach for building this vocabulary is called Byte Pair Encoding. It starts with individual characters and repeatedly merges the most frequent pairs, like noticing that "t" and "h" almost always appear together and treating "th" as a single unit.
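The merging step can be made concrete with a bare-bones sketch. This is a simplified illustration of the Byte Pair Encoding idea, not the production algorithm: it treats the text as one character sequence and repeatedly fuses the most frequent adjacent pair into a single unit.

```python
from collections import Counter

def bpe_merges(text: str, num_merges: int) -> list[str]:
    """Return the merged units produced by a minimal BPE-style loop.

    Starts from individual characters and repeatedly merges the most
    frequent adjacent pair. Illustrative only: real implementations work
    over a whole corpus with word boundaries and many refinements.
    """
    symbols = list(text)  # start with one symbol per character
    merged = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing worth merging
        merged.append(a + b)
        # Rewrite the sequence, fusing every occurrence of the pair.
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return merged

# "t" followed by "h" is the most frequent pair here, so "th" merges first.
print(bpe_merges("the thin thing that thuds", 1))  # ['th']
```

Run for thousands of merges over trillions of characters and the merged units grow into the subword vocabulary the model actually uses.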
A useful rule of thumb: one token equals roughly three-quarters of a word in English. A 500-word email is approximately 650 to 700 tokens. This ratio is not exact (it varies depending on the complexity of the vocabulary involved), but it gives you a practical way to estimate. If you want to see tokenization in action, OpenAI offers a free Tokenizer tool. Paste in any text and it will show you exactly how the text gets split into tokens, color-coded so you can see where each token begins and ends.
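The rule of thumb converts directly into a quick estimator. The function below simply divides a word count by 0.75; treat the result as a ballpark figure, since actual counts vary by vocabulary and language.

```python
import math

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from the ~0.75 words-per-token rule of thumb.

    A ballpark heuristic for English text, not a billing-grade count.
    """
    return math.ceil(word_count / 0.75)

print(estimate_tokens(500))  # 667 -- a 500-word email is roughly 650-700 tokens
```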
One important nuance: tokenization is language-dependent. English tokenizes efficiently because the vast majority of AI training data is in English, so the model's vocabulary was built primarily from English text. Text in other languages, specialized academic terminology, or uncommon proper nouns often requires more tokens to express the same meaning. A sentence in Korean or Arabic might use twice as many tokens as its English equivalent.

Context Windows and Memory
Every AI tool has a context window, a maximum number of tokens it can hold at one time. This includes everything: your prompts, the AI's responses, any uploaded documents, and the entire conversation history. All of it competes for the same fixed space.
Think of the context window as a desk. Everything you and the AI are working with has to fit on the desk at once. When the desk fills up, items fall off the far edge. The model loses access to the earliest parts of your conversation without telling you. This is why AI tools sometimes "forget" instructions you gave at the beginning of a long conversation. The conversation exceeded the context window and the oldest content was silently dropped.
Context window sizes have grown rapidly, from roughly 4,000 tokens in early models to over 100,000 tokens in current ones. But the fundamental constraint remains. No matter how large the window gets, it is still finite, and everything you send competes for space within it.
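The "items fall off the far edge" behavior can be sketched as a simple token budget. This is an illustrative assumption about how truncation might work; real tools may pin system instructions or summarize old turns rather than dropping them outright, and the token counts here are made up.

```python
def fit_to_window(messages: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the most recent messages that fit within a token budget.

    messages: (text, token_count) pairs, oldest first. Illustrative only.
    """
    kept, used = [], 0
    for text, tokens in reversed(messages):  # walk newest-first
        if used + tokens > budget:
            break  # everything older than this falls off the desk
        kept.append((text, tokens))
        used += tokens
    return list(reversed(kept))  # restore oldest-first order

history = [("Please answer in Spanish.", 6),   # early instruction
           ("Long pasted document...", 90),
           ("Follow-up question", 5)]
print(fit_to_window(history, budget=100))
# The early instruction no longer fits, so it is silently dropped --
# which is exactly why the model "forgets" it.
```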
For practical strategies on working within these limits, see Managing Context.
Beyond Text
Tokens are not limited to text. When you upload an image to ChatGPT or Gemini, the model converts it into tokens too, breaking the image into small patches of pixels, each represented as a token the model can process. Audio works similarly: it is often converted into a visual representation of sound called a spectrogram before being tokenized.
The principle is the same regardless of the input type: break complex information into small, processable units. This is why uploading a high-resolution image consumes a significant portion of your context window. A single image can become hundreds or thousands of tokens, leaving less room for your text instructions and the model's response.
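The arithmetic behind image token counts is simple once you assume a patch size. A 16-pixel patch is a common choice in vision transformers, but each model picks its own, so treat the numbers below as illustrative:

```python
def image_patch_tokens(width: int, height: int, patch: int = 16) -> int:
    """Patch tokens for an image split into patch x patch pixel squares.

    The 16-pixel patch size is an assumption; actual models vary.
    """
    cols = -(-width // patch)   # ceiling division covers partial patches
    rows = -(-height // patch)
    return cols * rows

print(image_patch_tokens(1024, 1024))  # 64 * 64 = 4096 patch tokens
print(image_patch_tokens(512, 512))   # 32 * 32 = 1024 patch tokens
```

A single high-resolution image can therefore consume thousands of tokens before you have typed a word.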
Why Tokens Cost Money
AI services measure usage in tokens. When BYU-Idaho pays for ChatGPT, Copilot, or Gemini, the cost is tied to tokens processed: both the input tokens you send and the output tokens the model generates in response. More tokens means more computation, which means more cost.
This is why usage limits exist. A short, focused prompt consumes fewer input tokens than a long, unfocused one. A request that generates a two-paragraph response costs less than one that produces a ten-page document. The economics are straightforward: tokens are the currency AI tools charge by.
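The billing arithmetic is straightforward to sketch. The per-million-token prices below are hypothetical placeholders, not any provider's actual rates; output tokens typically cost more than input tokens.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost of one request given per-million-token prices.

    Prices are hypothetical -- check your provider's current rate card.
    """
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical rates: $3 per million input tokens, $15 per million output.
# A 700-token prompt with a 2,000-token response:
print(f"${request_cost(700, 2000, 3.0, 15.0):.4f}")  # $0.0321
```

Fractions of a cent per request, but multiplied across an institution's daily usage, the token count is what drives the bill.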
Understanding tokens explains otherwise puzzling tool behavior: message limits, document size restrictions, conversation length caps, and the fact that image analysis counts differently than text all trace back to tokens. Newer reasoning models add another layer: they generate internal "thinking" tokens as they work through complex problems. You do not see these tokens in the response, but they still count toward usage.
Why This Matters
Tokens shape every interaction you have with AI tools. Understanding them gives you a practical framework for tool behavior that replaces guesswork with knowledge. You understand why long conversations lose early context, why usage limits exist, why concise input tends to produce better results, and why uploading a large document changes how much room is left for everything else.
At BYU-Idaho, where employees across departments are adopting AI tools for teaching, research, and administrative work, this understanding helps you work within constraints rather than being surprised by them. You can write more efficient prompts, manage conversations more effectively, and make better use of the institutional AI resources available to you.
Key Takeaways
- Tokens are the building blocks of AI language processing. AI models do not read words. They break text into smaller pieces called tokens, which can be whole words, parts of words, or individual characters.
- Tokenization happens automatically and shapes everything. Every message you send is converted into tokens before the model processes it. One token is roughly three-quarters of a word in English.
- Context windows set the limits of AI memory. Every AI tool can only hold a fixed number of tokens at once. Your prompts, its responses, and uploaded documents all compete for the same space.
- Tokens are how AI services measure usage. The cost of AI tools is tied to tokens processed, which explains usage limits, rate caps, and why concise prompts are more efficient.
- Understanding tokens makes you a better AI user. Knowing how the tool processes your input helps you write more effective prompts, manage long conversations, and anticipate the tool's limitations.