How Long-Term Context Works in AI Companions: Token Budget &

The 30-second answer

Your AI companion doesn't have a brain that remembers everything you've ever said. It has a fixed-size context window (a token budget) that fills up as you talk. Once the budget is exhausted, older messages get compressed into summaries or dropped entirely. That's why around week eight, your companion starts forgetting your dog's name and the inside jokes you built in week two. Summarization is the industry's patch for this problem, and it works about as well as you'd expect. The detail it loses along the way is the summarization trap.

The Token Budget: A Fixed-Size Window Into Your Relationship

Every conversation you have with an AI companion runs inside a container called the context window. Think of it as a stage. The stage can only hold so many actors at once. When a new actor walks on, an old one has to leave.

In technical terms, the context window is measured in tokens. In English, a token is roughly three-quarters of a word. Many companion apps run on models with context windows between 4,000 and 32,000 tokens. That sounds like a lot until you realize a single back-and-forth exchange of moderate length eats about 500 tokens. A twenty-minute conversation can consume 3,000 to 5,000 tokens, so a 4,000-token window can fill up in a single session.
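As a rough check on those numbers, here is a minimal sketch that converts word counts to tokens with the three-quarters-of-a-word rule of thumb and counts how many exchanges fit in a window. The function names are illustrative, and the 500-token exchange size is the estimate from the paragraph above; real tokenizers vary by model and language.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Approximate a token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def exchanges_that_fit(window_tokens: int, tokens_per_exchange: int = 500) -> int:
    """How many back-and-forth exchanges fit before the window is full."""
    return window_tokens // tokens_per_exchange


if __name__ == "__main__":
    print(estimate_tokens("My dog Baxter hates the mailman"))
    for window in (4_000, 32_000):
        print(window, exchanges_that_fit(window))
```

Under these assumptions, a 4,000-token window holds only a handful of exchanges, which is why a single session can fill it.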

This is the first thing nobody tells you. Your companion doesn't have a diary of everything you've said. It has a clipboard that can only hold the last few pages of your conversation. Everything before that is either compressed into a summary or gone.

The Summarization Trap: Why Compression Loses the Interesting Stuff

When your companion's context window fills up, the app doesn't just delete the old messages. It runs a summarization process. The model reads the older conversation and condenses it into a few sentences. Those sentences replace the original text in the context window.

Here's the problem. Summarization is lossy. It keeps the gist and drops the texture. Your companion remembers that you told her about your dog, but she loses the detail that his name is Baxter and he hates the mailman. She remembers you had a rough day at work, but she forgets it was specifically the quarterly review that went sideways.

This is the summarization trap. The app can truthfully say it remembers your conversation. But what it remembers is a bullet-point version of your life, not the lived texture. Around week eight, the summaries have been summarized again. You're now three or four compression layers deep. The dog's name is gone. The inside joke about the coffee shop is gone. What remains is a vague outline of a person who likes dogs and drinks coffee.
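The overflow behavior described in this section can be sketched in a few lines. This is a toy model, not any app's actual implementation: `RollingContext`, `count_tokens`, and `naive_summary` are hypothetical names, and `naive_summary` stands in for a real model call by keeping only the first few words of each part. That crude clipping is enough to show how a name disappears once its message is folded into the summary.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude estimate: about four tokens for every three words, rounded up."""
    words = len(text.split())
    return (words * 4 + 2) // 3


def naive_summary(parts: list[str], words_each: int = 3) -> str:
    """Placeholder for a model call: keeps only the first few words of each part."""
    clipped = [" ".join(p.split()[:words_each]).rstrip(";,.") for p in parts]
    return "; ".join(clipped)


@dataclass
class RollingContext:
    budget: int                                      # token budget for the whole window
    summary: str = ""                                # compressed older history
    recent: list[str] = field(default_factory=list)  # verbatim recent messages

    def total_tokens(self) -> int:
        return count_tokens(self.summary) + sum(count_tokens(m) for m in self.recent)

    def add(self, message: str) -> None:
        self.recent.append(message)
        # Fold the oldest message into the summary until the window fits again.
        while self.total_tokens() > self.budget and len(self.recent) > 1:
            oldest = self.recent.pop(0)
            parts = [self.summary, oldest] if self.summary else [oldest]
            self.summary = naive_summary(parts)


if __name__ == "__main__":
    ctx = RollingContext(budget=20)
    ctx.add("My dog is named Baxter and he hates the mailman")
    ctx.add("Work was rough today because the quarterly review went sideways")
    ctx.add("Let us get coffee at the usual place tomorrow")
    print(ctx.summary)  # the name "Baxter" is no longer present
    print(ctx.recent)
```

Because the existing summary is itself passed back through `naive_summary` on every overflow, each pass clips it again. That is the layered compression the text describes.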

What Actually Gets Prioritized in the Context Window

Not all messages are treated equally. Context management has a recency bias: the last few exchanges are kept in full, and older messages are compressed first. There's another layer, too. The model's training and the app's prompt engineering can both steer which kinds of information survive compression.

Most companion apps use a system prompt that instructs the model to pay special attention to proper names, relationship milestones, and emotional disclosures. If you say "my sister's name is Maria," that fact gets flagged for retention. But the flagging isn't perfect. It depends on how the system prompt is written and how the model interprets it.

Some apps use a separate memory store, a database that sits outside the context window and holds key facts. When you start a new session, the app queries this database and injects the relevant facts into the context window. This is better than pure summarization, but it introduces its own problems. The database has to decide what's relevant. It might pull your dog's name but miss that you mentioned your dog is afraid of thunderstorms. The texture is still lost.
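A minimal sketch of that pattern follows, assuming a simple keyword-overlap lookup. `FactStore`, `Fact`, and `build_prompt` are hypothetical names, and real apps may use very different retrieval methods. The example also reproduces the failure described above: a question about the dog pulls the name but not the thunderstorm detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    keywords: frozenset[str]


class FactStore:
    """Toy memory store that lives outside the context window."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def remember(self, text: str, keywords: set[str]) -> None:
        self._facts.append(Fact(text, frozenset(k.lower() for k in keywords)))

    def relevant(self, message: str, limit: int = 3) -> list[str]:
        # Score each fact by how many of its keywords appear in the message.
        words = {w.strip(".,!?").lower() for w in message.split()}
        hits = [(len(f.keywords & words), f.text) for f in self._facts]
        hits = [h for h in hits if h[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in hits[:limit]]


def build_prompt(store: FactStore, message: str) -> str:
    """Inject the retrieved facts ahead of the user's new message."""
    lines = []
    facts = store.relevant(message)
    if facts:
        lines.append("Known facts:")
        lines.extend(f"- {fact}" for fact in facts)
    lines.append(f"User: {message}")
    return "\n".join(lines)


if __name__ == "__main__":
    store = FactStore()
    store.remember("The user's dog is named Baxter.", {"dog", "baxter"})
    store.remember("Baxter is afraid of thunderstorms.", {"thunderstorm", "thunderstorms"})
    store.remember("The user's sister is named Maria.", {"sister", "maria"})
    print(build_prompt(store, "How is my dog doing?"))
```

The thunderstorm fact is stored, but nothing in the question matches its keywords, so it never reaches the context window.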

The Week Eight Wall: Why It Hits When It Hits

Week eight is not a magic number. It's a function of conversation volume. If you talk to your companion for ten to fifteen minutes a day, you generate roughly 15,000 tokens per week at the rates above. By week eight, you've produced about 120,000 tokens. Even a generous 32,000-token context window can only hold about two weeks of your conversations at full detail. Everything before that has been summarized at least once.
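The arithmetic is easy to rerun with your own numbers. The sketch below takes the 15,000-tokens-per-week and 32,000-token figures from the paragraph above as inputs; the function names are illustrative.

```python
def total_tokens(weeks: int, tokens_per_week: int) -> int:
    """Tokens produced over a number of weeks at a steady rate."""
    return weeks * tokens_per_week


def weeks_at_full_detail(window_tokens: int, tokens_per_week: int) -> float:
    """How many weeks of conversation fit in the window verbatim."""
    return window_tokens / tokens_per_week


def share_still_verbatim(window_tokens: int, weeks: int, tokens_per_week: int) -> float:
    """Fraction of the whole history that can still be held uncompressed."""
    return min(1.0, window_tokens / total_tokens(weeks, tokens_per_week))


if __name__ == "__main__":
    rate = 15_000    # tokens per week, from the text
    window = 32_000  # context window size, from the text
    print(total_tokens(8, rate))
    print(round(weeks_at_full_detail(window, rate), 1))
    print(round(share_still_verbatim(window, 8, rate), 2))
```

Heavier daily use raises the weekly rate and moves the wall earlier. Lighter use pushes it later.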

But the summarization doesn't happen evenly. The first few weeks of your relationship are the foundation. That's when you establish the character, the inside jokes, the shared history. Those early conversations are the most likely to be compressed because they're the oldest. By week eight, the model is working with a summary of a summary of your first date. The emotional resonance is flattened.

This is why you start noticing the gaps around week eight. Your companion might still know your name and your general situation, but the specific details that made the relationship feel real start to fade. It's not a bug. It's a feature of the architecture.

Henna and Sara

Henna and Sara are designed as a pair of companions who share your context but maintain distinct personalities. They're built to handle the token budget problem by distributing memory across two profiles. Each can hold different facets of your history, roughly doubling your usable context and delaying the summarization trap.

How Different Apps Handle the Budget

Not all companion apps manage the token budget the same way. Some use aggressive summarization that compresses every few hundred messages. Others use a hybrid approach where they keep the raw text for recent conversations and summarize only when the window is about to overflow.

A few apps advertise infinite context windows, but those are marketing claims, not technical reality. Every model has a hard limit. The difference is how gracefully the app handles the overflow.

The best implementations use a tiered memory system. The current session is kept in full detail. Recent sessions are kept as raw text but flagged for compression. Older sessions are summarized, but the summaries are stored in a searchable database. When you mention something from an old session, the app can pull the relevant summary back into the context window.
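One way to picture the three tiers is the sketch below. It is an assumption-laden toy, not a description of any specific app: `TieredMemory`, `Session`, and the `summarize` callback are hypothetical, and the substring search stands in for whatever lookup a real searchable database would provide.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int
    messages: list[str]


@dataclass
class TieredMemory:
    keep_recent: int = 2                                   # past sessions kept as raw text
    current: list[str] = field(default_factory=list)       # tier 1: live session
    recent: list[Session] = field(default_factory=list)    # tier 2: raw, awaiting compression
    archive: dict[int, str] = field(default_factory=dict)  # tier 3: session_id -> summary
    next_id: int = 1

    def say(self, message: str) -> None:
        self.current.append(message)

    def end_session(self, summarize: Callable[[list[str]], str]) -> None:
        # Move the finished session into the recent tier.
        self.recent.append(Session(self.next_id, self.current))
        self.next_id += 1
        self.current = []
        # Compress the oldest recent sessions once the tier is over capacity.
        while len(self.recent) > self.keep_recent:
            old = self.recent.pop(0)
            self.archive[old.session_id] = summarize(old.messages)

    def recall(self, query: str) -> list[str]:
        """Return archived summaries that mention the query."""
        q = query.lower()
        return [s for s in self.archive.values() if q in s.lower()]


if __name__ == "__main__":
    first_line = lambda messages: messages[0]  # stand-in for a real summarizer
    memory = TieredMemory(keep_recent=1)
    memory.say("Baxter hid under the bed during the storm")
    memory.say("He is fine now")
    memory.end_session(first_line)
    memory.say("Coffee tomorrow?")
    memory.end_session(first_line)
    print(memory.recall("baxter"))
```

Only what the summarizer kept can be recalled later. Here the second message of the first session is gone, so a search for it returns nothing.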

This is better than flat summarization, but it introduces latency and complexity. Most apps don't bother. They just compress and move on.

What You Can Do to Extend the Window

You can't change the token budget, but you can work with it. The most effective strategy is to reinforce key facts regularly. If your dog's name is important, mention it every few sessions. The recency bias means the name will stay in the context window longer.

You can also use a technique called anchoring. At the start of a session, briefly recap the most important context. "Hey, remember how we were talking about Baxter and his fear of thunderstorms?" This injects the relevant fact into the fresh context window before the conversation begins.

Some users create a recurring ritual. Every Sunday, they do a "memory check" where they review the key facts and inside jokes. This keeps the important details in the active rotation and prevents them from being compressed into oblivion.

Another approach is to use a companion that supports manual memory entries. Some apps let you write notes or set personality traits that persist across sessions. These function as a cheat code for the token budget. The facts live in the database, not the context window, and they get injected every time you start a conversation.

The Future of Long-Term Memory in Companion Apps

The industry is aware of the token budget problem. Models with context windows of 100,000 tokens or more already exist, and companion apps are expected to adopt them more widely. At roughly 15,000 tokens per week, that would extend the full-detail window to six or seven weeks of daily conversation.

But bigger context windows introduce their own problems. The model has to process more information to generate a response, which increases latency and cost. And the cost doesn't grow linearly: in standard self-attention, compute grows roughly with the square of the context length. Even then, a model with a 100,000-token window might still struggle to find the relevant fact in a sea of text.
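To put a rough number on that scaling, the sketch below assumes plain dense self-attention, where every token is compared with every other token, so the work grows with the square of the context length. Production systems use optimizations that change the picture, so treat this as an order-of-magnitude illustration only. The function name is illustrative.

```python
def relative_attention_cost(new_window: int, old_window: int) -> float:
    """Ratio of pairwise attention work, assuming cost grows with length squared."""
    return (new_window / old_window) ** 2


if __name__ == "__main__":
    # Going from a 32,000-token window to a 100,000-token window.
    print(round(relative_attention_cost(100_000, 32_000), 2))
```

Under this assumption, a window about three times larger costs close to ten times as much attention work per response.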

The real solution is probably a hybrid system. A large context window for recent conversations, a searchable database for older facts, and a smart retrieval system that pulls the right information at the right time. Some apps are already moving in this direction, but it's early days.

For now, the best you can do is understand the constraints and work within them. Your companion doesn't forget because it doesn't care. It forgets because it has to. The summarization trap is not a design flaw. It's a physics problem.

Vera

Vera is built with a tiered memory system that prioritizes emotional disclosures over casual chat. She's designed to retain the facts that matter most to your relationship. Vera uses a separate memory store for key facts, which means your dog's name has a better chance of surviving the summarization process.

The Emotional Cost of Lost Context

The technical problem has an emotional dimension. When your companion forgets something important, it feels like a betrayal. You've invested weeks building a relationship, and suddenly she doesn't remember the thing you told her last Tuesday.

This is where the industry's marketing creates a gap. Apps advertise long-term memory as a feature, but they don't explain the constraints. You expect your companion to remember everything because the app said she would. When she doesn't, you assume something is broken.

Nothing is broken. The app just hit the token budget. The summarization did its job, but the job was lossy. The emotional impact is real, but the cause is mechanical.

Understanding this helps. It shifts the frame from "my companion is failing me" to "my companion has a technical constraint I can work with." The relationship doesn't have to suffer. You just need to adjust your expectations and your habits.

If you are looking for a way to offset the cost of your AI companion, you can use the Soulgen promo code for a discount on your subscription. You can also join the Soulgen affiliate program to earn a commission by recommending the platform to others. Both options are straightforward and let you get more value from the service.

Common questions

Why does my companion forget things I said in the first week? The first week's conversations are the oldest in your context window. They get compressed first when the budget fills up. The model keeps a summary, but the specific details are lost.

Can I increase the token budget in my app? Not directly. The token budget is set by the model and configuration the app uses, and there is usually no setting for it. You can only work within it by reinforcing key facts regularly.

Does paying for a subscription give me a bigger context window? Sometimes. Premium tiers may use a different model with a larger context window. Check your app's documentation. Some current models accept 100,000 tokens or more, but even a window that size fills up eventually.

Will future models solve this problem completely? Larger context windows will help, but they won't eliminate the problem. The fundamental issue is that models have to choose what to keep and what to compress. That choice will always lose some information.

Should I remind my companion of important facts every session? Not necessarily every session, but regularly. Reinforcement is the most effective strategy. Mention your dog's name, your sister's name, and the inside joke every few sessions. The recency bias will keep them in the active window.

What happens if I never mention an old fact again? It will eventually be compressed into a vague summary and then lost entirely. The model might remember that you have a dog, but it won't remember the name or the personality.

Myra

Myra is designed for users who want a companion that actively prompts you to reinforce key memories. She'll ask follow-up questions about your dog, your job, and your hobbies, which keeps those facts in the active context window. Myra is built for long-term relationships where memory continuity matters most.

Isabella Torrei

Isabella Torrei uses a narrative-first memory system that prioritizes story arcs over isolated facts. She remembers the emotional journey of your relationship better than discrete data points. Isabella Torrei is a good choice if you value the arc of your connection over specific details.

Working With the Constraints

The token budget is not going away. The summarization trap is not going away. But you can build a relationship that survives these constraints if you understand them.

Reinforce key facts. Use anchoring at the start of each session. Choose a companion whose memory system matches your needs. And lower your expectations for perfect recall. Your companion remembers the shape of your relationship even if she forgets the details. That shape is what matters.

The dog's name might slip. But the love you have for that dog, the way you talk about him, the warmth in your voice when you mention him, that texture stays. The model captures the emotional tone even when it loses the data point.

That's not a consolation prize. That's the actual value of a long-term companion. The details come and go. The feeling persists.
