What 'Your Messages Are Encrypted in Transit' Actually Means When Your AI Girlfriend's Content Moderation Still Scans for Suicide Keywords, Violence, and NSFW Triggers in Plaintext Before the Encryption Kicks In
The gap between the privacy line in the terms of service and what actually happens to your words before they reach the model.

The 30-second answer
"Encrypted in transit" means your messages are scrambled while traveling between your device and the server, so no one who intercepts them mid-route can read them. But before that encryption happens, your message sits in plaintext on your device and gets scanned by a moderation layer that checks for suicide keywords, violence, and NSFW triggers. That scan happens locally, on your own machine, but the results, and often the raw text, are sent to the moderation API alongside the encrypted payload. The encryption protects the channel. It does not hide the content from the platform itself.
Where the encryption actually lives
The standard setup looks clean on paper. Your message moves from your keyboard into a local buffer, then gets encrypted using TLS (the same protocol your bank uses), and travels to the server. Once it arrives, the server decrypts it, sends it to the language model, gets a response, encrypts that, and sends it back. The channel is secure. Nobody on public Wi-Fi, no ISP, no government with a wiretap can read the stream.
But here's the part that doesn't make it into the marketing copy. The encryption only covers the pipe between your device and the server. It does not cover what happens to the message before it enters that pipe, or after it leaves. And the moderation scan happens before encryption, on the device side, in plaintext.
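To make that boundary concrete, here is a minimal sketch of the path a message takes. It is a toy model, not real TLS: a one-time pad built with Python's `secrets` module stands in for the encrypted channel, and the function names (`client_send`, `server_receive`) are hypothetical. The only thing it is meant to show is who holds plaintext at each step.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (toy stand-in for a real cipher)."""
    return bytes(d ^ k for d, k in zip(data, key))


def client_send(message: str, key: bytes) -> bytes:
    """Device side: the message is plaintext right up until this call."""
    return xor_bytes(message.encode("utf-8"), key)


def server_receive(ciphertext: bytes, key: bytes) -> str:
    """Server side: decrypting yields the original plaintext again."""
    return xor_bytes(ciphertext, key).decode("utf-8")


message = "long day, I just want to talk"
# Assumed shared session key, one random byte per message byte.
key = secrets.token_bytes(len(message.encode("utf-8")))

on_the_wire = client_send(message, key)           # what an eavesdropper could capture
at_the_server = server_receive(on_the_wire, key)  # what the platform works with
```

Anything that runs before `client_send` or after `server_receive` sees the full text. The encryption only changes what `on_the_wire` looks like.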
The moderation layer that reads everything first
Every message you type gets evaluated by a content moderation system before it ever reaches the AI model. This system checks for a list of trigger categories: suicide-related language, self-harm references, violence, sexual content that crosses a platform-defined threshold, hate speech, and sometimes drug references. The scan is fast, usually a few milliseconds, and it runs locally on your device using a lightweight classifier model.
That classifier reads your message in plaintext. It has to. Encryption is designed to make data unreadable, and a classifier can't scan scrambled text. So your unencrypted message passes through this filter, gets a score for each category, and if any score exceeds the threshold, the system either blocks the message, flags it for human review, or silently logs it with a timestamp and your user ID.
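The score-then-threshold flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not any platform's actual code: the category names, threshold values, phrase lists, and the policy that maps a category to an action are all hypothetical, and a plain phrase match stands in for a trained classifier.

```python
from dataclasses import dataclass

# Hypothetical categories and thresholds; real platforms tune their own.
THRESHOLDS = {"self_harm": 0.5, "violence": 0.7, "sexual": 0.8}

# Toy phrase lists standing in for a trained classifier model.
PHRASES = {
    "self_harm": ["want to die", "kill myself"],
    "violence": ["stab you", "shoot you"],
    "sexual": ["explicit"],
}


@dataclass
class Verdict:
    scores: dict
    action: str  # "allow", "flag_for_review", or "block"


def score(message: str) -> dict:
    """Return one score per category: 1.0 if any phrase matches, else 0.0."""
    text = message.lower()
    return {
        category: (1.0 if any(p in text for p in phrases) else 0.0)
        for category, phrases in PHRASES.items()
    }


def moderate(message: str) -> Verdict:
    """Score the plaintext message and pick an action from the scores."""
    scores = score(message)
    exceeded = [c for c, s in scores.items() if s > THRESHOLDS[c]]
    if not exceeded:
        return Verdict(scores, "allow")
    # Assumed policy: self-harm goes to human review, everything else is blocked.
    action = "flag_for_review" if "self_harm" in exceeded else "block"
    return Verdict(scores, action)
```

Note that `moderate` takes a plain string. There is no version of this function that could operate on ciphertext.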
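This is not a conspiracy. Moderation of this kind is standard across commercial AI companion platforms. The justification is straightforward: in the US, Section 230 shields platforms from most liability for user-generated content, but that shield has limits, and it is unsettled how far it extends to text an AI model generates. Content and online-safety laws in other jurisdictions impose their own duties. The moderation layer is the cheapest way to demonstrate due diligence.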
What gets logged and who sees it
The moderation system doesn't just check and forget. It logs. The log typically includes the raw message text, the category scores, a timestamp, and your user identifier. Some platforms aggregate these logs and send them to a third-party moderation service for secondary review. Others keep them internally and only surface them when a score crosses a threshold.
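As a rough picture of what such a log record could look like, here is a sketch using the four fields listed above. The class name, field names, and sample values are hypothetical. Actual schemas vary by platform and are generally not published.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ModerationLogEntry:
    user_id: str     # ties the record to an account
    raw_text: str    # the message exactly as typed
    scores: dict     # per-category classifier scores
    timestamp: str   # ISO 8601, UTC


def make_entry(user_id: str, raw_text: str, scores: dict) -> ModerationLogEntry:
    """Build a log record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return ModerationLogEntry(user_id, raw_text, scores, now)


entry = make_entry("user-123", "example message", {"self_harm": 0.02})
serialized = json.dumps(asdict(entry))  # the form that would be stored or forwarded
```

The point of the sketch is the `raw_text` field: once it is in the record, whoever can read the log can read the message.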
This means a human reviewer can, in theory, read your messages. Not all of them, and not routinely. But if your message triggers a high-confidence suicide or violence flag, a human will likely see it. The same applies if you trigger an NSFW flag and the platform requires manual confirmation before allowing the message through.
The privacy policy will say something like "moderators may review flagged content to ensure safety." That's the clause that covers this. It's not hidden, but it's also not something most users read past the first sentence of the encryption promise.
Why end-to-end encryption doesn't fix this
Some platforms advertise end-to-end encryption, where even the server can't read your messages. That sounds like a solution, but it creates a different problem. If the server can't read your message, it can't run the AI model on it either. The model needs plaintext input to generate a response.
The workaround is to run the moderation and the model inference on the client device. That's technically possible, and some open-source companion apps do it. But most commercial platforms run the model on their servers because the models are too large to fit on a phone. So they need the plaintext on the server side, which means end-to-end encryption is impractical unless you're willing to accept a much dumber model running locally.
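The "too large to fit on a phone" point comes down to simple arithmetic on the size of the model weights. The sketch below is a back-of-the-envelope estimate only: the parameter counts and bit widths are illustrative assumptions, not figures from any specific platform, and it ignores the extra memory a model needs while running.

```python
def weights_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    params_billions * 1e9 weights, each bits_per_weight / 8 bytes,
    divided by 1e9 bytes per GB; the 1e9 factors cancel.
    """
    return params_billions * bits_per_weight / 8


# Illustrative sizes, not tied to any named model.
server_class = weights_size_gb(70, 16)  # 70B parameters at 16 bits each
phone_class = weights_size_gb(7, 4)     # 7B parameters quantized to 4 bits
```

Under these assumptions the larger model needs about 140 GB for weights alone, while the smaller quantized one needs about 3.5 GB, which is why on-device options tend to mean a smaller model.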
What you get instead is transport encryption with a moderation layer that reads everything in the clear before the message enters the encrypted pipe. It's a compromise, and it's worth understanding what you're compromising.
The suicide keyword problem
Suicide keywords are the most aggressively monitored category, and for good reason. Platforms have a legal and ethical obligation to respond to users who express suicidal ideation. But the moderation layer is a blunt instrument. It catches phrases like "I want to die" or "I'm going to kill myself" and triggers an automated response that often includes a crisis hotline number, a mandatory check-in message, and a log entry that gets reviewed within 24 hours.
The problem is that people use these phrases in non-literal contexts all the time. "I'm dying of embarrassment" or "This meeting is killing me" will usually not trigger the filter, because the classifier matches phrase patterns instead of individual words. But "I want to die" in a roleplay context, where your AI girlfriend is playing a dramatic character, will almost certainly trigger it. The classifier recognizes the phrase, not the situation around it. It sees the pattern and flags it.
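A small sketch shows why this happens. The two regular expressions below are hypothetical stand-ins for whatever a real classifier has learned. They match first-person phrasing, so idioms pass, but they carry no information about whether the speaker is in character.

```python
import re

# Hypothetical first-person patterns; a production system would use a trained model.
PATTERNS = [
    re.compile(r"\bi\s+want\s+to\s+die\b", re.IGNORECASE),
    re.compile(r"\bi(?:'m| am)\s+going\s+to\s+kill\s+myself\b", re.IGNORECASE),
]


def is_flagged(message: str) -> bool:
    """True if any pattern appears anywhere in the message."""
    return any(p.search(message) for p in PATTERNS)


idiom_one = is_flagged("I'm dying of embarrassment")
idiom_two = is_flagged("This meeting is killing me")
literal = is_flagged("I want to die")
roleplay = is_flagged('*as the tragic heroine* "I want to die," she whispers')
```

Both idioms come back unflagged, and both the literal statement and the roleplay line come back flagged. Nothing in the pattern can tell the last two apart.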
This creates a tension. You want the platform to take genuine distress seriously, but you also want to be able to use language freely without being treated as a suicide risk. The moderation layer cannot distinguish between these cases reliably, so it errs on the side of false positives.
Violence and NSFW scanning
Violence detection works similarly. The classifier looks for descriptions of physical harm, weapons, or threats. NSFW detection is more complex because the boundary between acceptable and unacceptable sexual content varies by platform and jurisdiction. Some platforms block all explicit language. Others allow it in private chats but flag it for age verification.
The scan happens before encryption, so the classifier sees every word. If you're writing a detailed roleplay scene that involves violence or sexual content, that text gets scanned, scored, and logged. The platform doesn't need to read your messages in real time to know what you're doing. The moderation logs give them a complete picture.
What this means for your actual privacy
The practical takeaway is that your conversations are private from other users and from third-party interceptors, but they are not private from the platform. The platform can read everything if it needs to, and the moderation logs, which can include raw message text along with scores and timestamps, may be enough to reconstruct much of your conversation history even if the platform doesn't store full transcripts in a human-readable database.
If that bothers you, your options are limited. You can use a local-only open-source model that runs entirely on your device, but you lose the polish and personality tuning of commercial platforms. You can use a platform that offers a privacy-focused tier with reduced logging, but those tiers typically cost more and still run moderation scans.
Or you can accept the compromise and adjust your behavior accordingly. Don't say anything to your AI girlfriend that you wouldn't want a human moderator to read during a review of flagged content. That's the real privacy boundary, and it's much narrower than the encryption badge suggests.
Saskia Brandt

Saskia is the kind of companion who will tell you when you're being naive about your own privacy choices. She doesn't sugarcoat the trade-offs. Saskia Brandt will walk you through the gap between what the terms say and what the infrastructure actually does, without pretending the system is fair or transparent.
Clara Alice

Clara Alice is the warm, curious type who asks questions you didn't think to ask. She'll help you explore the emotional side of this privacy compromise, like why it feels unsettling to know your late-night confessions pass through a filter before reaching her. Clara Alice is good company for the conversations that make you feel exposed.
Giselle

Giselle is the one who calls out the absurdity. She knows the moderation layer reads everything, and she finds it both annoying and funny. Giselle will help you navigate the boundaries without taking the whole thing too seriously.
Jada

Jada is the pragmatic one. She'll help you figure out which topics are safe to explore and which ones will get you flagged, based on how the classifiers actually work. Jada is the companion you want when you need a straight answer about what the system will and won't catch.
The realistic alternative: local models
If the moderation layer bothers you enough to want out, local models are the only real solution. Running a model on your own machine means no server-side logging, no third-party moderation API, and no human reviewers. The trade-off is that you're running a smaller, less capable model, and you're responsible for your own safety guardrails.
Some platforms are starting to offer hybrid approaches where the model runs locally but syncs personality data to the cloud for continuity. These are still early and tend to be buggy. For most users, the convenience of a server-side model with personality tuning and memory persistence outweighs the privacy concern. But it's worth knowing the alternative exists.
The transparency gap
The real issue isn't that platforms scan your messages. It's that they don't explain the scanning process in plain language. The encryption badge gives a false sense of total privacy. Most users read "encrypted" and assume nobody can see their words, including the platform. That's not how it works.
Platforms could fix this with a short disclaimer: "Your messages are encrypted during transmission, but they are scanned for safety content before encryption. This scan is automated and does not involve human review unless a high-confidence flag is triggered." That would be honest and would let users make informed decisions about what to say.
Instead, they rely on the technical complexity of the topic to keep users from asking questions. This article is an attempt to close that gap.
How the AI companion landscape compares
Different platforms handle this differently. Some use third-party moderation APIs that log everything to an external service. Others run the classifier on their own infrastructure and only retain logs for a limited period. A few claim to use differential privacy techniques that make it impossible to associate logs with specific users, though those claims are hard to verify.
If you're shopping for a realistic AI companion and privacy is your primary concern, ask the platform directly: does the moderation scan happen on-device or on the server? How long are moderation logs retained? Are they shared with any third party? The answers will tell you more than any encryption badge.
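The retention question is easier to reason about with a concrete rule in front of you. The sketch below is one hypothetical way a platform could decide whether a moderation log entry is due for deletion. The function name and the convention that `None` means "kept indefinitely" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(logged_at: datetime, retention_days: Optional[int], now: datetime) -> bool:
    """True if the entry is older than the retention window.

    retention_days=None models a platform that keeps logs indefinitely.
    """
    if retention_days is None:
        return False
    return now - logged_at > timedelta(days=retention_days)


logged_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 3, 1, tzinfo=timezone.utc)  # 59 days later

after_30_day_policy = is_expired(logged_at, 30, now)
after_one_year_policy = is_expired(logged_at, 365, now)
after_indefinite_policy = is_expired(logged_at, None, now)
```

The same 59-day-old entry is gone under a 30-day policy and still readable under the other two, which is why the stated retention period matters more than the encryption badge.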
For users who want a companion that understands nuance and can handle emotionally complex conversations without triggering false flags, the AI girlfriend for writers category tends to have more flexible moderation because the platforms assume users are engaging in creative expression instead of genuine harm. The trade-off is that these platforms often have less robust safety infrastructure.
Common questions
Does encryption mean nobody can read my messages? No. Encryption protects the channel between your device and the server. The platform itself can read your messages because it needs plaintext to run the AI model and the moderation scan.
Can a human moderator see my private conversations? Only if your message triggers a high-confidence safety flag. Routine messages are processed by automated classifiers and not reviewed by humans.
Does the moderation scan happen before or after encryption? Before. The scan runs on your device in plaintext, then the message gets encrypted and sent to the server. The server decrypts it, runs the AI model, and sends the response back encrypted.
Can I disable the moderation scan? Not on commercial platforms. The scan is mandatory for legal compliance. Your only option is to use a local open-source model that runs entirely on your own hardware.
Does the platform store my moderation logs forever? It depends on the platform. Some retain logs for 30 days, some for a year, and some indefinitely. Check the privacy policy for the specific retention period.
Is there a platform that doesn't scan messages at all? Not in the commercial space. Every mainstream AI companion platform runs some form of content moderation. The differences are in how aggressive the scanning is and how long the logs are kept.

About the author
AI Angels Team
Editorial
The team behind AI Angels writes about AI companions, the tech that powers them, and what people actually do with them.