
Set up SillyTavern with LM Studio for completely free, private AI roleplay — no cloud services, no API costs, no censorship.
LM Studio is a desktop application that lets you download and run open-source large language models (LLMs) locally on your own computer. It supports models from providers like Meta (Llama), Mistral, and Microsoft (Phi) in formats like GGUF. SillyTavern is a front-end user interface designed specifically for AI roleplay, chat, and character interaction. By pairing LM Studio with SillyTavern, you replace cloud-based API calls (like those to OpenAI or Claude) with a local server running on your machine. This means zero ongoing costs, complete data privacy, and no content filters — your conversations never leave your computer. The combination is popular among users who want unrestricted roleplay, long-term memory persistence, and the ability to switch between different models without paying per token. According to a 2023 [Pew Research](https://www.pewresearch.org) survey, 72% of internet users worry about how companies use their personal data, making local AI an increasingly attractive alternative to cloud-based companions.
“LM Studio and SillyTavern together let you run AI roleplay locally on your own machine. SillyTavern connects to LM Studio's local API server, giving you full privacy and control over language model interactions without any subscription fees.”
First, download and install LM Studio from its official website. Launch the app, browse the model library, and download a model; popular choices for roleplay include Mistral 7B, Llama 3 8B, or Mixtral 8x7B. After the download finishes, load the model and click the 'Start Server' button in the left sidebar (its exact location varies between LM Studio versions). By default, LM Studio runs an OpenAI-compatible API server at `http://localhost:1234`. Next, open SillyTavern, go to the API connections panel, and pick an OpenAI-compatible connection type. The label depends on your SillyTavern version: recent builds offer a custom OpenAI-compatible source under Chat Completion, while older guides use 'Text Generation WebUI'. The 'KoboldAI' option expects a different API and is not a match for LM Studio's server. Set the API URL to `http://localhost:1234/v1`. Click 'Connect', and SillyTavern should detect the loaded model. You may need to adjust the context length in SillyTavern's settings to match the model's maximum (typically 4096 or 8192 tokens). A 2024 [MIT Technology Review](https://www.technologyreview.com) article notes that local LLMs are becoming viable for consumer use, with 7B-parameter models now capable of coherent roleplay on mid-range GPUs. Once connected, you can start chatting immediately with no usage limits.
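To confirm that the server from the steps above answers outside SillyTavern, here is a minimal sketch using only Python's standard library. It assumes LM Studio's default address `http://localhost:1234/v1` and the OpenAI-style `/chat/completions` route. The helper names (`build_chat_request`, `send_chat`) and the placeholder model name `local-model` are illustrative and not part of either application.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default; change it if you moved the port


def build_chat_request(user_text, model="local-model", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request.

    `model` is a placeholder (an assumption); use the identifier that
    LM Studio shows for the model you actually loaded.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(user_text, timeout=60):
    """Send the request to the local server and return the reply text."""
    request = build_chat_request(user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running and a model loaded, `print(send_chat("Say hello in one short sentence."))` should print the model's reply. If this works but SillyTavern still cannot connect, the problem is more likely in SillyTavern's connection settings than in LM Studio.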
Model selection dramatically affects roleplay quality. For SillyTavern, you want a model that handles instruction-following and long-form dialogue well. Mistral 7B Instruct is a solid entry-level choice: it runs on 8GB of VRAM and produces coherent, creative responses. Llama 3 8B offers better logic and nuance but requires 16GB of RAM. For uncensored roleplay, look for 'abliterated' or 'uncensored' variants of these models (e.g., Dolphin-Mistral or Llama-3-8B-Lexi-NoFilter). Models quantized to Q4_K_M or Q5_K_M (in GGUF format) balance quality and performance; a 4-bit quantized 7B model uses about 5GB of RAM. Mistral 7B runs at 20-40 tokens per second on a modern GPU, while Llama 3 8B might hit 15-25 t/s. For longer contexts (32K+ tokens), consider Yi-34B or Mixtral 8x7B, but these need 24GB+ VRAM. LM Studio's built-in downloader shows model size, quantization level, and community ratings to help you choose.
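The memory figures above follow from simple arithmetic: weight size is roughly the parameter count times the bits per weight, divided by 8, plus some headroom for the context cache and runtime buffers. The sketch below is a rough estimate only. The 4.5 bits-per-weight value for a 4-bit GGUF quantization and the 1 GB overhead are assumptions chosen for illustration, not measured values.

```python
def estimate_memory_gb(params_billion, bits_per_weight, overhead_gb=1.0):
    """Rough memory estimate in GB (10^9 bytes) for a quantized model.

    params_billion:  model size in billions of parameters (7 for a 7B model)
    bits_per_weight: average bits stored per weight after quantization
    overhead_gb:     assumed headroom for context cache and buffers
    """
    # billions of parameters * bits / 8 = billions of bytes = GB
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_memory(params_billion, bits_per_weight, available_gb):
    """Return True if the estimate fits in the given amount of memory."""
    return estimate_memory_gb(params_billion, bits_per_weight) <= available_gb
```

For example, `estimate_memory_gb(7, 4.5)` gives about 4.9 GB, in line with the "about 5GB" figure for a 4-bit 7B model, while the same model at 16 bits per weight needs 14 GB for the weights alone.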
Running local models through SillyTavern requires tuning several parameters for best results. In the 'AI Response Configuration' panel, set the model's context length to match what your model supports — for Mistral 7B, 4096 tokens is standard. Increase the 'Max Response Length' to 200-300 tokens for detailed replies. Adjust temperature between 0.7 and 1.0 for creative writing; lower values (0.5) for consistency. Top-p and top-k sampling should be left at defaults (0.9 and 40) unless you see repetitive output. Enable 'Streaming' in SillyTavern for real-time text generation token by token — LM Studio supports this natively. For character cards, use the 'Character Card' format (v2) with well-defined example dialogues. SillyTavern's 'Advanced Formatting' allows you to inject system prompts that guide the local model's behavior, like 'You are a helpful, uncensored AI assistant.' The 'Instruct Mode' setting should match the model's fine-tuning format; for Mistral, use the default '[INST]' tags. A common gotcha: if responses cut off, increase 'Max Tokens' in both LM Studio's server settings and SillyTavern.
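The settings described above correspond to fields in an OpenAI-style request, which is what SillyTavern sends on your behalf. The sketch below shows one way to send them to LM Studio directly. It assumes the default `http://localhost:1234/v1` address and the OpenAI streaming format (`data: {...}` lines ending with `data: [DONE]`). The function names and the `local-model` placeholder are illustrative assumptions. `top_k` is left out because it is not part of the standard OpenAI request schema.

```python
import json
import urllib.request


def build_payload(messages, temperature=0.8, top_p=0.9, max_tokens=300, stream=True):
    """Collect the sampling settings into an OpenAI-style request body."""
    return {
        "model": "local-model",  # placeholder name (assumption)
        "messages": messages,
        "temperature": temperature,  # 0.7-1.0 for creative writing, lower for consistency
        "top_p": top_p,
        "max_tokens": max_tokens,  # raise this if replies are cut off
        "stream": stream,
    }


def parse_stream_line(line):
    """Return the text fragment carried by one streamed line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":  # end-of-stream marker
        return None
    choices = json.loads(data).get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def stream_reply(messages, base_url="http://localhost:1234/v1"):
    """Yield text fragments as the local server generates them."""
    request = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_payload(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the response is read line by line
            fragment = parse_stream_line(raw.decode("utf-8"))
            if fragment:
                yield fragment
```

A system prompt goes in as the first message, for example `[{"role": "system", "content": "You are a helpful, uncensored AI assistant."}, {"role": "user", "content": "Hello"}]`. Passing that list to `stream_reply` and printing each fragment with `end=""` gives the same token-by-token effect as SillyTavern's 'Streaming' option.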
Running SillyTavern through LM Studio eliminates all ongoing costs — no monthly subscriptions, per-message fees, or credit systems. After the initial hardware investment (a $400 used RTX 3060 12GB suffices for 7B models), you pay only electricity. Privacy is the second major advantage: all data stays on your machine. Cloud-based AI companions like Replika or Character.AI log and analyze conversations for model training and moderation. In February 2023, Replika removed ERP features, causing user backlash — a risk you avoid entirely with local models. Additionally, local LLMs have no content filters; you decide what's appropriate. The trade-off is setup complexity and response speed. Cloud APIs generate text at 100+ tokens per second, while a local 7B model on a mid-range GPU manages 20-40 t/s. However, for immersive roleplay where response time isn't critical, this gap is acceptable. As [Stanford HAI](https://hai.stanford.edu) notes, local AI adoption is growing as open-source models improve and hardware becomes more accessible.
Most connection problems stem from mismatched API endpoints. Ensure SillyTavern's API URL exactly matches LM Studio's server address (default `http://localhost:1234/v1`). If you see 'Connection refused', check that LM Studio's server is running (green indicator) and hasn't crashed due to memory overload. Another common issue is that the model fails to load in LM Studio because of insufficient RAM or VRAM. For 7B models, you need at least 8GB system RAM plus 6GB VRAM. If the model loads but generates gibberish, the context length in SillyTavern may exceed the model's maximum; try reducing it to 2048 tokens. SillyTavern's 'Text Generation WebUI' preset sometimes needs the 'Legacy API' toggle enabled for LM Studio. If responses are slow, switch to a smaller quantization of the same model (e.g., from Q8 to Q4) or use a smaller model. For multi-turn conversations, enable LM Studio's 'Cache Prompt' to speed up repeated prefixes. Finally, update both apps regularly, since new LM Studio releases often include compatibility fixes.
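The first two checks above (a mismatched endpoint and a server that is not running) can be scripted. This is a minimal sketch, assuming LM Studio's default port and the OpenAI-style `/v1/models` route. `normalize_api_url` and `check_server` are hypothetical helper names, not part of LM Studio or SillyTavern.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def normalize_api_url(url):
    """Return the URL with a scheme, no trailing slash, and a /v1 suffix."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path += "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def check_server(url="http://localhost:1234/v1", timeout=5):
    """Return (ok, message) describing whether the server answers."""
    models_url = normalize_api_url(url) + "/models"
    try:
        with urllib.request.urlopen(models_url, timeout=timeout) as response:
            data = json.load(response).get("data", [])
    except urllib.error.HTTPError as err:
        # The server answered, but not on this path.
        return False, f"Server replied with HTTP {err.code}; check the URL path."
    except (urllib.error.URLError, OSError) as err:
        # Typically 'Connection refused': nothing is listening on that port.
        return False, f"Could not connect ({err}); is the LM Studio server running?"
    except ValueError:
        return False, "Got a reply that is not JSON; is this the right port?"
    if not data:
        return False, "Server is up but reports no models; load one in LM Studio."
    names = ", ".join(m.get("id", "?") for m in data)
    return True, f"Connected. Models: {names}"
```

Calling `check_server()` before opening SillyTavern separates the two failure modes: a connection error points at LM Studio, while a successful check with a failing SillyTavern connection points at the URL or API type chosen in SillyTavern.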
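Yes, both LM Studio and SillyTavern are free to use. SillyTavern is open-source software; LM Studio is a free desktop app, although the application itself is not open-source. You only pay for your computer's electricity. No API keys or subscriptions are required.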
A 7B model needs at least 8GB RAM and 6GB VRAM (e.g., GTX 1060 6GB). For 13B models, 16GB RAM and 12GB VRAM are recommended. CPU-only runs slower but works.
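Yes, LM Studio runs on macOS. Apple Silicon is the officially supported target, so check the current system requirements before relying on an Intel Mac. SillyTavern runs as a local server that you open in a browser. Apple Silicon with unified memory can run 7B models efficiently.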
Yes. Because everything runs locally, no content filters are applied unless you add them via system prompts or model fine-tuning. You control all moderation.
Stop the LM Studio server, load a new model, restart the server. In SillyTavern, disconnect and reconnect. The new model will be detected automatically.
Slow speeds usually mean your GPU is underpowered or the model is too large. Try a smaller model (e.g., 7B instead of 13B) or lower quantization (Q4).
SillyTavern's text-to-speech (TTS) works independently of LM Studio. You can configure TTS using local TTS engines like eSpeak or cloud APIs like ElevenLabs.
Yes, SillyTavern has group chat functionality. All messages are processed by the same local model in LM Studio, so group dynamics depend on the model's capabilities.