
How to connect NanoGPT to SillyTavern for local AI roleplay without cloud costs or censorship.
NanoGPT is a minimal implementation of GPT-2 and GPT-3-style transformer language models created by Andrej Karpathy. Its core is compact (a model definition and a training script of roughly 300 lines each) rather than a single file. It is designed for educational purposes and lightweight fine-tuning, making it accessible to developers who want to train or run a small language model locally. SillyTavern, on the other hand, is a popular front-end interface for roleplay and chat with AI characters. By combining NanoGPT with SillyTavern, users can run a fully local, uncensored AI companion without relying on cloud APIs like OpenAI or Anthropic. This setup is ideal for privacy-conscious users, those with limited budgets, or anyone who wants to experiment with fine-tuning their own model. However, NanoGPT is not a drop-in replacement for larger models: it produces shorter, less coherent outputs and requires significant technical knowledge to set up. The primary benefit is complete control: no message limits, no content filters, and no monthly fees beyond electricity costs. For users willing to trade quality for autonomy, NanoGPT + SillyTavern is a viable, if niche, combination.
“NanoGPT is a lightweight, open-source implementation of GPT-2/GPT-3-style transformers by Andrej Karpathy, designed for educational use and fine-tuning on small datasets. SillyTavern is a front-end UI for interacting with language models; combining them requires running NanoGPT as a local inference server and pointing SillyTavern to its API endpoint.”
To use NanoGPT with SillyTavern, you first need to run NanoGPT as a local inference server. Start by cloning the NanoGPT repository from GitHub and installing its dependencies (PyTorch, numpy, etc.). Download a pre-trained model checkpoint; the smallest is the GPT-2 124M parameter model, whose weights take about 500MB in 32-bit precision. NanoGPT's `sample.py` script generates text but doesn't expose an API, so you'll need to wrap the generation logic in a lightweight web framework such as Flask or FastAPI. A community project called `nanogpt-api` provides a ready-made server that exposes a `/generate` endpoint compatible with OpenAI's API format. Once the server is running on `localhost:5000`, point SillyTavern at it: select "Text Completion" as the API type, enter `http://localhost:5000` as the base URL, and set the model name to `gpt2-124M` (or whatever checkpoint you loaded). Test the connection by sending a simple prompt. If it succeeds, you can start chatting, though expect response times of 5-15 seconds per message on a modern CPU, or 1-3 seconds with a GPU.
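The wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the `nanogpt-api` project's actual code: it uses Python's standard-library `http.server` instead of Flask or FastAPI so it runs without extra packages, and `generate_text` is a hypothetical placeholder for the real NanoGPT generation call. The response shape imitates an OpenAI-style text completion; the exact fields SillyTavern expects from a given endpoint are an assumption to verify against your setup.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate_text(prompt: str, max_tokens: int) -> str:
    """Hypothetical placeholder for the model call.

    A real wrapper would load a NanoGPT checkpoint once at startup and run
    its generation loop here. This stub returns a canned string so the
    server logic can be exercised without PyTorch.
    """
    return " [model output would appear here]"


def build_completion(prompt: str, model: str = "gpt2-124M", max_tokens: int = 150) -> dict:
    """Build a response shaped like an OpenAI-style text completion."""
    now = time.time()
    return {
        "id": f"cmpl-{int(now * 1000)}",
        "object": "text_completion",
        "created": int(now),
        "model": model,
        # finish_reason is hard-coded in this sketch; a real server would
        # report whether generation stopped naturally or hit max_tokens.
        "choices": [
            {"index": 0, "text": generate_text(prompt, max_tokens), "finish_reason": "length"}
        ],
    }


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the /generate path from the article is handled.
        if self.path != "/generate":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "Request body must be JSON")
            return
        payload = json.dumps(
            build_completion(
                body.get("prompt", ""),
                body.get("model", "gpt2-124M"),
                int(body.get("max_tokens", 150)),
            )
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "127.0.0.1", port: int = 5000) -> None:
    """Start the server on the article's port; blocks until interrupted."""
    HTTPServer((host, port), GenerateHandler).serve_forever()
```

Calling `serve()` starts the listener on `localhost:5000`. Swapping the body of `generate_text` for a real model call is the only change the rest of the sketch should need.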
NanoGPT's small model sizes (124M-1.5B parameters) mean it struggles with long context and coherent roleplay. To get usable results in SillyTavern, adjust several settings. First, cap the context size at 512 tokens. GPT-2 checkpoints are trained with a 1,024-token window, but a model this small tends to lose track of earlier text well before that limit. In SillyTavern's Advanced Formatting, reduce the character description and example messages to under 200 tokens total. Use a low temperature (0.5-0.7) to keep responses on-topic, and set repetition penalty to 1.1 to avoid loops. Disable streaming, as NanoGPT's token-by-token generation can cause UI lag. For the prompt format, use "Plain Text" rather than roleplay-specific formats like "Roleplay" or "ChatML", which add tokens that eat into the limited context. You may also want to enable "Trim Responses" to cut off rambling after 150 tokens. Even with these tweaks, expect NanoGPT to produce short, sometimes nonsensical replies. It works best for simple, repetitive scenarios (e.g., a friendly NPC in a text adventure) rather than deep emotional roleplay. For better quality, consider fine-tuning NanoGPT on a small dataset of your own conversations, though that requires additional technical skill.
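To see how tight these limits are, the arithmetic can be written out. The sketch below uses the figures from this section (512-token context, 200 tokens of character material, 150-token replies); the function name `history_budget` is made up for illustration and is not a SillyTavern setting.

```python
def history_budget(context_size: int = 512,
                   character_tokens: int = 200,
                   response_tokens: int = 150) -> int:
    """Tokens left for chat history after the fixed parts of the prompt.

    The context window has to hold the character description and examples,
    the generated reply, and whatever chat history remains.
    """
    remaining = context_size - character_tokens - response_tokens
    if remaining < 0:
        raise ValueError("character text plus reply length exceed the context size")
    return remaining


# With the settings recommended above:
print(history_budget())  # 512 - 200 - 150 = 162 tokens of history
```

Roughly 162 tokens of history is only a handful of short exchanges, which is one reason trimming the character description pays off so quickly.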
When running NanoGPT locally with SillyTavern, performance depends heavily on hardware. On a modern CPU (e.g., Intel i7-12700), the 124M model generates about 10 tokens per second, yielding a 50-token response in 5 seconds. On a mid-range GPU (e.g., NVIDIA RTX 3060), that jumps to about 50 tokens per second. Compare this to cloud models: GPT-3.5-turbo generates roughly 100 tokens per second, and Claude 3 Haiku is similarly fast. The quality gap is wider than the speed gap. MT-Bench, a common measure of conversational ability, rates responses on a 1-10 scale and targets instruction-tuned chat models; a GPT-2-class base model is not instruction-tuned and falls far short of GPT-3.5 and GPT-4 on that kind of evaluation. For roleplay coherence, a [study by Stanford HAI](https://hai.stanford.edu) found that models under 1B parameters struggle to maintain character consistency beyond 10 turns. NanoGPT's 124M model typically loses context after 5-8 exchanges. Memory usage is low: about 1GB of RAM at runtime for the 124M model, 4GB for the 1.5B version. The trade-off is clear: NanoGPT offers privacy and zero cost, but at a significant quality and speed penalty. For users who prioritize autonomy over polish, it's a functional choice.
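The response times quoted in this article follow directly from those throughput figures. A small sketch, assuming generation speed is roughly constant and ignoring prompt-processing time (the helper name `response_seconds` is illustrative):

```python
def response_seconds(tokens: int, tokens_per_second: float) -> float:
    """Estimated time to generate a reply at a steady token rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Throughput figures from the paragraph above.
for label, rate in [("CPU", 10), ("GPU", 50)]:
    short = response_seconds(50, rate)
    long = response_seconds(150, rate)
    print(f"{label}: {short:.0f}-{long:.0f} s for a 50-150 token reply")
```

At 10 tokens per second a 50-150 token reply takes 5-15 seconds, and at 50 tokens per second it takes 1-3 seconds, which matches the ranges given earlier.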
Users frequently encounter several issues when connecting NanoGPT to SillyTavern. The most common is the API endpoint returning a 404 error. This usually means the Flask server isn't running or is bound to the wrong port. Verify the server is listening with `curl http://localhost:5000/generate`. Another issue is SillyTavern showing "Model not found"; make sure the model name in SillyTavern matches the checkpoint name exactly (e.g., `gpt2-124M`). If responses are blank or cut off, increase the `max_tokens` parameter in the API wrapper to 200. For extremely slow generation (over 30 seconds per response), reduce the context size to 256 tokens or switch to a smaller model. Note that 124M is the smallest official GPT-2 checkpoint; the 82M figure corresponds to DistilGPT-2, which NanoGPT's built-in pretrained loader does not support. Some users report that NanoGPT repeats the same phrase endlessly. This is a known issue with low-temperature sampling; raise temperature to 0.8 or enable top-k sampling (k=40). Finally, if the NanoGPT server crashes on startup, check that your Python environment has PyTorch 2.0+ installed. SillyTavern itself is a Node.js application, so if it fails to start, check your Node.js installation instead. For persistent problems, consult the [NanoGPT GitHub issues page](https://github.com/karpathy/nanogpt/issues) or the SillyTavern Discord community.
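For readers unsure what the temperature and top-k settings do, here is a minimal pure-Python sketch of the technique, using the values suggested above (temperature 0.8, k=40) as defaults. It is an illustration of top-k sampling in general, not NanoGPT's own code, which operates on PyTorch tensors; the function name `sample_top_k` is made up for this example.

```python
import math
import random


def sample_top_k(logits, k=40, temperature=0.8, rng=None):
    """Pick a token index from the k highest logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Keep only the indices of the k largest logits; everything else gets zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by a temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, subtracting the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary of four tokens: with k=2 only indices 1 and 2 can be chosen.
toy_logits = [0.1, 5.0, 2.0, -1.0]
print(sample_top_k(toy_logits, k=2, rng=random.Random(0)))
```

Raising the temperature gives lower-ranked tokens more weight, which is why it helps break repetition loops, while the top-k cutoff keeps very unlikely tokens out entirely.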
If NanoGPT's limited coherence and steep setup requirements frustrate you, AIAngels offers a middle ground. AIAngels provides a free tier with unlimited text chat (no credit card, no daily message cap) using models far more capable than NanoGPT (equivalent to GPT-3.5 quality). Premium plans start at $2.99/month on the annual plan, including image generation and voice messages. Unlike NanoGPT, AIAngels requires no local setup, no API keys, and no technical tweaking. Memory persists permanently across conversations, and there are no content filters for adult users. AIAngels is a cloud service, so it does not offer the privacy of local inference; for users who were drawn to a local setup but can't tolerate NanoGPT's quality, it is a practical upgrade. That said, if your goal is purely educational or you enjoy tinkering with model fine-tuning, NanoGPT remains a valuable learning tool. For daily roleplay or companionship, AIAngels delivers a smoother, more reliable experience without the headache of configuring a local server.
Both NanoGPT and SillyTavern are free and open-source. You only pay for electricity and hardware. No API keys or subscriptions are required.
The GPT-2 124M model is the most common choice. Larger models like the 1.5B version require more RAM and GPU memory but produce slightly better responses.
Custom characters work, but you must keep character descriptions very short (under 200 tokens) because the recommended context size for this setup is only 512 tokens.
To speed up generation, reduce the context size to 256 tokens, use a GPU if available, or switch to a smaller model if your server can load one. Also disable streaming in SillyTavern.
NanoGPT has no content filters, so it can generate any text. However, its low coherence often makes ERP unsatisfying compared to larger models.
A GPU is not required. A CPU works, but expect 5-15 seconds per response. A GPU (e.g., RTX 3060) cuts generation time to 1-3 seconds.
NanoGPT supports fine-tuning on custom datasets. You'll need a GPU and some Python experience to train a model tailored to your scenarios.
AIAngels offers a free tier with no setup, better quality models, and permanent memory. It's a cloud service that works out of the box.