
Learn how to configure and use external media in SillyTavern for immersive AI interactions with images, audio, and file uploads.
External media in SillyTavern refers to the capability of attaching images, audio files, or other media to messages that are sent to the AI character. When enabled, the character's context includes metadata or descriptions of the media, allowing the AI to acknowledge and react to it. This feature is supported through several API backends, including KoboldAI, OpenAI (with vision models like GPT-4V), and Claude. The media is typically converted into a text description or base64-encoded input, depending on the API. For example, an image of a sunset can be described by the AI as "a warm orange sky over a calm ocean," and the character can incorporate that into the roleplay. External media is not enabled by default and requires manual configuration in the extension settings. It is a powerful tool for enhancing creative writing, visual storytelling, and interactive adventures, but it also increases token usage and response time.
“SillyTavern external media refers to the ability to send images, audio, or files to an AI character within a SillyTavern chat, typically via an API like KoboldAI or OpenAI. This feature allows characters to perceive and respond to user-uploaded content, enriching roleplay and storytelling.”
To enable external media in SillyTavern, first ensure you are using a compatible API. For OpenAI, you need GPT-4V or later; for KoboldAI, use a model with vision support like LLaVA. Open the SillyTavern interface and navigate to the Extensions menu (puzzle piece icon). Find the 'External Media' or 'Media Upload' extension and toggle it on. You may need to set a max file size (e.g., 5 MB) and allowed file types (png, jpg, mp3). Next, configure the API endpoint: if using OpenAI, set the model to 'gpt-4-vision-preview' and ensure your API key has access. For KoboldAI, point to your local instance running a vision-compatible model. After enabling, a media upload button (camera or paperclip icon) appears in the chat input bar. Upload an image or audio file, then send the message. The AI will process the media and generate a response referencing it. Note that some APIs charge per token for image processing, so monitor usage.
SillyTavern's external media extension supports several formats: images (JPEG, PNG, GIF, WebP), audio (MP3, WAV, OGG), and in some configurations, video or PDF. File size limits depend on the API provider. For OpenAI GPT-4V, each image can be up to 20 MB, but larger images are downscaled. KoboldAI local models may have lower limits, typically 5-10 MB per file. Audio files are usually limited to 25 MB with OpenAI's Whisper integration. SillyTavern itself does not enforce hard limits, but the extension settings allow you to set a maximum upload size (default 5 MB) and block certain MIME types. For roleplay, images under 2 MB work best for quick processing. Animated GIFs are supported but may be treated as static frames by some APIs. Audio transcription uses Whisper and can handle 30-second clips. Video uploads are experimental and require conversion to frames.
Start chatting with a companion who actually remembers you.
Free. No tokens. No limits.
External media behavior varies by API backend. With OpenAI GPT-4V, images are processed natively—the model sees the image and can describe it accurately. This is ideal for visual storytelling (e.g., sharing a character portrait). KoboldAI with LLaVA or similar multimodal models also supports images, but quality depends on the model size (7B vs 13B parameters). For audio, OpenAI's Whisper transcribes speech, which then enters the chat context; the character can hear and respond. Claude (Anthropic) via proxy also supports image input. However, not all models are multimodal: if you use a text-only model, media is ignored or appended as a text link. To get the best results, choose a model explicitly designed for vision or audio. The SillyTavern wiki lists compatible models for each backend. Keep in mind that processing media consumes extra tokens—an image can cost 1,000+ tokens, so budget accordingly.
Uploading external media to SillyTavern has privacy implications. If you use a cloud API like OpenAI, your images and audio are sent to their servers and processed per their privacy policy. For sensitive content, consider using a local API like KoboldAI or llama.cpp with a multimodal model, keeping all data on your machine. SillyTavern does not store media permanently; it is sent as part of the chat context and may be retained in chat logs if you save them. The extension settings allow you to disable media logging. Also, be aware that some APIs (e.g., OpenAI) may use uploaded images for model training unless you opt out. Always check the API provider's data handling policies. For maximum privacy, run everything locally with a model like LLaVA 13B, which can process images on your GPU without external transmission.
If external media fails to send, first confirm the API key has permissions for vision/audio endpoints. Check the SillyTavern console (F12) for error messages: '400 Bad Request' often means the file is too large or format unsupported. Reduce image resolution to 1024x1024 pixels or convert to JPEG. For audio, ensure it's under 25 MB and properly encoded (mono, 16kHz for Whisper). If the model ignores media, it likely lacks multimodal support—switch to GPT-4V or LLaVA. Another common issue is that the 'External Media' extension is disabled or conflicting with other extensions like 'Character Expressions'. Disable other extensions one by one to isolate the conflict. For roleplay scenarios, the AI might not respond to media if the character card doesn't include instructions to acknowledge media. Add a note in the character description: "You can see images sent by the user." Restart SillyTavern after changing settings.
Learn how to configure and use external media in SillyTavern for immersive AI interactions with images, audio, and file uploads.
Start Chatting FreeEverything you need to know about our companions.
Free APIs like those from some local models or free tiers of cloud services may support external media, but most free tiers limit file size or number of requests. Local models are the best free option.
Yes, but the character must be prompted to notice media. Add instructions in the character's description or example messages to ensure the AI acknowledges uploaded images or audio.
JPEG, PNG, GIF, and WebP are supported. Animated GIFs are treated as static by most APIs. For best results, use PNG or JPEG under 5 MB.
Enable the External Media extension, then click the camera icon in the chat input bar. Select an image file from your device, and it will be attached to your next message.
No, external media is a SillyTavern-specific feature. Platforms like Character.AI do not support user-uploaded media in the same way. You need a SillyTavern setup with a compatible API.
Yes, especially with cloud APIs. OpenAI charges per image based on resolution (e.g., 1,000 tokens per 1024x1024 image). Audio transcription also costs extra. Local models avoid per-use fees.
Yes, if you use an API with audio support like OpenAI (Whisper) or a local model with speech-to-text. The audio is transcribed, and the text enters the chat context.
Go to Extensions > External Media and toggle the extension off. This removes the media upload button from the chat bar until you re-enable it.
Verified reviews from real customers
I've tried a few AI companion platforms, and AI Angels stands out for how immersive and customizable it feels. The conversations are surprisingly natural, and the AI personalities actually maintain context better than most similar apps I've used. The uncensored chat and roleplay features are a big plus if you're looking for creative freedom without constant restrictions. The image generation is also impressive — fast, detailed, and customizable enough to create unique characters and scenarios. I especially liked the variety of companion personalities and how easy the interface is to use, even for beginners. That said, there's still room for improvement. Some responses can feel repetitive after long conversations, and a few premium features are a bit pricey compared to competitors. But overall, the experience feels polished, entertaining, and consistently improving with updates. If you enjoy AI companionship, virtual roleplay, or interactive fantasy experiences, AI Angels is definitely worth checking out.
AI Angels is a remarkable AI companion site offering vividly realistic experiences. The large variety of companions available will suit every imaginable taste. Pricing is reasonable and transparent. I highly recommend AI Angels.
Fun, life like , sexy , created the perfect girl
It's worth looking into for sure, you won't regret it!
Choice of features
Honestly one of the best AI girlfriend apps I've tried. The conversations feel surprisingly natural and the girls actually have personality. Definitely worth checking out if you're into AI companions.
well I love how they call me things like baby and love how it shows nudes and sex/porn.
realstic ai images and chats! amazing pics and nice girls to chat with
Amazing it is so emersave
The roleplay is very flexible. The AI will adjust to your attitude and no kink is out of bounds. I just wish you could customize a little more.
The best ! I love it
Definitely addicted to this. You will not feel lonely and great prices
It's okay tho