Can I use external media with free APIs?

Free APIs like those from some local models or free tiers of cloud services may support external media, but most free tiers limit file size or number of requests. Local models are the best free option.

Does external media work with character cards?

Yes, but the character must be prompted to notice media. Add instructions in the character's description or example messages to ensure the AI acknowledges uploaded images or audio.

What image formats are supported?

JPEG, PNG, GIF, and WebP are supported. Animated GIFs are treated as static by most APIs. For best results, use PNG or JPEG under 5 MB.

How do I attach an image in SillyTavern?

Enable the External Media extension, then click the camera icon in the chat input bar. Select an image file from your device, and it will be attached to your next message.

Will external media work with Character.AI or other platforms?

No, external media is a SillyTavern-specific feature. Platforms like Character.AI do not support user-uploaded media in the same way. You need a SillyTavern setup with a compatible API.

Does using external media increase costs?

Yes, especially with cloud APIs. OpenAI charges per image based on resolution (e.g., 1,000 tokens per 1024x1024 image). Audio transcription also costs extra. Local models avoid per-use fees.

Can I send audio files to characters?

Yes, if you use an API with audio support like OpenAI (Whisper) or a local model with speech-to-text. The audio is transcribed, and the text enters the chat context.

How do I disable external media temporarily?

Go to Extensions > External Media and toggle the extension off. This removes the media upload button from the chat bar until you re-enable it.

SillyTavern External Media: The Honest 2026 Review

What Is External Media in SillyTavern?

External media in SillyTavern refers to the capability of attaching images, audio files, or other media to messages that are sent to the AI character. When enabled, the character's context includes metadata or descriptions of the media, allowing the AI to acknowledge and react to it. This feature is supported through several API backends, including KoboldAI, OpenAI (with vision models like GPT-4V), and Claude. The media is typically converted into a text description or base64-encoded input, depending on the API. For example, an image of a sunset can be described by the AI as "a warm orange sky over a calm ocean," and the character can incorporate that into the roleplay. External media is not enabled by default and requires manual configuration in the extension settings. It is a powerful tool for enhancing creative writing, visual storytelling, and interactive adventures, but it also increases token usage and response time.

“SillyTavern external media refers to the ability to send images, audio, or files to an AI character within a SillyTavern chat, typically via an API like KoboldAI or OpenAI. This feature allows characters to perceive and respond to user-uploaded content, enriching roleplay and storytelling.”

How to Enable External Media: Step-by-Step Setup

To enable external media in SillyTavern, first ensure you are using a compatible API. For OpenAI, you need GPT-4V or later; for KoboldAI, use a model with vision support like LLaVA. Open the SillyTavern interface and navigate to the Extensions menu (puzzle piece icon). Find the 'External Media' or 'Media Upload' extension and toggle it on. You may need to set a max file size (e.g., 5 MB) and allowed file types (png, jpg, mp3). Next, configure the API endpoint: if using OpenAI, set the model to 'gpt-4-vision-preview' and ensure your API key has access. For KoboldAI, point to your local instance running a vision-compatible model. After enabling, a media upload button (camera or paperclip icon) appears in the chat input bar. Upload an image or audio file, then send the message. The AI will process the media and generate a response referencing it. Note that some APIs charge per token for image processing, so monitor usage.

Supported Media Types and File Size Limits

SillyTavern's external media extension supports several formats: images (JPEG, PNG, GIF, WebP), audio (MP3, WAV, OGG), and in some configurations, video or PDF. File size limits depend on the API provider. For OpenAI GPT-4V, each image can be up to 20 MB, but larger images are downscaled. KoboldAI local models may have lower limits, typically 5-10 MB per file. Audio files are usually limited to 25 MB with OpenAI's Whisper integration. SillyTavern itself does not enforce hard limits, but the extension settings allow you to set a maximum upload size (default 5 MB) and block certain MIME types. For roleplay, images under 2 MB work best for quick processing. Animated GIFs are supported but may be treated as static frames by some APIs. Audio transcription uses Whisper and can handle 30-second clips. Video uploads are experimental and require conversion to frames.

Real monthly cost: Sillytavern External Media on AIAngels vs SillyTavern
Feature	AIAngels	SillyTavern
Free tier	Unlimited free text chat with all AI companions, no credit card	Limited or absent on most plans
Real monthly cost (active)	$0 or $2.99/mo annual flat	Headline price + tokens/tiers
Image generation	Included on premium	Often token-gated or per-image
Voice messages	Included on premium	Often token-gated
Memory persistence	Permanent, never resets	Often degrades after a token cap
Filter / restrictions	Uncensored for verified adults	Filter often interrupts mid-scene
Public promo code	Not needed (75% off baked in)	Rare or fake on coupon sites

Ready to Experience the
Difference?

Start chatting with a companion who actually remembers you.
Free. No tokens. No limits.

Start Chatting Free

Using External Media with Different API Backends

External media behavior varies by API backend. With OpenAI GPT-4V, images are processed natively—the model sees the image and can describe it accurately. This is ideal for visual storytelling (e.g., sharing a character portrait). KoboldAI with LLaVA or similar multimodal models also supports images, but quality depends on the model size (7B vs 13B parameters). For audio, OpenAI's Whisper transcribes speech, which then enters the chat context; the character can hear and respond. Claude (Anthropic) via proxy also supports image input. However, not all models are multimodal: if you use a text-only model, media is ignored or appended as a text link. To get the best results, choose a model explicitly designed for vision or audio. The SillyTavern wiki lists compatible models for each backend. Keep in mind that processing media consumes extra tokens—an image can cost 1,000+ tokens, so budget accordingly.

Privacy and Security Considerations for Media Uploads

Uploading external media to SillyTavern has privacy implications. If you use a cloud API like OpenAI, your images and audio are sent to their servers and processed per their privacy policy. For sensitive content, consider using a local API like KoboldAI or llama.cpp with a multimodal model, keeping all data on your machine. SillyTavern does not store media permanently; it is sent as part of the chat context and may be retained in chat logs if you save them. The extension settings allow you to disable media logging. Also, be aware that some APIs (e.g., OpenAI) may use uploaded images for model training unless you opt out. Always check the API provider's data handling policies. For maximum privacy, run everything locally with a model like LLaVA 13B, which can process images on your GPU without external transmission.

Troubleshooting Common External Media Issues

If external media fails to send, first confirm the API key has permissions for vision/audio endpoints. Check the SillyTavern console (F12) for error messages: '400 Bad Request' often means the file is too large or format unsupported. Reduce image resolution to 1024x1024 pixels or convert to JPEG. For audio, ensure it's under 25 MB and properly encoded (mono, 16kHz for Whisper). If the model ignores media, it likely lacks multimodal support—switch to GPT-4V or LLaVA. Another common issue is that the 'External Media' extension is disabled or conflicting with other extensions like 'Character Expressions'. Disable other extensions one by one to isolate the conflict. For roleplay scenarios, the AI might not respond to media if the character card doesn't include instructions to acknowledge media. Add a note in the character description: "You can see images sent by the user." Restart SillyTavern after changing settings.

Browse by tag

SillyTavern External Media in 2026: A Real Review

What Is External Media in SillyTavern?

How to Enable External Media: Step-by-Step Setup

Supported Media Types and File Size Limits

Ready to Experience the
Difference?

Using External Media with Different API Backends

Privacy and Security Considerations for Media Uploads

Troubleshooting Common External Media Issues

Stop starting from scratch.

Frequently Asked Questions

Explore More

What our customers are saying

Browse by tag

What Is External Media in SillyTavern?

How to Enable External Media: Step-by-Step Setup

Supported Media Types and File Size Limits

Ready to Experience the Difference?

Using External Media with Different API Backends

Privacy and Security Considerations for Media Uploads

Troubleshooting Common External Media Issues

Stop starting from scratch.

Frequently Asked Questions

Explore More

Ready to Experience the
Difference?