Pick a target: any audio, video, or document. It is converted to text through transcription or extraction, then combined with prompts to generate summaries, audio, images, music, and video.
Your target is processed into text through transcription or document extraction. The text is then combined with prompts to generate text, audio, image, music, and video outputs.
Podcast feed or episode, YouTube/Twitch/TikTok URL, MP3/MP4 file from your computer, or a direct URL to any public audio/video file.
PDF, EPUB, PPTX, or DOCX. Documents are backed up to S3 and extracted using LlamaParse or Mistral OCR vision models.
Choose from 10+ transcription services. Audio longer than 10 minutes is automatically split into segments with proper timestamp tracking and combined into a single transcript.
Identify who said what with speaker labels and timestamps. Services include HappyScribe, AssemblyAI, Deepgram Nova-3, Soniox, Rev, Gladia, ElevenLabs Scribe, Fal Whisper, and Lemonfox.
Get rapid transcripts without speaker identification using Groq Whisper Large V3 Turbo or DeepInfra Whisper. Optimized for speed when speaker labels aren't needed.
Extract text from PDFs, images, and documents using LlamaParse for multi-page documents or Mistral OCR for vision-based text extraction.
Generate summaries, chapters, FAQs, takeaways, and more using OpenAI GPT-4, Claude, Gemini, or Groq. All providers support structured JSON output with automatic retry and fallback logic.
Create short summaries (180 characters), long summaries, bullet points, key takeaways, chapters with timestamps, FAQ sections, and custom prompt outputs.
3-attempt retry logic with provider fallback. If your selected model fails, AutoShow automatically tries alternate models and providers to ensure completion.
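The retry-and-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not AutoShow's actual implementation: the function name generate_with_fallback, the Provider type, and the choice to give each provider three attempts before moving to the next are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a provider is a display name plus a callable that
# takes a prompt and returns generated text (or raises on failure).
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Try each provider in order, up to max_attempts calls each.

    Returns (provider_name, output) from the first successful call and
    raises RuntimeError if every attempt on every provider fails.
    """
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(1, max_attempts + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The selected model goes first in the providers list and the alternates follow it, so a fallback only happens once the preferred provider has used up its attempts.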
Go beyond transcription. Generate narrated audio, cover images, original music, and video clips from your content using the latest generative AI models.
Convert summaries to narrated audio using OpenAI TTS, ElevenLabs, or Groq. Choose from multiple voices and output formats (WAV/MP3). OpenAI supports custom voice instructions.
Create cover art, thumbnails, and promotional images using OpenAI DALL-E (gpt-image-1.5), Gemini, or MiniMax. Generate 1-3 images per job with customizable dimensions and aspect ratios.
Generate original theme music with AI-written lyrics. Choose from 7 genres: Pop, Rock, Rap, Country, Folk, Jazz, Electronic. Powered by Eleven Music or MiniMax Music.
Create explainer clips, highlights, intros, outros, and social media videos. Use OpenAI Sora (4-12s), Gemini Veo (up to 4K), or MiniMax Hailuo. All prompts include safety filtering.
Content flows through a configurable pipeline. Each step can be customized with different providers and models. Optional steps are skipped if not enabled.
Audio extracted and converted to 16 kHz WAV and 32 kbps MP3. Documents backed up to S3. Video URLs passed directly to transcription.
Audio transcribed with timestamps. Long files auto-split into 10-minute segments. Documents extracted to markdown.
Dynamic prompts assembled with metadata, transcript, and selected output types. JSON schemas generated for structured output.
LLM generates structured summaries, chapters, FAQs, and more. Automatic retry with provider fallback.
Optional: Convert text output to narrated audio. Upload to S3 for persistent access.
Optional: Create AI images from title and text output. Multiple image types supported per job.
Optional: AI writes genre-specific lyrics, then composes original theme music (up to 3 minutes).
Optional: AI writes scene descriptions, then renders video clips (4-12 seconds) with thumbnails.
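One way to picture the pipeline steps listed above, with optional steps skipped unless enabled, is the sketch below. The Step class, run_pipeline, and the step names in the usage example are hypothetical and chosen only for illustration; they are not taken from AutoShow's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Step:
    name: str
    # Each step reads the shared context and returns the keys it produced.
    run: Callable[[Dict[str, str]], Dict[str, str]]
    optional: bool = False


def run_pipeline(steps: List[Step], context: Dict[str, str], enabled: Set[str]) -> List[str]:
    """Run required steps always and optional steps only when enabled.

    Mutates `context` in place and returns the names of the steps that ran.
    """
    executed: List[str] = []
    for step in steps:
        if step.optional and step.name not in enabled:
            continue  # optional step not requested: skip it
        context.update(step.run(context))
        executed.append(step.name)
    return executed


# Usage with stand-in steps (real steps would call external providers).
steps = [
    Step("transcribe", lambda ctx: {"transcript": "hello world"}),
    Step("summarize", lambda ctx: {"summary": ctx["transcript"].title()}),
    Step("tts", lambda ctx: {"audio": "summary.mp3"}, optional=True),
    Step("image", lambda ctx: {"image": "cover.png"}, optional=True),
]
context: Dict[str, str] = {}
ran = run_pipeline(steps, context, enabled={"tts"})
```

Because each step only reads from and writes to the shared context, a step's provider can be swapped without touching the others.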
Built for reliability and scale with enterprise-grade infrastructure.
HappyScribe, AssemblyAI, Deepgram, Soniox, Rev, Gladia, ElevenLabs Scribe, Fal, Lemonfox, Groq Whisper, DeepInfra Whisper, Supadata, deAPI.
OpenAI (GPT-4o, GPT-4o-mini), Anthropic Claude (Sonnet, Haiku), Google Gemini (2.0 Flash, 1.5 Pro), Groq (Llama, Mixtral).
OpenAI TTS (gpt-4o-mini-tts, coral voice), ElevenLabs (eleven_flash_v2_5), Groq (canopylabs/orpheus-v1).
OpenAI DALL-E (gpt-image-1.5), Gemini (gemini-2.5-flash-image), MiniMax (image-01). Dimensions up to 1536x1024.
Eleven Music (music_v1), MiniMax Music (music-2.5). 7 genres available.
OpenAI Sora (sora-2, sora-2-pro), Gemini Veo (veo-3.1, up to 4K), MiniMax Hailuo (Hailuo-2.3). Durations 4-12 seconds.
S3-compatible storage with presigned URLs. All media automatically uploaded for persistent access. Supports Railway Storage Buckets.
LlamaParse (PDF/DOCX to markdown), Mistral OCR (vision-based extraction). Supports PDF, DOCX, PNG, JPG, TIFF, TXT.
AutoShow supports video files (MP4, MOV, AVI), audio files (MP3, WAV, M4A), YouTube URLs, streaming URLs, direct file URLs, and documents (PDF, DOCX, PNG, JPG, TIFF, TXT).
For speaker identification, use HappyScribe, AssemblyAI, Deepgram, or ElevenLabs Scribe. For fastest results without speaker labels, use Groq Whisper or DeepInfra Whisper. HappyScribe is required for YouTube and streaming URLs.
There's no hard limit. Audio longer than 10 minutes is automatically split into segments with timestamp tracking. Each segment is transcribed separately and results are combined. Very long content (3+ hours) may take several minutes to process.
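The split-and-combine approach can be sketched as follows, assuming each segment's transcript carries timestamps relative to the start of that segment. The names Line, segment_bounds, merge_transcripts, and format_timestamp are illustrative, and transcribe_segment stands in for whatever transcription service is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

SEGMENT_SECONDS = 600.0  # 10-minute segments, as described above


@dataclass
class Line:
    start: float  # seconds from the start of the audio this line belongs to
    text: str


def segment_bounds(duration: float, size: float = SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) pairs that cover the audio in chunks of at most `size` seconds."""
    bounds = []
    start = 0.0
    while start < duration:
        bounds.append((start, min(start + size, duration)))
        start += size
    return bounds


def merge_transcripts(
    duration: float,
    transcribe_segment: Callable[[float, float], List[Line]],
) -> List[Line]:
    """Transcribe each segment separately and shift its timestamps by the segment offset."""
    merged: List[Line] = []
    for start, end in segment_bounds(duration):
        for line in transcribe_segment(start, end):
            merged.append(Line(start=line.start + start, text=line.text))
    return merged


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
```

Adding each segment's start offset to its lines is what keeps timestamps in the combined transcript aligned with the original recording.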
OpenAI (GPT-4o, GPT-4o-mini), Anthropic Claude (Sonnet, Haiku), Google Gemini (2.0 Flash, 1.5 Pro), and Groq (Llama, Mixtral). All support structured JSON output. If your selected provider fails, AutoShow automatically falls back to alternatives.
First, an LLM generates a detailed scene description based on your content. Then, the scene is rendered using OpenAI Sora (4-12s clips), Gemini Veo (up to 4K resolution), Runway Veo (4-8s), MiniMax Hailuo, or Grok. Video types include explainer, highlight, intro, outro, and social clips.
All files are saved locally in timestamped output directories. If S3 storage is configured (Railway Storage Buckets or any S3-compatible service), media files are also uploaded with presigned URLs for persistent access.
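A timestamped output directory could be created along these lines. The naming scheme here (a UTC timestamp followed by a slug of the title) is an assumption made for the example, not AutoShow's documented layout, and make_output_dir is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def make_output_dir(root: Path, title: str, now: Optional[datetime] = None) -> Path:
    """Create and return root/<YYYY-MM-DD_HH-MM-SS>_<slug> (assumed naming scheme)."""
    now = now or datetime.now(timezone.utc)
    # Replace every non-alphanumeric character with "-" and trim the ends.
    slug = "".join(c if c.isalnum() else "-" for c in title.lower()).strip("-")
    path = root / f"{now:%Y-%m-%d_%H-%M-%S}_{slug}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Putting the timestamp first makes the directories sort chronologically in a plain file listing.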
AutoShow supports 11 genres: rap, rock, pop, country, folk, jazz, ambient, electronic, cinematic, techno, and lofi. An LLM first writes original, copyright-safe lyrics tailored to your content, then the music is composed.
Transform your content with AI transcription, summarization, and generation.
Usage-based pricing: no subscriptions or hidden fees.