Kokoro TTS Studio: Free Online Text-to-Speech Demo

Welcome to Kokoro TTS Studio, powered by Unreal Speech: the ultimate playground for the revolutionary 82M-parameter open-source text-to-speech engine! Simply type your text, choose from our extensive library of 48 natural-sounding voices across 8 languages, and instantly generate high-quality speech that rivals premium commercial services. Convert text to speech in your browser right now, and download your audio files for free.

What is Kokoro TTS: The Free Open-Source Speech Generator

Kokoro TTS is a groundbreaking open-source text-to-speech model that's revolutionizing the AI voice landscape. With a remarkably tiny footprint of just 82 million parameters (a fraction of what other models use), Kokoro delivers astonishingly natural speech synthesis that outperforms models 5-15× its size in both quality and speed.

Why Kokoro TTS Is Making Waves in the AI Community

Incredibly Compact Yet Powerful: At just 82M parameters, Kokoro is dramatically smaller than competing models like XTTS v2 (467M) and MetaVoice (1.2B), yet produces equal or better voice quality
Lightning-Fast Performance: Generates speech up to 210× real-time on GPU and 3-11× real-time on CPU—making it perfect for real-time applications and batch processing
Resource-Efficient Design: Runs smoothly on consumer hardware without requiring expensive cloud infrastructure or specialized equipment
Truly Open Source & Free: Licensed under Apache 2.0, allowing both commercial and non-commercial use, subject only to the license's notice and attribution terms
Award-Winning Quality: Achieved 1st place in the HuggingFace TTS Spaces Arena for single-speaker speech quality, beating models many times its size
Multilingual Support: Speaks multiple languages fluently, including English (US/UK), French, Hindi, Spanish, Japanese, Chinese, Italian, and Portuguese
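The "5-15× its size" comparison can be checked against the parameter counts quoted in the list above. A minimal sketch that uses only the figures from this page (the `size_ratio` helper is a made-up name for illustration):

```python
# Parameter counts as quoted on this page, in millions.
PARAMS_M = {"Kokoro": 82, "XTTS v2": 467, "MetaVoice": 1200}


def size_ratio(model: str, baseline: str = "Kokoro") -> float:
    """Return how many times larger `model` is than `baseline`."""
    return PARAMS_M[model] / PARAMS_M[baseline]


for name in ("XTTS v2", "MetaVoice"):
    print(f"{name}: {size_ratio(name):.1f}x the size of Kokoro")
```

XTTS v2 comes out at roughly 5.7× and MetaVoice at roughly 14.6×, which is where the 5-15× range comes from.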

Try the live demo above to experience the remarkable quality yourself! Simply type your text above, select a voice, and click "Generate" to hear Kokoro TTS in action.

Explore 48 Voices Across 8 Languages

Kokoro TTS Studio supports English (US/UK), French, Hindi, Spanish, Japanese, Chinese, Italian, and Portuguese. Browse our extensive library of realistic voices spanning multiple languages and accents. Each voice has been carefully trained to deliver natural intonation and clarity:

Display Name	Voice ID	Language	Gender	Description
Hannah	af_bella	English (US)	Female	American female with reserved personality
Kaitlyn	af_nicole	English (US)	Female	American female with whisper-like voice, sounds casual
Lauren	af_sarah	English (US)	Female	American female, probably an educator; confident
Sierra	af_sky	English (US)	Female	American female with high level of composure
Noah	am_adam	English (US)	Male	American male with confident personality
Daniel	am_michael	English (US)	Male	American male with confident personality
Chloe	bf_emma	English (UK)	Female	British female
Amelia	bf_isabella	English (UK)	Female	British female with calm personality
Edward	bm_george	English (UK)	Male	British male, mature voice
Oliver	bm_lewis	English (UK)	Male	British male with confident personality
Élodie	ff_siwis	French	Female	Young French female voice
Ananya	hf_alpha	Hindi	Female	Young Hindi female voice
Priya	hf_beta	Hindi	Female	Young Hindi female voice
Arjun	hm_omega	Hindi	Male	Young Hindi male voice
Sakura	jf_alpha	Japanese	Female	Young Japanese female voice
Hana	jf_gongitsune	Japanese	Female	Young Japanese female voice
Haruto	jm_kumo	Japanese	Male	Young Japanese male voice
Lucía	ef_dora	Spanish	Female	Young Spanish female voice
Mateo	em_alex	Spanish	Male	Young Spanish male voice
Giulia	if_sara	Italian	Female	Young Italian female voice
Luca	im_nicola	Italian	Male	Young Italian male voice
Mei	zf_xiaobei	Chinese	Female	Young Chinese female voice
Lian	zf_xiaoni	Chinese	Female	Young Chinese female voice
Wei	zm_yunjian	Chinese	Male	Young Chinese male voice
Camila	pf_dora	Portuguese	Female	Young Portuguese female voice
Thiago	pm_alex	Portuguese	Male	Young Portuguese male voice

And many more voices available! This demo features our most popular voices, with additional options continuously being added.
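The voice IDs in the table appear to follow a pattern: the first letter encodes the language and the second the gender. The sketch below decodes an ID on that basis. The mapping is inferred from the table on this page rather than taken from an official specification, and `describe_voice` is a hypothetical helper.

```python
# Prefix letters inferred from the voice table on this page (an assumption,
# not an official specification).
LANGUAGE_BY_PREFIX = {
    "a": "English (US)",
    "b": "English (UK)",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Chinese",
}
GENDER_BY_PREFIX = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID such as 'af_bella' into (language, gender, name)."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        language = LANGUAGE_BY_PREFIX[prefix[0]]
        gender = GENDER_BY_PREFIX[prefix[1]]
    except KeyError as exc:
        raise ValueError(f"unknown prefix in voice ID: {voice_id!r}") from exc
    return language, gender, name


print(describe_voice("af_bella"))
print(describe_voice("jm_kumo"))
```

A lookup like this can be handy for grouping voices by language in a picker, but voices outside the table may not follow the same pattern.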

How Kokoro TTS Works: The Technical Breakthrough

Kokoro achieves its remarkable efficiency through a revolutionary architectural design that combines the best elements of StyleTTS 2 and iSTFTNet in a decoder-only approach:

Innovative Architecture

Hybrid Design: Merges StyleTTS 2's transformer-based decoder with iSTFTNet's efficient vocoder for optimal quality-to-size ratio
Decoder-Only Architecture: Eliminates the computationally expensive diffusion-based style modeling and separate text encoders that other models require
Streamlined Waveform Generation: Uses iSTFTNet for fast and efficient audio synthesis without quality compromise
High-Quality Training Data: Trained exclusively on carefully curated, permissive/non-copyrighted audio data focused on long-form narration

This innovative approach enables Kokoro to generate 24kHz high-fidelity audio with minimal computational resources, redefining what's possible in open-source text-to-speech technology.

Comparison with Traditional TTS Architectures

Unlike classical TTS models such as Tacotron 2 (which uses slow attention-based mel generation and requires a separate vocoder) or FastSpeech 2 (which relies on a two-stage pipeline and teacher-forced alignments), Kokoro's streamlined architecture generates speech in one efficient pass.

By removing diffusion processes and autoregressive bottlenecks, Kokoro achieves superior speed without sacrificing quality. This makes it uniquely positioned for both real-time applications and batch processing of long-form content.

Why Choose Kokoro TTS?

⚡ Unmatched Performance and Efficiency

Kokoro TTS delivers remarkable speed that makes it perfect for real-time applications and large-scale content production:

Outstanding GPU Performance: ~210× real-time on high-end GPUs (RTX 4090), ~90× real-time on consumer GPUs like the 3090 Ti
Impressive CPU Performance: 3-11× real-time on modern CPUs, making it viable even without dedicated graphics hardware
Ultra-Low Latency: Synthesizes typical sentences in just 40-70ms on GPU, enabling truly interactive applications
Exceptional Throughput: Handles 500+ simultaneous requests with response times around 2 seconds, ideal for high-traffic services
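"N× real-time" means that N seconds of audio are produced per second of compute. The sketch below converts the speed figures above into wall-clock time for one hour of audio. The function names are illustrative, and the results are only as accurate as the quoted multiples.

```python
def synthesis_seconds(audio_seconds: float, speed_multiple: float) -> float:
    """Wall-clock seconds needed to synthesize audio at N x real-time."""
    if speed_multiple <= 0:
        raise ValueError("speed_multiple must be positive")
    return audio_seconds / speed_multiple


def realtime_multiple(audio_seconds: float, wall_seconds: float) -> float:
    """Real-time multiple implied by producing audio in a given wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


ONE_HOUR = 3600
# Speed multiples as quoted in the list above.
for label, multiple in [("RTX 4090", 210), ("3090 Ti", 90), ("CPU (fast)", 11), ("CPU (slow)", 3)]:
    print(f"{label}: {synthesis_seconds(ONE_HOUR, multiple):.1f} s per hour of audio")
```

As a cross-check, the user report further down this page of a six-hour audiobook generated in four minutes works out to `realtime_multiple(6 * 3600, 4 * 60)`, which is 90× real-time.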

💸 Cost-Effective Alternative to Premium Services

Free and Open Alternative to ElevenLabs: Achieve professional-grade voice synthesis without expensive subscription fees
No Per-Character Pricing: Generate unlimited audio without worrying about usage-based pricing models
Local Processing Option: Run entirely on your own hardware without relying on internet connectivity or cloud services
Fully Commercial-Ready: Apache 2.0 license permits use in commercial products and services, provided its notice and attribution terms are followed

🎯 Perfect For a Wide Range of Applications

Content Creation: Generate professional voiceovers for videos, podcasts, YouTube content, and social media
Audiobook Production: Convert ebooks, articles, and long-form content to engaging audio in minutes instead of hours
Gaming & VR: Add dynamic voice lines to games and virtual reality experiences with minimal latency
Accessibility Tools: Build screen readers and assistive technology that sounds natural and engaging
Voice Assistants & Chatbots: Create responsive AI interfaces with human-like speech capabilities
E-Learning & Education: Develop engaging learning materials with clear, natural audio narration
IVR & Telephony Systems: Improve customer experience with natural-sounding automated phone systems
Localization & Dubbing: Translate and voice content across multiple languages efficiently

What Users Are Saying About Kokoro TTS

Kokoro TTS has garnered enthusiastic praise from developers, content creators, and AI enthusiasts across the community:

"This thing is crazy for 82M! I generated a six-hour audiobook from a full book in just four minutes. The consistency is incredible across long texts."

"Kokoro is the best open-source TTS I've used... really tiny, so with the right hardware it's really fast. Voice quality rivals commercial services I've paid for."

"I've tried Fastspeech, VoiceCraft, Coqui before... these required chunking input into short pieces and post-processing to remove pauses. Kokoro just works on long texts without too many issues."

"Voice is pleasant and for long texts it reads in a very stable manner without odd pauses or glitches. It's become my go-to for all my content creation needs."

"As someone building accessibility tools, Kokoro has been a game-changer. The voices sound natural enough that my users actually enjoy listening to them, unlike the robotic alternatives."

How to Use Kokoro TTS in Your Projects

Want to integrate Kokoro TTS into your own applications? You have several flexible options:

1. Unreal Speech API (Fastest & Easiest)

For production-ready implementation with minimal setup, use the Unreal Speech API powered by Kokoro TTS.

Note: You'll need to sign in to get a free API key, which you'll then find on your dashboard. See the API docs for more info.


# Endpoint: /stream
# - Convert up to 1,000 characters ASAP
# - Synchronous, instant response (0.3s)
# - Streams back raw audio data (no timestamps)

import requests

response = requests.post(
    'https://api.v8.unrealspeech.com/stream',
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    json={
        'Text': '''Your text goes here''',  # Up to 1,000 characters
        'VoiceId': 'af_bella',  # Choose from available voice IDs
        'Bitrate': '192k',  # 320k, 256k, 192k, ...
        'Speed': '0',  # -1.0 to 1.0
        'Pitch': '1',  # 0.5 to 1.5
        'Codec': 'libmp3lame',  # libmp3lame or pcm_mulaw
    },
    timeout=30,  # Avoid hanging indefinitely on network problems
)

# Fail loudly on HTTP errors (e.g. an invalid API key)
# instead of saving an error body as audio
response.raise_for_status()

# Save the returned audio bytes to disk
with open('audio.mp3', 'wb') as f:
    f.write(response.content)
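Because the /stream endpoint accepts up to 1,000 characters per request, longer text has to be split first. One possible approach is sketched below: it breaks text at sentence boundaries where it can. `split_for_stream` is a hypothetical helper, not part of the Unreal Speech API, and the sentence-splitting rule is deliberately simple.

```python
import re

MAX_CHARS = 1000  # Per-request limit quoted for the /stream endpoint


def split_for_stream(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks end on sentence boundaries where possible; a single sentence
    longer than `limit` is hard-wrapped, possibly mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any sentence that cannot fit in one request by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_for_stream("Hello there. " * 200)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk can then be sent as the `Text` field of its own request.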

Why Choose Unreal Speech API:

11× cheaper than ElevenLabs
Stream audio in just 300ms
Request up to 10-hour audio files
Includes per-word timestamps
Production-ready infrastructure
No need to manage your own hardware

2. Python Implementation (Self-Hosted)

For those who prefer to run Kokoro locally or in their own infrastructure:


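# Requires: pip install kokoro soundfile
# Uses the KPipeline interface from the kokoro package README;
# check it against your installed version.
from kokoro import KPipeline
import soundfile as sf

# Create the TTS pipeline ('a' selects American English)
pipeline = KPipeline(lang_code="a")

# Calling the pipeline returns a generator of (graphemes, phonemes, audio) chunks
audio_generator = pipeline(
    "This is a demonstration of the Kokoro TTS system, which produces remarkably natural speech from a compact 82 million parameter model.",
    voice="af_bella",
    speed=1.0
)

# Write each generated chunk as its own 24 kHz WAV file
for i, (_, _, audio) in enumerate(audio_generator):
    sf.write(f"kokoro_demo_{i}.wav", audio, 24000)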
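Generated samples need a WAV container before most players can open them. The sketch below shows one way to write one using only the standard library. It assumes the samples are mono floating-point values in [-1.0, 1.0] at the 24 kHz rate mentioned earlier. `write_wav` is a hypothetical helper, not part of the kokoro package, and the sine tone stands in for real model output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Output rate stated on this page


def write_wav(target, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV.

    `target` may be a file path or a writable binary file object.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # Guard against overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)   # Mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
buffer = io.BytesIO()
write_wav(buffer, tone)

# Read the header back to confirm rate, frame count and sample width.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    info = (wav.getframerate(), wav.getnframes(), wav.getsampwidth())
print(info)
```

Passing a path such as `"kokoro_demo.wav"` instead of the in-memory buffer writes the file to disk.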

3. Command Line Usage

For quick generation from the terminal:


kokoro-tts -v af_bella "Hello, this is Kokoro speaking. I'm a compact but powerful text-to-speech system." -o output.wav

System Requirements

Kokoro TTS is remarkably efficient, making it accessible on a wide range of hardware:

CPU: Modern multi-core CPU for real-time speeds (8+ cores recommended for optimal performance)
GPU: Even mid-range cards like GTX 1060 (6GB) can handle Kokoro efficiently, with high-end cards achieving roughly 90-210× real-time speeds
Memory: ~2GB RAM for model and audio processing (more for handling very long texts)
Disk Space: ~350MB for model plus a few MB for voice files
Supported Platforms: Windows, macOS, Linux, and cloud environments
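The ~350MB disk figure lines up with the parameter count. A back-of-the-envelope sketch, assuming the weights are stored as 32-bit floats (the storage precision is an assumption, not something stated on this page):

```python
PARAMS = 82_000_000  # Parameter count quoted on this page

# Bytes needed per parameter at common storage precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_megabytes(params: int, dtype: str = "float32") -> float:
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weights_megabytes(PARAMS, dtype):.0f} MB")
```

At float32 the raw weights come to about 328 MB, close to the quoted ~350MB, with file metadata and other stored data plausibly accounting for the rest. Half-precision storage would roughly halve that.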

Kokoro TTS vs. Other Text-to-Speech Models

See how Kokoro compares to other popular TTS solutions:

Feature	Kokoro TTS	XTTS v2	MetaVoice	ElevenLabs
Model Size	82M parameters	467M parameters	1.2B parameters	Proprietary
Speed	Up to 210× real-time	~30× real-time	~20× real-time	Cloud-based
Local Deployment	✅ Yes	✅ Yes	✅ Yes	❌ No
Quality Ranking	1st in TTS Spaces Arena	Lower ranked	Lower ranked	High quality
Commercial License	✅ Apache 2.0	❌ Restricted	❌ Restricted	❌ Paid service
Voice Cloning	❌ No (without fine-tuning)	✅ Yes	✅ Yes	✅ Yes
Cost	Free (open-source)	Free	Free	Subscription-based
Long-form Handling	✅ Excellent	⚠️ Requires chunking	⚠️ Variable quality	✅ Good
Resource Usage	✅ Very low	⚠️ Moderate	❌ High	N/A (cloud)
Multilingual	✅ 8+ languages	✅ Multiple	✅ Multiple	✅ 29+ languages

Technical Deep Dive: What Makes Kokoro Special

For those interested in the technical details, Kokoro's success stems from several key innovations:

Efficient Architecture

Kokoro's hybrid model combines the strengths of StyleTTS 2 and iSTFTNet while eliminating their inefficiencies. By removing the diffusion-based style modeling of StyleTTS 2 and using iSTFTNet for efficient waveform generation, Kokoro dramatically reduces complexity while preserving quality.

Unlike traditional two-stage TTS pipelines (text-to-spectrogram followed by vocoding), Kokoro streamlines the process with a transformer-based decoder that can directly produce audio features with integrated vocoding. This avoids Tacotron's alignment issues and slow iterative output.

Benchmarks & Performance

Kokoro has proven its merit in head-to-head evaluations, achieving 1st place in the HuggingFace TTS Spaces Arena for single-speaker speech quality. Listeners consistently ranked Kokoro's output above much larger models in blind tests.

In Elo-style comparisons of naturalness, Kokoro-82M emerged as a top model, even beating systems trained on vastly more data. For example, "Fish Speech" (trained on ~1 million hours) failed to match Kokoro's naturalness, despite Kokoro being trained on <100 hours of curated data.
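For readers unfamiliar with Elo-style rankings, the sketch below shows how a single listener preference could move two models' ratings. It uses the standard Elo formula with a K-factor of 32. The Arena's actual rating method and constants are not described here, so both are assumptions for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(winner: float, loser: float, k: float = 32) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one pairwise vote."""
    gain = k * (1 - expected_score(winner, loser))
    return winner + gain, loser - gain


# Two models start level; a listener prefers the first one.
print(update_elo(1500, 1500))
# An upset against a higher-rated model moves the ratings further.
print(update_elo(1400, 1600))
```

A model's rating therefore rises fastest when it wins against opponents that were expected to beat it.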

Training Efficiency

Kokoro's training process was remarkably cost-effective, requiring only ~500 GPU hours on A100 hardware (approximately $400). This efficiency demonstrates that with the right architecture and high-quality data, smaller models can achieve state-of-the-art results.
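The two training figures above imply a GPU rental rate, which is a quick way to sanity-check the cost claim. A small sketch using only the numbers quoted here (the helper name is illustrative):

```python
def cost_per_gpu_hour(total_cost_usd: float, gpu_hours: float) -> float:
    """Hourly rate implied by a total cost and a number of GPU hours."""
    if gpu_hours <= 0:
        raise ValueError("gpu_hours must be positive")
    return total_cost_usd / gpu_hours


rate = cost_per_gpu_hour(400, 500)
print(f"Implied rate: ${rate:.2f} per A100 hour")
```

That works out to about $0.80 per A100 hour.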

Limitations and Future Improvements

While Kokoro TTS is impressive, we believe in transparency about its current limitations:

Limited Expressiveness: Speech can sound somewhat neutral in emotional range compared to professional voice actors
No Built-in Voice Cloning: Cannot mimic new voices without fine-tuning (unlike some commercial options)
Multilingual Quality Variations: While supporting multiple languages, quality may vary across non-English languages
Short Input Quirks: Performs best with longer texts rather than single words or very short phrases

The Kokoro community is actively working on addressing these limitations in future updates, with plans for more expressive models and improved voice variety.

Get Started with Kokoro TTS Today

Try our live demo above and experience the future of open-source text-to-speech technology. With Kokoro TTS, you can generate professional-quality voiceovers, create accessible content, and build voice-enabled applications without breaking the bank.

Ready for Production Use?

For production-ready API access with enterprise reliability, ultra-fast response times, and cost-effective pricing, check out Unreal Speech - the premium Kokoro-powered TTS API that's:

11× cheaper than ElevenLabs
Streams audio in just 300ms
Supports requests up to 10 hours long
Includes precise per-word timestamps
Backed by enterprise-grade infrastructure

Frequently Asked Questions About Kokoro TTS

What makes Kokoro TTS different from other text-to-speech services?

Kokoro TTS stands out for its remarkable efficiency—achieving professional-quality speech with just 82 million parameters (compared to models 5-15× larger). This lightweight design enables fast processing through our API while still outperforming much larger models in quality benchmarks. Our online demo lets you experience Kokoro's capabilities instantly and download the generated MP3s. Unlike most commercial services, the underlying Kokoro model is open-source under the Apache 2.0 license, while our Unreal Speech API provides a production-ready implementation with affordable pricing.

Which languages and voices does Kokoro TTS support?

Kokoro TTS currently offers 48 voices across 8 languages. You can generate speech in English (American and British), French, Hindi, Spanish, Japanese, Chinese, Italian, and Portuguese. Each language includes multiple male and female voices with different characteristics and speaking styles. The voice selection is constantly expanding, with regular updates adding new options and improving existing ones.

Can I download and use the generated speech files for my projects?

Yes, all audio generated by Kokoro TTS Studio can be freely downloaded as MP3 files and used in both personal and commercial projects. You can use these audio files for YouTube videos, podcasts, e-learning content, audiobooks, or any other application. The following terms apply, based on your subscription plan:

Free plan – You must attribute Unreal Speech by including a link to "unrealspeech.com" in the description.
Paid plan – You do not need to include any attribution.

How do I get the best quality results from Kokoro TTS?

For optimal results with Kokoro TTS, use longer sentences or paragraphs rather than single words (the model performs better with context). Include proper punctuation to help with natural pausing and intonation. Experiment with different voices—some may pronounce certain words or phrases more naturally than others depending on your text. For professional applications requiring even higher quality or custom voices, consider Unreal Speech's API which builds upon Kokoro's technology with enterprise-grade reliability.

Can I run Kokoro TTS offline on my own computer?

Yes, Kokoro TTS can be installed and run locally on your computer without an internet connection. The model is small enough (about 350MB) to run efficiently on most modern computers, even without a dedicated GPU. For local installation, you can use the Python implementation (pip install kokoro) or command-line tools. This makes Kokoro ideal for privacy-conscious users, offline applications, or scenarios where consistent generation without reliance on external services is important.