Text to Speech

Convert text to speech using your browser. Listen to text in multiple languages and voices. Adjust speed and pitch.

Browser-native text-to-speech (Web Speech API) means you can convert text to spoken audio without uploading anything — useful for proofreading by ear, generating audio versions of articles, or accessibility testing. The quality varies by OS: macOS Siri and Apple voices are excellent; Windows Edge's Microsoft voices are good; older Linux/Android voices are robotic. This tool supports all available voices on your system, adjustable rate/pitch, and SSML markup for fine control over emphasis and pronunciation.

What the browser actually does

The Web Speech API (its SpeechSynthesis interface) is supported in all major current browsers. It mostly delegates to the OS's TTS engine: macOS and iOS use Apple's system voices; Windows uses its SAPI voices, and Edge adds its own neural voices; Linux browsers typically go through speech-dispatcher to espeak or Festival; Android uses the installed TTS engine, usually Google's. Quality varies enormously: a 2024 Microsoft Edge neural voice sounds nearly human; a 2010 espeak voice sounds like a 1980s synthesizer.

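The tool calls `speechSynthesis.speak(new SpeechSynthesisUtterance(text))`. With a locally installed voice, your text never leaves the machine: the OS synthesizes it locally (some neural voices are downloaded once and then run locally). That makes it useful for confidential text you want to hear read aloud without a cloud service. Some voices that browsers list are network-backed, though, so check which kind you have selected.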
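A minimal sketch of that call with voice, rate, and pitch applied, wrapped in a Promise that resolves with the elapsed speaking time in milliseconds once the utterance's `end` event fires. The `synth` and `Utterance` parameters are an illustrative addition, not part of the tool: in a browser they default to `speechSynthesis` and `SpeechSynthesisUtterance`, and passing stand-ins lets the logic run elsewhere.

```javascript
// Sketch: speak `text` with optional voice, rate, and pitch via the Web Speech API.
// Resolves with the elapsed time (ms) between the 'start' and 'end' events.
// `synth` and `Utterance` are injectable for illustration; in a browser the
// defaults are the real speechSynthesis and SpeechSynthesisUtterance globals.
function speak(
  text,
  { voice = null, rate = 1, pitch = 1 } = {},
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  return new Promise((resolve, reject) => {
    const utterance = new Utterance(text);
    if (voice) {
      utterance.voice = voice;     // a SpeechSynthesisVoice from synth.getVoices()
      utterance.lang = voice.lang; // keep the language tag in line with the voice
    }
    utterance.rate = rate;   // spec range 0.1 to 10; 1 is the default
    utterance.pitch = pitch; // spec range 0 to 2; 1 is the default

    let startedAt = Date.now();
    utterance.onstart = () => { startedAt = Date.now(); };
    utterance.onend = () => resolve(Date.now() - startedAt);
    // SpeechSynthesisErrorEvent carries a short code such as "synthesis-failed".
    utterance.onerror = (event) => reject(new Error(`speech error: ${event.error}`));

    synth.speak(utterance); // queued; the browser speaks utterances in order
  });
}
```

In a page, `speak("Welcome to eutils.pro.", { rate: 1.2 }).then((ms) => console.log(ms))` logs roughly how long the sentence took to speak, which is one way to check a duration figure like the one in the example below.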

Working example

Input

Text: "Welcome to eutils.pro. The platform supports 146 tools."
Voice: Microsoft Aria (en-US neural)
Rate: 1.0 (default)
Pitch: 1.0 (default)

Output

Synthesis: instant (local).
Duration: ~5.5 seconds at default rate.

Observations:
  - "eutils.pro" pronounced as "ee-yoo-tils dot pro" — domain name guessed.
    Use SSML to override: <phoneme alphabet="ipa" ph="juːtɪlz">eutils</phoneme>.
  - "146" pronounced as "one hundred forty-six".
  - Prosody follows the sentence boundaries; no emphasis was specified.

For better pronunciation control:
  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    Welcome to <phoneme alphabet="ipa" ph="juːtɪlz">eutils</phoneme>.pro.
    The platform supports <say-as interpret-as="cardinal">146</say-as> tools.
  </speak>

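SSML support varies by browser and voice. The Web Speech API specification lets an engine strip tags it does not support and speak the remaining text, so markup that one voice honors may be silently ignored by another: Microsoft neural voices accept it; basic espeak voices may ignore most tags. For consistent SSML behavior across platforms, use a cloud TTS service (Azure, Polly, Google Cloud TTS).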
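A sketch of generating markup like the example above from plain text, assuming a small hand-maintained pronunciation lexicon that maps words to IPA strings (the lexicon, `buildSsml`, and its helpers are illustrative, not part of this tool). It escapes the user's text so it cannot inject tags, wraps listed words in `<phoneme>`, and emits a complete `<speak>` root with the `version`, `xmlns`, and `xml:lang` attributes SSML expects.

```javascript
// Sketch: wrap plain text in a complete SSML 1.0 document and mark up words
// from a pronunciation lexicon (word -> IPA string) with <phoneme>.
// Assumption: lexicon keys are plain words made of letters and digits.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildSsml(text, lexicon = {}, lang = "en-US") {
  const words = Object.keys(lexicon);
  let body;
  if (words.length === 0) {
    body = escapeXml(text);
  } else {
    const pattern = new RegExp(`\\b(${words.map(escapeRegExp).join("|")})\\b`, "g");
    // split() with one capturing group puts the matched words at odd indices,
    // so plain text and lexicon words can be escaped and marked up separately.
    body = text
      .split(pattern)
      .map((part, i) =>
        i % 2 === 1
          ? `<phoneme alphabet="ipa" ph="${escapeXml(lexicon[part])}">${escapeXml(part)}</phoneme>`
          : escapeXml(part)
      )
      .join("");
  }
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xml:lang="${escapeXml(lang)}">${body}</speak>`
  );
}
```

The result can be used as the utterance text or sent to a cloud TTS service that accepts SSML; services differ in which extra elements they require, so check the target's SSML reference.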

Use cases that work and do not work

  • Proofreading — works. Hearing your text spoken catches typos and clunky phrasing your eyes skim past.
  • Accessibility testing — partial. Your browser's TTS approximates what a screen reader will say, but real screen reader users have their own pacing and voice preferences.
  • Generating audio versions of blog posts — works for personal use; for published audio, free voices fall short of podcast-grade quality.
  • Language learning — partial. Modern neural voices have decent prosody; older voices mispronounce non-English words badly.
  • Reading email aloud — works. macOS, Windows, and most mobile platforms have a system-level read-aloud feature that uses the same TTS engines.
  • Translating text to speech — does not work. TTS does not translate. Translate first (DeepL, Google Translate), then run the result through TTS with a voice for the target language.
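For the translate-then-speak workflow, the remaining step is picking a voice that matches the translated text. A minimal sketch, assuming the target language is known as a BCP 47 tag such as "de-DE"; `voiceForLanguage` is a hypothetical helper, not part of this tool.

```javascript
// Sketch: choose a voice for text that is already in the target language.
// Tries an exact BCP 47 tag match (e.g. "de-DE"), then any voice with the same
// primary language subtag ("de"), else returns null (no suitable voice installed).
// `voices` is any array of { name, lang } objects, e.g. speechSynthesis.getVoices().
function voiceForLanguage(voices, langTag) {
  // Normalize case, and '_' to '-' defensively, before comparing tags.
  const norm = (tag) => tag.toLowerCase().replace(/_/g, "-");
  const wanted = norm(langTag);
  const primary = wanted.split("-")[0];
  return (
    voices.find((v) => norm(v.lang) === wanted) ||
    voices.find((v) => norm(v.lang).split("-")[0] === primary) ||
    null
  );
}
```

A null result is the cue to tell the user no voice is installed for that language, rather than letting a mismatched voice mangle the text.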

Voices: free vs cloud

  • Browser/OS native — free, no quota, and offline for locally installed voices. Quality varies from excellent (recent neural voices) to robotic (older espeak). Limited to the voices your system and browser provide.
  • Microsoft Azure Neural TTS — free tier of 500K characters/month. 400+ voices in 140+ languages. Full SSML support. Low per-character pricing beyond the free tier.
  • Amazon Polly — competitive with Azure. AWS pricing model.
  • Google Cloud TTS — WaveNet voices are excellent for English; other languages are improving.
  • ElevenLabs — high-end voice cloning. Premium pricing. Used for audiobook production, podcast intros.
  • OpenAI TTS — fast, multilingual, integrated with their other models. Good for chatbot voices.

When to reach for this tool

  • You wrote something long and want to proofread by hearing it.
  • You want to test how your content sounds to a screen reader user (caveat: real assistive tech has user-tuned voices and rates).
  • You are a non-native English speaker and want to hear correct pronunciation of unfamiliar words.
  • You are creating quick audio prototypes for a voice interface; cloud TTS will deliver production quality later.

What this tool will not do

  • It will not produce broadcast-quality audio. Browser TTS is good for casual use; professional voiceover requires a real voice actor or premium cloud TTS.
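  • It will not save audio files automatically. The Web Speech API does not expose the synthesized audio, so a tool that offers downloads has to record the played-back output (for example with MediaRecorder), and not every implementation does.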
  • It will not work without an OS-installed voice. If your Linux box has no TTS engine installed, the API silently does nothing.
  • It will not match the voice of a specific person without cloning tools (ElevenLabs, Resemble.ai), which raise ethical and legal concerns.
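The no-engine case in the list above can at least be detected instead of failing silently. A sketch, assuming the page waits for the `voiceschanged` event because `getVoices()` may return an empty list until the engine has loaded; the 2000 ms default timeout is an arbitrary assumption, and `waitForVoices` is a hypothetical helper.

```javascript
// Sketch: resolve with the available voices, or [] if none appear in time.
// getVoices() can be empty until the engine loads; browsers then fire
// 'voiceschanged'. With no engine installed the event may never fire,
// so a timeout (arbitrary 2000 ms default) keeps the page from waiting forever.
function waitForVoices(synth = globalThis.speechSynthesis, timeoutMs = 2000) {
  return new Promise((resolve) => {
    if (!synth) {
      resolve([]); // the Web Speech API is not available at all
      return;
    }
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    const onChange = () => {
      const voices = synth.getVoices();
      if (voices.length > 0) finish(voices);
    };
    const finish = (voices) => {
      clearTimeout(timer);
      synth.removeEventListener("voiceschanged", onChange);
      resolve(voices);
    };
    const timer = setTimeout(() => finish(synth.getVoices()), timeoutMs);
    synth.addEventListener("voiceschanged", onChange);
  });
}
```

An empty result means the page should explain that no speech engine or voice is installed, which is friendlier than a Speak button that does nothing.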

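Synthesis runs on your OS's TTS engine for locally installed voices, and text spoken with those voices stays on your machine. Some voices that browsers list are network-backed (Chrome's "Google" voices and Edge's "Online (Natural)" voices, for example); text spoken with them is sent to the vendor's servers. The Web Speech API reports the difference through each voice's localService property.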

Frequently asked questions

Why do voices sound different on different devices?

Each OS ships its own TTS engine and voice library. macOS Siri voices, Windows Edge neural voices, Linux espeak — all different. A web page using the Web Speech API uses whichever the current OS provides; cross-device consistency requires cloud TTS where the same engine serves all clients.

Can I download the synthesized audio?

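Some tools can, some cannot. The Web Speech API does not expose the audio data, and the OS engine plays the speech directly rather than through the page's Web Audio graph, so the usual workaround is to record the audio output itself (for example a loopback or system-audio stream captured with MediaRecorder). Cloud TTS services return audio files directly.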
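A sketch of the recording half of that workaround, assuming you already have a `MediaStream` carrying the speaker output (for example from a loopback or virtual audio device opened with `getUserMedia`); obtaining that stream is platform-specific and not shown. `recordStream` is a hypothetical helper, and its `Recorder` parameter exists only so the logic can be exercised outside a browser.

```javascript
// Sketch: record an existing MediaStream for `durationMs` and resolve with a Blob.
// Assumption: `stream` already carries the speaker or system audio (for example
// from a loopback device); obtaining it is platform-specific and not shown.
// `Recorder` defaults to the browser's MediaRecorder and is injectable for testing.
function recordStream(stream, durationMs, Recorder = globalThis.MediaRecorder) {
  return new Promise((resolve, reject) => {
    const recorder = new Recorder(stream);
    const chunks = [];
    recorder.ondataavailable = (event) => {
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    recorder.onerror = (event) => reject(event.error || new Error("MediaRecorder error"));
    // 'stop' fires after the final 'dataavailable', so all chunks are collected here.
    recorder.onstop = () =>
      resolve(new Blob(chunks, { type: recorder.mimeType || "audio/webm" }));
    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}
```

Start the recording, trigger speech, and pass the resulting Blob to `URL.createObjectURL` to offer it as a download.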

Why does TTS mispronounce names?

TTS engines derive pronunciation from dictionaries and from rules or models trained mostly on common words. Proper nouns, foreign words, and technical jargon fall back to phonetic guessing. Use SSML <phoneme> tags to override (works in neural cloud voices; not in older OS voices).
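For older OS voices that ignore <phoneme>, one common workaround (not something this tool is stated to do) is to swap troublesome words for plain-text respellings before speaking. A sketch, assuming a hypothetical hand-maintained respellings table keyed by lowercase word; it only matches ASCII words.

```javascript
// Sketch: fallback for voices that ignore SSML. Replace words the engine
// mispronounces with plain-text respellings before speaking.
// `respellings` is a hypothetical table keyed by lowercase ASCII word.
function applyRespellings(text, respellings) {
  return text.replace(/[A-Za-z][A-Za-z'-]*/g, (word) => {
    const key = word.toLowerCase();
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    return Object.prototype.hasOwnProperty.call(respellings, key) ? respellings[key] : word;
  });
}
```

Respellings are cruder than IPA and may need tuning per voice, but they work with any engine because the engine only ever sees ordinary text.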

Is TTS getting better fast?

Yes. Neural TTS (widely deployed since the late 2010s) is dramatically better than concatenative TTS (2000s) or formant synthesis (1990s). Current neural voices such as Microsoft Aria or ElevenLabs' voices are nearly indistinguishable from human speech for short utterances. Long-form material (audiobooks) still betrays the synthesizer.

Can TTS read in any language?

Most languages have at least one TTS voice in major OSes; some languages (English, Spanish, French, German, Mandarin) have dozens. Less-common languages (Welsh, Swahili, Tibetan) may have only one robotic voice or none. Cloud TTS supports more languages than OS-installed voices.

Does TTS work offline?

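OS-installed TTS engines work offline. Cloud TTS does not. Neural voices downloaded and installed through the OS also run offline afterward, but voices the browser streams from a server (Edge's "Online (Natural)" voices, for example) need a connection; check your OS's TTS settings to verify.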
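From inside the browser, the same check can use the standard `localService` flag on each voice. A minimal sketch; `partitionVoices` and `describeVoices` are hypothetical helpers, and the flag reflects what the browser reports.

```javascript
// Sketch: split voices into local (offline-capable) and network-backed ones
// using the standard SpeechSynthesisVoice.localService flag, and format a
// short report. `voices` is any array of { name, lang, localService } objects,
// such as the result of speechSynthesis.getVoices().
function partitionVoices(voices) {
  const local = [];
  const remote = [];
  for (const voice of voices) {
    if (voice.localService) local.push(voice);
    else remote.push(voice);
  }
  return { local, remote };
}

function describeVoices(voices) {
  const { local, remote } = partitionVoices(voices);
  const line = (v) => `  ${v.name} (${v.lang})`;
  return [
    `Local (offline-capable): ${local.length}`,
    ...local.map(line),
    `Network-backed: ${remote.length}`,
    ...remote.map(line),
  ].join("\n");
}
```

Running `console.log(describeVoices(speechSynthesis.getVoices()))` in the browser console prints the report for the current device.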

Published · Updated · E-Utils editorial team