Speech Synthesiser using Web APIs

Speech synthesis is the process of having a voice read some text out loud. It's what Siri does to talk back to you (or any navigation software, for that matter). Building a decent voice from scratch would typically require a lot of voice recordings and a bit of machine learning (or a complex algorithm).

However, most browsers have already done all that work, and with a native API it's easy to get a voice to read whatever text you need. It's still a fairly new API, so it's wise to check for support here: Can I Use
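Besides checking a compatibility table, you can also detect support at runtime. The snippet below is a minimal sketch of that idea; `supportsSpeechSynthesis` is a hypothetical helper name, not part of the API, and the `scope` parameter only exists so the check can be tried outside a browser.

```javascript
// Hypothetical helper: true when both entry points of the
// Web Speech synthesis API are present on the given global object.
function supportsSpeechSynthesis(scope = globalThis) {
  return "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope
}
```

In a page you could call `supportsSpeechSynthesis()` once and hide the speech controls when it returns false.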

const synth = window.speechSynthesis

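// Collect the English voices and keep only the fields the UI needs
const populateVoices = ()=> {
  const voices = synth.getVoices()
    .filter(voice=> voice.lang.startsWith("en"))
    .map(voice=> {
      // `default` is a reserved word, so rename it while destructuring
      const { name, default: isDefault } = voice
      return { name, isDefault }
    })
  setVoices(voices)
}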

The first step is to gather the available voices from the browser. Make sure speechSynthesis is defined before you use it, otherwise this will throw an error. Also, filter the voices down to the language you want, as each voice is tailored to a language.
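One thing to watch out for: some browsers load their voices asynchronously, so getVoices() can return an empty list the first time and fire a voiceschanged event once the list is ready. Here is a rough sketch of how you might handle that. `listVoices` and `watchVoices` are hypothetical helper names, and `synth` is passed in as a parameter (rather than read from `window`) purely so the code can be tried outside a browser.

```javascript
// Hypothetical helper: return { name, isDefault } for every voice whose
// language tag starts with langPrefix. Returns [] if the API is missing.
function listVoices(synth, langPrefix = "en") {
  if (!synth || typeof synth.getVoices !== "function") return []
  return synth.getVoices()
    .filter(voice => voice.lang.startsWith(langPrefix))
    .map(({ name, default: isDefault }) => ({ name, isDefault }))
}

// Hypothetical helper: call onVoices right away, then again every time
// the browser reports that its voice list changed.
// Returns a function that removes the listener.
function watchVoices(synth, onVoices, langPrefix = "en") {
  const update = () => onVoices(listVoices(synth, langPrefix))
  update()
  if (synth && typeof synth.addEventListener === "function") {
    synth.addEventListener("voiceschanged", update)
    return () => synth.removeEventListener("voiceschanged", update)
  }
  return () => {}
}
```

With the setVoices function from the snippet above, the call would look something like `watchVoices(window.speechSynthesis, setVoices)`.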

const speak = ()=> {
  // Don't start a new utterance while one is still playing
  if (synth.speaking) return
  const text = "..."
  const utterance = new SpeechSynthesisUtterance(text)
  // Look up the voice the user selected; if none matches, the browser uses its default voice
  utterance.voice = synth.getVoices()
    .filter(voice=> voice.lang.startsWith("en"))
    .find(voice=> voice.name === selectedVoice)
  utterance.onend = event=> console.log("Finished speaking")
  utterance.onerror = event=> console.error("Error speaking:", event.error)
  synth.speak(utterance)
}

To get the synthesiser to speak, you need to create an utterance. The word just means "vocal sound", which is kind of appropriate, so don't let the name scare you. You can configure the voice here (or let the browser pick its default voice for you). After setting up a few event handlers, it's ready to mumble. Let's see how it works (remember to turn up the volume).
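The voice is not the only thing you can configure on an utterance: it also has rate, pitch and volume properties. The sketch below shows one way to set them. `createUtterance` and `clamp` are hypothetical helpers, and the `Utterance` parameter is an assumption added so the function can be tried with a stand-in class outside a browser. The clamping ranges follow the limits in the Web Speech API specification (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1).

```javascript
// Keep a number inside [min, max]
const clamp = (value, min, max) => Math.min(max, Math.max(min, value))

// Hypothetical helper: build an utterance with optional voice, rate,
// pitch and volume, clamped to the ranges the spec allows.
function createUtterance(text, options = {}, Utterance = globalThis.SpeechSynthesisUtterance) {
  const { voice = null, rate = 1, pitch = 1, volume = 1 } = options
  const utterance = new Utterance(text)
  if (voice) utterance.voice = voice       // leave unset to use the default voice
  utterance.rate = clamp(rate, 0.1, 10)    // 1 is normal speed
  utterance.pitch = clamp(pitch, 0, 2)     // 1 is normal pitch
  utterance.volume = clamp(volume, 0, 1)   // 1 is full volume
  return utterance
}
```

In the browser, something like `synth.speak(createUtterance("Hello there", { rate: 0.9 }))` would then read the text slightly slower than normal.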
