Voice Assistant

Genie can both listen to your questions (speech-to-text) and read answers aloud (text-to-speech). Both features are off by default and require feature flags to be enabled in your environment.

Speech input (microphone)

When enabled, a 🎤 microphone icon appears in the chat input box (next to the send-arrow icon) once you've selected a knowledge base.

How to use

  1. Click the 🎤 microphone icon. The icon turns red to indicate recording.
  2. Speak your question clearly into your device's microphone.
  3. As you speak, your words appear in the input box.
  4. Click the microphone again to stop, or wait for it to auto-stop after a short silence.
  5. Press Enter or click the send-arrow icon to submit the recognized text.
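
The click-to-start, click-to-stop flow above can be sketched against the Web Speech API roughly as follows. This is a minimal illustration, not Genie's actual code: `MicController`, `RecognizerLike`, and `setInputText` are hypothetical names, the `"en-US"` language tag is an assumption, and the recognizer is passed in so the sketch does not depend on browser typings.

```typescript
// Minimal shape of the parts of a Web Speech API recognizer used here.
// Declared locally so the sketch does not depend on DOM typings.
interface RecognitionEventLike {
  // Each result holds alternatives; index 0 is the most likely transcript.
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEventLike) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical controller: toggles recording and mirrors text into the input box.
class MicController {
  recording = false;
  private recognizer: RecognizerLike;

  constructor(recognizer: RecognizerLike, setInputText: (text: string) => void) {
    this.recognizer = recognizer;
    recognizer.lang = "en-US";        // assumption: tag used for the English UI locale
    recognizer.interimResults = true; // show words while the user is still speaking
    recognizer.continuous = false;    // let the engine stop after a pause

    recognizer.onresult = (event) => {
      // Join the best alternative of every result into one string.
      let text = "";
      for (let i = 0; i < event.results.length; i++) {
        text += event.results[i][0].transcript;
      }
      setInputText(text);
    };
    // Fires after a manual stop and after an auto-stop on silence.
    recognizer.onend = () => {
      this.recording = false;
    };
  }

  // One click starts recording; the next click stops it.
  toggle(): void {
    if (this.recording) {
      this.recognizer.stop();
    } else {
      this.recognizer.start();
      this.recording = true;
    }
  }
}
```

In a browser, the recognizer would typically be an instance of `SpeechRecognition` (or `webkitSpeechRecognition`), and `setInputText` would update the chat input box. Submitting the text stays a separate, explicit step, as in step 5.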

Browser support

Speech input uses the browser's built-in Web Speech API, not Azure Speech Services. This means:

  • Chrome and Edge work well
  • Safari has partial support
  • Firefox has limited / no support depending on platform

The recognition language follows the UI locale (currently English). If your browser doesn't support speech recognition or the chosen language, you'll see an alert like:

"Speech recognition error detected: language-not-supported. Try another browser/OS."
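
As a rough illustration of how a page can detect browser support and build an alert like the one above, consider the sketch below. The function names are hypothetical. Whether Genie uses this exact wording for every error code is an assumption; the codes themselves (for example `language-not-supported`, `not-allowed`, `no-speech`) come from the Web Speech API.

```typescript
// Minimal view of the global object: browsers that support speech
// recognition expose the constructor under one of these two names.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Returns the recognizer constructor if one is exposed, otherwise undefined.
function findRecognizer(globals: SpeechGlobals): unknown {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition;
}

// Builds the alert text from a Web Speech API error code.
// Assumption: the same template is used for every error code.
function speechErrorMessage(errorCode: string): string {
  return `Speech recognition error detected: ${errorCode}. Try another browser/OS.`;
}
```

In a browser, `findRecognizer` would be called with the global `window` object (with a cast if the DOM typings lack these properties). An `undefined` result means the microphone feature cannot work in that browser.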

Privacy

The audio is processed by the browser's speech engine, not by Genie or DHL. The audio is never uploaded to a DHL server; only the final recognized text reaches Genie. This is the same speech engine that powers Web Speech features elsewhere on the web.

Speech output (read aloud)

When enabled, a 🔊 volume icon appears in the toolbar below every Genie answer (alongside Copy, Lightbulb, Clipboard).

How to use

  1. Click the 🔊 volume icon under any answer.
  2. The icon shows a loading spinner while the audio is being generated by Azure Speech Services.
  3. Audio plays through your default speakers / headphones.
  4. Click the same icon again to stop playback.
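
The icon behavior in the steps above amounts to a small state machine. The sketch below is one way to model it. The state and event names are hypothetical, and treating a click during loading as a cancel is an assumption, since the steps only describe stopping during playback.

```typescript
type PlaybackState = "idle" | "loading" | "playing";
type PlaybackEvent = "click" | "audioReady" | "ended" | "failed";

// Returns the state of the volume icon after an event.
function nextPlaybackState(state: PlaybackState, event: PlaybackEvent): PlaybackState {
  switch (event) {
    case "click":
      // A click starts generation when idle; otherwise it cancels or stops.
      return state === "idle" ? "loading" : "idle";
    case "audioReady":
      // Audio arriving only matters while the spinner is showing.
      return state === "loading" ? "playing" : state;
    case "ended":
    case "failed":
      // Playback finished or generation failed: back to the plain icon.
      return "idle";
  }
}
```

Keeping the transition logic in a pure function like this makes it straightforward to check that, for instance, audio arriving after the user has already cancelled does not start playback.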

Voice

The default voice is en-US-AndrewMultilingualNeural, a high-quality neural voice that handles multiple languages reasonably well. The voice cannot be changed in the UI.
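
For readers curious how a voice name like this is used, the sketch below builds a request in the shape of the public Azure text-to-speech REST API, with the voice selected in the SSML body. How Genie actually calls Azure is not documented here, so treat everything in it as an assumption: `buildSsml` and `buildTtsRequest` are hypothetical helpers, the region and key are placeholders, and the MP3 output format is one of several the service offers. A real deployment would normally make this call from a server so the key never reaches the browser, and the service may require additional headers.

```typescript
// Escapes characters that are special in XML so answer text cannot break the SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wraps the answer text in SSML that selects the voice by name.
function buildSsml(text: string, voice: string = "en-US-AndrewMultilingualNeural"): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
    `<voice name="${voice}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

// Describes the HTTP request without sending it (region and key are placeholders).
function buildTtsRequest(region: string, subscriptionKey: string, text: string) {
  return {
    url: `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`,
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": subscriptionKey,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": "audio-24khz-48kbitrate-mono-mp3",
    },
    body: buildSsml(text),
  };
}
```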

Limits

  • Only answers shorter than 5000 characters can be read aloud. Read-aloud fails for longer answers.
  • Generation usually takes 2–5 seconds before playback starts.
  • Audio is in MP3 format and streamed to your browser; nothing is saved to your device permanently.
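
A client could check the length limit before requesting audio, so that an over-long answer is rejected immediately instead of failing after a round trip. The sketch below assumes the limit is strict ("shorter than 5000") and counts characters as JavaScript string length. Both are assumptions, and the names are hypothetical.

```typescript
// Assumption: the limit from the list above, applied as a strict upper bound.
const MAX_READ_ALOUD_CHARS = 5000;

// Returns true when the answer is short enough to be sent for speech generation.
// Note: string length counts UTF-16 code units, so some emoji count as two.
function canReadAloud(answer: string): boolean {
  return answer.length < MAX_READ_ALOUD_CHARS;
}
```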

Privacy

The answer text is sent to Azure Speech Services (Azure OpenAI region) to be converted to audio. The audio is not retained.

Why don't I see these icons?

If the microphone or volume icon isn't visible, the corresponding feature is disabled or unavailable in your environment. Possible reasons:

  • The environment hasn't been configured with Azure Speech credentials
  • The flags showSpeechInput / showSpeechOutputAzure are set to false
  • You're on a browser version that doesn't expose the API

Contact Support if you need these features enabled.
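
The conditions listed above can be summarized as a visibility check. Only the flag names `showSpeechInput` and `showSpeechOutputAzure` come from this page. The environment fields and the function below are hypothetical, and the exact mapping of conditions to icons is an assumption: the microphone is assumed not to need Azure credentials because speech input uses the browser's Web Speech API.

```typescript
// Flag names as given on this page.
interface SpeechFlags {
  showSpeechInput: boolean;
  showSpeechOutputAzure: boolean;
}

// Hypothetical description of the environment the UI runs in.
interface SpeechEnvironment {
  hasAzureSpeechCredentials: boolean;
  browserHasSpeechRecognition: boolean;
  knowledgeBaseSelected: boolean;
}

// Decides which icons to render.
function visibleSpeechIcons(
  flags: SpeechFlags,
  env: SpeechEnvironment,
): { microphone: boolean; volume: boolean } {
  return {
    // Microphone: flag on, browser exposes the API, and a knowledge base is selected.
    microphone:
      flags.showSpeechInput && env.browserHasSpeechRecognition && env.knowledgeBaseSelected,
    // Volume: flag on and Azure Speech credentials configured.
    volume: flags.showSpeechOutputAzure && env.hasAzureSpeechCredentials,
  };
}
```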

What about voice in other languages?

  • Speech input: follows your browser locale. Some browsers let you change the recognition language; others stick to one.
  • Speech output: the default voice is multilingual and pronounces non-English text reasonably, but pronunciation quality varies by language.

Genie does not currently offer Whisper-style server-side transcription or custom voice cloning.