Voice Assistant
Genie can both listen to your questions (speech-to-text) and read answers aloud (text-to-speech). Both features are off by default and require feature flags to be enabled in your environment.
Speech input (microphone)
When enabled, a 🎤 microphone icon appears in the chat input box (next to the send-arrow icon) once you've selected a knowledge base.
How to use
- Click the 🎤 microphone icon. The icon turns red to indicate recording.
- Speak your question clearly into your device's microphone.
- As you speak, your words appear in the input box.
- Click the microphone again to stop, or wait for it to auto-stop after a short silence.
- Press Enter or click the send-arrow icon to submit the recognized text.
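The click-to-start, click-to-stop flow above can be sketched against the Web Speech API roughly as follows. This is a minimal illustration, not Genie's actual code: createMicToggle, transcriptFrom, and the callback names are hypothetical, the "en-US" language is an assumption based on the UI currently being English, and the small interfaces only mirror the SpeechRecognition members the sketch uses.

```typescript
// Minimal structural types for the parts of the Web Speech API used here.
interface RecognitionResultEventLike {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  continuous: boolean;
  onresult: ((event: RecognitionResultEventLike) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Join the top alternative of every result received so far into one string.
function transcriptFrom(event: RecognitionResultEventLike): string {
  let text = "";
  for (let i = 0; i < event.results.length; i++) {
    const best = event.results[i]?.[0];
    if (best) {
      text += best.transcript;
    }
  }
  return text;
}

// Hypothetical controller: the first call starts recording, the next call stops it.
function createMicToggle(
  recognition: RecognitionLike,
  onText: (text: string) => void,
  onRecordingChange: (recording: boolean) => void,
): () => void {
  let recording = false;
  recognition.lang = "en-US"; // assumption: matches the English UI locale
  recognition.interimResults = true; // deliver words while the user is still speaking
  recognition.continuous = false; // let the engine end the session after a pause
  recognition.onresult = (event) => onText(transcriptFrom(event));
  recognition.onend = () => {
    // Fires after a manual stop and after the engine stops on its own.
    recording = false;
    onRecordingChange(false);
  };
  return () => {
    if (recording) {
      recognition.stop();
    } else {
      recognition.start();
      recording = true;
      onRecordingChange(true);
    }
  };
}
```

In a browser, the recognition object would typically come from new SpeechRecognition() or the prefixed new webkitSpeechRecognition(). The onRecordingChange callback is where a UI could turn the icon red, and onText is where it could fill the input box.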
Browser support
Speech input uses the browser's built-in Web Speech API, not Azure Speech Services. This means:
- Chrome and Edge work well
- Safari has partial support
- Firefox has limited / no support depending on platform
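Because support differs by browser, a page usually checks for the API before offering a microphone. The sketch below shows one way to do that and to build the alert text quoted in this section. The function names are hypothetical and this is not necessarily how Genie performs the check. It relies on the fact that some browsers expose the constructor only under the prefixed name webkitSpeechRecognition.

```typescript
// Look up the recognition constructor on a global-like object.
// Returns null when neither the standard nor the prefixed name is present.
function findSpeechRecognition(scope: Record<string, unknown>): unknown {
  return scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"] ?? null;
}

// Build the alert text from a recognition error code such as "language-not-supported".
function speechErrorMessage(code: string): string {
  return `Speech recognition error detected: ${code}. Try another browser/OS.`;
}
```

In a browser, the scope argument would be the window object, for example findSpeechRecognition(window as unknown as Record<string, unknown>). Passing the scope in explicitly keeps the helper easy to exercise outside a browser.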
The recognition language follows the UI locale (currently English). If your browser doesn't support speech recognition or the chosen language, you'll see an alert like:
"Speech recognition error detected: language-not-supported. Try another browser/OS."
Privacy
The audio is processed by the browser's speech engine, not by Genie or DHL. The audio is never uploaded to a DHL server; only the final recognized text reaches Genie. This is the same speech engine that powers Web Speech features elsewhere on the web, and in some browsers (Chrome, for example) that engine sends audio to the browser vendor's recognition service.
Speech output (read aloud)
When enabled, a 🔊 volume icon appears in the toolbar below every Genie answer (alongside Copy, Lightbulb, Clipboard).
How to use
- Click the 🔊 volume icon under any answer.
- The icon shows a loading spinner while the audio is being generated by Azure Speech Services.
- Audio plays through your default speakers / headphones.
- Click the same icon again to stop playback.
Voice
The default voice is en-US-AndrewMultilingualNeural, a high-quality neural voice that handles multiple languages reasonably well. The voice cannot be changed in the UI.
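Azure text-to-speech requests commonly select a voice through SSML, so a request for the default voice might look like the sketch below. This is an assumption about how the voice could be applied, not a description of Genie's backend. buildSsml, escapeXml, and DEFAULT_VOICE are hypothetical names, and the xml:lang value is assumed.

```typescript
const DEFAULT_VOICE = "en-US-AndrewMultilingualNeural";

// Escape the characters that are special in XML so answer text cannot break the SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap the answer in a <speak>/<voice> document that names the voice to use.
function buildSsml(answer: string, voice: string = DEFAULT_VOICE): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(answer)}</voice>` +
    `</speak>`
  );
}
```

Escaping matters because answers can contain characters such as & or <, which would otherwise make the SSML document invalid.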
Limits
- Answers shorter than 5000 characters are read aloud. Longer answers will fail.
- Generation usually takes 2 to 5 seconds before playback starts.
- Audio is in MP3 format and streamed to your browser; nothing is saved to your device permanently.
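The length limit can be expressed as a simple pre-check before requesting audio. The sketch below is illustrative only. canReadAloud and MAX_READ_ALOUD_CHARS are hypothetical names. It assumes that "shorter than 5000 characters" means a strict comparison and that characters are counted the way JavaScript's string length counts them (UTF-16 code units). The actual service may count differently.

```typescript
// Assumed limit taken from the "Limits" list; the exact counting rule is an assumption.
const MAX_READ_ALOUD_CHARS = 5000;

// True when the answer is short enough to be sent for speech generation.
function canReadAloud(answer: string): boolean {
  return answer.length < MAX_READ_ALOUD_CHARS;
}
```

A UI could use a check like this to disable the volume icon up front instead of letting the request fail.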
Privacy
The answer text is sent to Azure Speech Services (Azure OpenAI region) to be converted to audio. The audio is not retained.
Why don't I see these icons?
If the microphone or volume icons aren't visible, the corresponding feature flag is off for your environment. Possible reasons:
- The environment hasn't been configured with Azure Speech credentials
- The flags showSpeechInput / showSpeechOutputAzure are set to false
- You're on a browser version that doesn't expose the API
Contact Support if you need these features enabled.
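The conditions in this section can be summarized as a small decision function. The flag names showSpeechInput and showSpeechOutputAzure come from this page. Everything else (the SpeechEnvironment shape, its field names, and visibleSpeechIcons) is hypothetical. The sketch also assumes that the microphone does not depend on Azure credentials, since speech input runs in the browser.

```typescript
// Flag names come from this page; the surrounding shape is hypothetical.
interface SpeechFlags {
  showSpeechInput: boolean;
  showSpeechOutputAzure: boolean;
}

interface SpeechEnvironment {
  flags: SpeechFlags;
  hasAzureSpeechCredentials: boolean;
  browserHasSpeechRecognition: boolean;
  knowledgeBaseSelected: boolean;
}

// Decide which speech icons would be visible for a given environment.
function visibleSpeechIcons(env: SpeechEnvironment): { microphone: boolean; volume: boolean } {
  return {
    // Assumption: the microphone needs the flag, browser support, and a selected knowledge base.
    microphone:
      env.flags.showSpeechInput && env.browserHasSpeechRecognition && env.knowledgeBaseSelected,
    // Assumption: read-aloud needs the flag and configured Azure Speech credentials.
    volume: env.flags.showSpeechOutputAzure && env.hasAzureSpeechCredentials,
  };
}
```

Walking through these conditions one at a time is a quick way to narrow down which cause applies before contacting Support.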
What about voice in other languages?
- Speech input: follows your browser locale. Some browsers let you change the recognition language; others stick to one.
- Speech output: the default voice is multilingual and pronounces non-English text reasonably, but pronunciation quality varies by language.
Genie does not currently offer Whisper-style server-side transcription or custom voice cloning.