Web Speech API + DocuMind API: speech-to-text for document chat
In Chromium-based browsers, the Web Speech API can turn microphone input into text without you hosting your own speech models.
Pair it with DocuMind’s POST /api/v1/chat (or the streaming endpoint) so answers are grounded in the PDFs behind your API key rather than the open web.
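The capture step can be sketched as follows. This is a minimal illustration, not DocuMind code: the helper names (getRecognitionCtor, finalTranscript, listenOnce) are hypothetical, and browser globals are passed in through a scope argument so the functions can be exercised without a microphone.

```typescript
type TranscriptHandler = (transcript: string) => void;

/** Returns the recognition constructor, or undefined when the API is missing. */
function getRecognitionCtor(scope: any = globalThis): any {
  // Chromium exposes the constructor under the webkit prefix.
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
}

/** Joins the top alternative of every final result into one string. */
function finalTranscript(results: ArrayLike<any>): string {
  const parts: string[] = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (result.isFinal) parts.push(String(result[0].transcript).trim());
  }
  return parts.join(" ");
}

/** Starts listening for one utterance; returns false when unsupported. */
function listenOnce(
  onTranscript: TranscriptHandler,
  lang: string = "en-US",
  scope: any = globalThis,
  onError?: (code: string) => void,
): boolean {
  const Ctor = getRecognitionCtor(scope);
  if (!Ctor) return false;

  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.interimResults = false; // deliver final results only
  recognition.onresult = (event: any) => {
    const text = finalTranscript(event.results);
    if (text) onTranscript(text);
  };
  recognition.onerror = (event: any) => onError?.(event.error);
  recognition.start();
  return true;
}
```

A false return value is a reasonable cue to fall back to a plain text input, since browsers without the API expose neither constructor.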
Security note
Do not ship raw API keys in public front-end code in production.
Prefer a small backend proxy that attaches Authorization: Bearer …; if a key must live client-side during prototyping, restrict it and rotate it afterwards.
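One way to picture the proxy is as a function that turns the browser's request body into the upstream request. The sketch below is an assumption-laden illustration: buildUpstreamRequest and the UpstreamRequest shape are hypothetical names, the base URL is whatever your deployment uses, and the key is expected to come from server-side configuration rather than from the client.

```typescript
interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/** Builds the request a proxy would forward to DocuMind, adding the secret key. */
function buildUpstreamRequest(
  clientBody: unknown,
  apiKey: string, // read from server-side configuration, never sent by the browser
  baseUrl: string, // assumption: the origin where your DocuMind API is reachable
): UpstreamRequest {
  if (!apiKey) throw new Error("API key is not configured on the server");

  const input = (clientBody ?? {}) as { message?: unknown; voice?: unknown };
  if (typeof input.message !== "string" || input.message.trim() === "") {
    throw new Error("Request body must contain a non-empty `message` string");
  }

  // Forward only the fields this guide mentions; drop anything else.
  const forwarded: { message: string; voice?: true } = { message: input.message };
  if (input.voice === true) forwarded.voice = true;

  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/v1/chat",
    method: "POST",
    headers: {
      "Authorization": "Bearer " + apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(forwarded),
  };
}
```

In a route handler, the result can be passed to fetch as fetch(request.url, request) and the upstream response relayed to the browser, so the key never leaves the server.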
CORS
A browser fetch to your DocuMind origin succeeds only if that origin’s CORS policy allows your site’s domain; the same note appears in How it works — HTTP API.
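If you control the server the browser calls (for instance a proxy in front of DocuMind), the allow-list check might look like the sketch below. How DocuMind itself configures CORS is not covered here; corsHeadersFor and the example origins are hypothetical, while the header names are the standard CORS response headers.

```typescript
/** Returns CORS response headers for an allowed origin, or null otherwise. */
function corsHeadersFor(
  origin: string | undefined,
  allowedOrigins: readonly string[],
): Record<string, string> | null {
  if (!origin || allowedOrigins.indexOf(origin) === -1) return null;
  return {
    // Echo the specific origin rather than "*" so only listed sites are allowed.
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    // JSON bodies and an Authorization header both trigger a preflight request.
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
    // Tell caches that the response varies by requesting origin.
    "Vary": "Origin",
  };
}
```

A null result means the server should send no CORS headers, and the browser will then block the cross-origin response.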
Optional voice: true
Once SpeechRecognition produces a final transcript string, send it as message in the request body.
Set "voice": true when you want the API to return marked transcript fields for UI parity; omit it for a minimal integration.
See DocuMind API voice mode.
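A minimal sketch of building and sending that body follows. Only message and voice come from this guide; the helper names, the FetchLike type, and the endpoint argument (for example a proxy route on your own site) are assumptions for illustration, and the response is returned as untyped JSON because its exact shape is documented elsewhere.

```typescript
interface ChatRequestBody {
  message: string;
  voice?: true;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Builds the JSON body; `voice` is included only when requested. */
function buildChatBody(transcript: string, voice: boolean = false): ChatRequestBody {
  const message = transcript.trim();
  if (!message) throw new Error("Transcript is empty");
  const body: ChatRequestBody = { message };
  if (voice) body.voice = true;
  return body;
}

/** Posts the transcript to `endpoint` and resolves with the parsed JSON reply. */
function sendTranscript(
  transcript: string,
  endpoint: string, // assumption: your proxy route or the chat endpoint URL
  voice: boolean = false,
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<unknown> {
  return fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatBody(transcript, voice)),
  }).then((response) => {
    if (!response.ok) {
      throw new Error("Chat request failed with status " + response.status);
    }
    return response.json();
  });
}
```

Leaving voice out of the body entirely, rather than sending false, matches the "omit it for a minimal integration" advice above.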
Speaking the reply
Pass speechSynthesis the plain reply string from the JSON response, or the concatenated stream deltas, so the spoken text matches what users read on screen.
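The playback step might be sketched like this. It assumes the stream deltas have already been extracted as plain strings (the stream's wire format is not described here); joinDeltas and speakReply are hypothetical helper names, and the scope argument stands in for the browser's global object.

```typescript
/** Concatenates streamed text deltas into the full reply, in arrival order. */
function joinDeltas(deltas: readonly string[]): string {
  return deltas.join("");
}

/** Speaks `text` aloud; returns false when synthesis is unavailable or text is blank. */
function speakReply(text: string, scope: any = globalThis): boolean {
  const synth = scope.speechSynthesis;
  const Utterance = scope.SpeechSynthesisUtterance;
  if (!synth || !Utterance || text.trim() === "") return false;

  synth.cancel(); // stop any earlier reply that is still being spoken
  synth.speak(new Utterance(text));
  return true;
}
```

For a streamed answer, calling speakReply(joinDeltas(collected)) once the stream ends keeps the audio identical to the on-screen text.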
Related: Voice PDF Q&A REST guide · API reference — optional voice