Web Speech API + DocuMind API: speech-to-text for document chat

Updated for DocuMind · May 10, 2026

In Chromium-based browsers, the Web Speech API can turn microphone input into text without you hosting your own speech models. Pair it with DocuMind’s POST /api/v1/chat (or the streaming endpoint) so answers stay grounded in the PDFs your API key can access, not the open web.
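A minimal sketch of capturing a single utterance, assuming the browser exposes the recognizer as `SpeechRecognition` or the prefixed `webkitSpeechRecognition`. The helper names (`getRecognitionCtor`, `listenOnce`) and the small local interfaces are hypothetical. They cover only the members used here, so the code compiles without DOM typings and can be exercised with a fake recognizer.

```typescript
// Only the members this sketch uses; declared locally so it compiles without lib.dom.
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  maxAlternatives: number;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  onend: (() => void) | null;
  start(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// Prefer the unprefixed constructor; fall back to Chromium's prefixed one.
function getRecognitionCtor(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): RecognitionCtor | undefined {
  return (scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"]) as RecognitionCtor | undefined;
}

// Resolves with the trimmed transcript of one final result; rejects on a
// recognition error or when the session ends without any result.
function listenOnce(Ctor: RecognitionCtor, lang = "en-US"): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const recognition = new Ctor();
    recognition.lang = lang;
    recognition.interimResults = false; // final results only
    recognition.maxAlternatives = 1;
    let settled = false;

    recognition.onresult = (event) => {
      settled = true;
      resolve(event.results[0][0].transcript.trim());
    };
    recognition.onerror = (event) => {
      settled = true;
      reject(new Error(`Speech recognition error: ${event.error}`));
    };
    recognition.onend = () => {
      // onend also fires after onresult/onerror; only reject if neither did.
      if (!settled) reject(new Error("Recognition ended without a transcript"));
    };
    recognition.start();
  });
}
```

In a page you might call `getRecognitionCtor()` with no argument and, if it returns a constructor, `await listenOnce(Ctor)` from a button's click handler; the first run should trigger the browser's microphone permission prompt.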

Security note

Avoid exposing raw API keys in public front-end code in production. Prefer a small backend proxy that attaches Authorization: Bearer … on the server side; if a key must live client-side during prototyping, restrict it and rotate it.
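One way to keep the key off the client is a tiny server-side forwarder. This sketch assumes a Node 18+ backend and covers only the non-streaming POST /api/v1/chat; `proxyChat`, `ProxyConfig`, and the injected `fetchImpl` are illustrative names, not part of DocuMind.

```typescript
// Hypothetical upstream fetch shape; Node 18+'s global fetch satisfies it.
type UpstreamFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; text(): Promise<string> }>;

interface ProxyConfig {
  baseUrl: string; // your DocuMind origin, e.g. read from a server-side env var
  apiKey: string; // stays on the server; never sent to the browser
  fetchImpl: UpstreamFetch;
}

// Forwards the browser's JSON body to POST /api/v1/chat and attaches the
// bearer token server-side. Returns the upstream status and raw body.
async function proxyChat(
  clientBody: string,
  config: ProxyConfig,
): Promise<{ status: number; body: string }> {
  let payload: unknown;
  try {
    payload = JSON.parse(clientBody);
  } catch {
    return { status: 400, body: JSON.stringify({ error: "Request body must be JSON" }) };
  }

  const base = config.baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const upstream = await config.fetchImpl(`${base}/api/v1/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify(payload),
  });
  return { status: upstream.status, body: await upstream.text() };
}
```

Wire `proxyChat` into whatever HTTP framework you use, passing Node's global `fetch` as `fetchImpl` and reading the key from an environment variable. Serving the proxy from the same origin as your page also means the browser never makes a cross-origin request to DocuMind.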

CORS

A browser fetch to your DocuMind origin succeeds only if the server’s CORS policy allows your site’s origin. The same note appears in How it works — HTTP API.
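When the browser calls DocuMind directly with `Content-Type: application/json` and an `Authorization` header, it first issues an `OPTIONS` preflight, so the server’s CORS policy must allow those request headers as well as your origin. If it does not, page code only sees a rejected `fetch` with no readable status. The sketch below (hypothetical `crossOriginPost` and `CorsOrNetworkError`) reports that case separately from ordinary HTTP errors.

```typescript
// Minimal shapes so the sketch compiles without lib.dom.
interface ResponseLike {
  ok: boolean;
  status: number;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

class CorsOrNetworkError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CorsOrNetworkError";
  }
}

// Sends a cross-origin POST. A CORS rejection reaches page code only as a
// rejected fetch (a TypeError), so it is rethrown as CorsOrNetworkError;
// HTTP error responses (res.ok === false) are returned unchanged.
async function crossOriginPost(
  url: string,
  body: string,
  headers: Record<string, string>,
  fetchImpl: FetchLike,
): Promise<ResponseLike> {
  try {
    return await fetchImpl(url, { method: "POST", headers, body });
  } catch (err) {
    throw new CorsOrNetworkError(
      `No readable response from ${url}. The server may not allow this origin ` +
        `(the browser console shows the CORS message) or the network failed: ${String(err)}`,
    );
  }
}
```

The browser deliberately hides the exact CORS reason from scripts, which is why the error message points at the console rather than trying to distinguish the two causes.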

Optional `voice: true`

After SpeechRecognition produces a transcript string, send it as the `message` field. Set "voice": true when you want the API to return marked transcript fields for UI parity; omit it for a minimal integration. See DocuMind API voice mode.
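A sketch of building and sending the request body described above. It assumes the body needs only `message` and, optionally, `voice`; `buildChatRequest`, `sendTranscript`, and the `endpoint` parameter (your proxy URL, or /api/v1/chat on your DocuMind origin) are illustrative. It does not model the response, because the voice-mode fields are defined in DocuMind API voice mode.

```typescript
// Body fields named in this section; anything else DocuMind accepts is out of scope.
interface ChatRequest {
  message: string;
  voice?: boolean;
}

type JsonFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the request body; the voice flag is omitted entirely when not wanted.
function buildChatRequest(transcript: string, voice: boolean): ChatRequest {
  const message = transcript.trim();
  if (message === "") throw new Error("Empty transcript: nothing to send");
  return voice ? { message, voice: true } : { message };
}

// Posts the transcript. In production `endpoint` should be your proxy; pass an
// Authorization header via `extraHeaders` only while prototyping.
async function sendTranscript(
  endpoint: string,
  transcript: string,
  voice: boolean,
  fetchImpl: JsonFetch,
  extraHeaders: Record<string, string> = {},
): Promise<unknown> {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...extraHeaders },
    body: JSON.stringify(buildChatRequest(transcript, voice)),
  });
  if (!res.ok) throw new Error(`Chat request failed with HTTP ${res.status}`);
  // The response shape (including any voice transcript fields) is defined by DocuMind.
  return res.json();
}
```

Rejecting an empty transcript before the network call avoids sending a blank `message` when recognition returns only whitespace.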

Speaking the reply

Pass speechSynthesis the plain reply string, either from the JSON response or from the concatenated stream deltas, so users hear the same text they read on screen.
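A minimal sketch of the speaking step, assuming the reply text has already been extracted from the JSON response, or that stream deltas have already been parsed into strings (the streaming wire format is not covered here). `joinDeltas` and `speakReply` are hypothetical helpers; in a browser you would pass `window.speechSynthesis` and `SpeechSynthesisUtterance`.

```typescript
// Only the members used here; window.speechSynthesis and
// SpeechSynthesisUtterance satisfy these shapes in a browser.
interface SynthLike {
  cancel(): void;
  speak(utterance: unknown): void;
}
type UtteranceCtor = new (text: string) => { lang: string };

// Streaming: concatenate the text deltas in arrival order to get the same
// string that was rendered on screen.
function joinDeltas(deltas: readonly string[]): string {
  return deltas.join("");
}

// Speaks the reply, cancelling any queued speech first so consecutive answers
// do not overlap. Returns false (and speaks nothing) for an empty reply.
function speakReply(
  reply: string,
  synth: SynthLike,
  Utterance: UtteranceCtor,
  lang = "en-US",
): boolean {
  const text = reply.trim();
  if (text === "") return false;
  synth.cancel();
  const utterance = new Utterance(text);
  utterance.lang = lang;
  synth.speak(utterance);
  return true;
}
```

Calling `cancel()` before `speak()` is a design choice: speechSynthesis queues utterances, so without it a new answer would wait behind the previous one.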
