
Web Speech API + DocuMind API: speech-to-text for document chat

In Chromium-based browsers, the Web Speech API can turn microphone input into text without you hosting your own speech models. Pair it with DocuMind's POST /api/v1/chat (or its streaming endpoint) so answers stay grounded in the PDFs behind your API key, not the open web.
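A minimal sketch of the speech-to-text half, assuming a browser that exposes SpeechRecognition or Chromium's prefixed webkitSpeechRecognition. TypeScript's DOM typings do not include the recognition interfaces, so the sketch declares only the members it uses. The helper names getRecognitionCtor and listenOnce are hypothetical. The sketch listens for one final result and resolves with its transcript.

```typescript
// TypeScript's DOM typings omit SpeechRecognition, so declare only what this sketch uses.
interface RecognitionEventLike {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  maxAlternatives: number;
  onresult: ((event: RecognitionEventLike) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  onend: (() => void) | null;
  start(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// Prefer the standard name; fall back to Chromium's webkit-prefixed constructor.
function getRecognitionCtor(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): RecognitionCtor | undefined {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognitionCtor) : undefined;
}

// Start recognition and resolve with the first final transcript.
function listenOnce(Ctor: RecognitionCtor, lang = "en-US"): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const recognition = new Ctor();
    let settled = false;
    recognition.lang = lang;
    recognition.interimResults = false; // skip partial hypotheses
    recognition.maxAlternatives = 1;
    recognition.onresult = (event) => {
      settled = true;
      resolve(event.results[0][0].transcript.trim());
    };
    recognition.onerror = (event) => {
      settled = true;
      reject(new Error(`Speech recognition error: ${event.error}`));
    };
    // end fires after a result or an error; if neither happened, nothing was heard.
    recognition.onend = () => {
      if (!settled) reject(new Error("No speech recognized"));
    };
    recognition.start();
  });
}
```

In a page you might call getRecognitionCtor() once, then run listenOnce from a button's click handler, which also gives the browser a natural moment to prompt for microphone permission.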

Security note

Avoid exposing raw API keys in public front-end code in production. Prefer a small backend proxy that attaches Authorization: Bearer …, or, if keys must live client-side during prototyping, restrict them and rotate them.
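One way to keep the key off the client, sketched under assumptions: your server receives the browser's JSON body, keeps only the message and optional voice fields, and forwards it to DocuMind's /api/v1/chat with the Authorization header added. ProxyConfig, sanitizeChatBody, forwardChat, and the field allow-list are hypothetical; the fetch function is injected so the logic can be exercised without a network.

```typescript
// Only the parts of fetch this sketch needs; Node 18+'s global fetch should satisfy it.
type UpstreamFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; text(): Promise<string> }>;

interface ProxyConfig {
  baseUrl: string; // your DocuMind origin
  apiKey: string; // secret: read on the server, never shipped to the browser
}

// Keep only the fields this integration sends (message, optional voice).
function sanitizeChatBody(raw: string): string {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) throw new Error("Body must be a JSON object");
  const { message, voice } = parsed as { message?: unknown; voice?: unknown };
  if (typeof message !== "string" || message.trim() === "") {
    throw new Error("message must be a non-empty string");
  }
  return JSON.stringify(voice === true ? { message, voice: true } : { message });
}

// Forward the sanitized body to DocuMind with the bearer token attached server-side.
async function forwardChat(
  rawBody: string,
  config: ProxyConfig,
  fetchImpl: UpstreamFetch,
): Promise<{ status: number; body: string }> {
  const url = `${config.baseUrl.replace(/\/+$/, "")}/api/v1/chat`;
  const upstream = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: sanitizeChatBody(rawBody),
  });
  // Relay status and body unchanged so the browser also sees DocuMind's errors.
  return { status: upstream.status, body: await upstream.text() };
}
```

In a Node 18+ route handler you could call forwardChat with the raw request body, a config built from server-side environment variables (the variable names are up to you), and the global fetch. Re-serializing an allow-list means callers of your proxy cannot pass arbitrary fields through to the upstream API; extend it if your requests use more fields.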

CORS

Browser fetch calls to your DocuMind origin only succeed if CORS allows your site's domain; see the same note in How it works — HTTP API.

Optional voice: true

After SpeechRecognition produces a string, send it as the message field of the request body. Set "voice": true when you want the API to return marked transcript fields for UI parity; omit it for a minimal integration. See DocuMind API voice mode.
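A minimal client-side sketch, assuming the request body is JSON of the form { message, voice? } as described above and that the endpoint answers with JSON. buildChatRequest, askDocuMind, and the optional authToken parameter are illustrative names, not part of a DocuMind SDK. In production, point endpoint at your own proxy and leave authToken unset.

```typescript
// Request body as described above: the transcript goes in message; voice is optional.
interface ChatRequest {
  message: string;
  voice?: true;
}

// Only the parts of fetch this sketch needs; the browser's fetch should satisfy it.
type ChatFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function buildChatRequest(transcript: string, voice = false): ChatRequest {
  const message = transcript.trim();
  if (message === "") throw new Error("Empty transcript; nothing to send");
  // Omit the voice key entirely for a minimal integration.
  return voice ? { message, voice: true } : { message };
}

async function askDocuMind(
  endpoint: string, // your proxy route, or DocuMind's /api/v1/chat while prototyping
  request: ChatRequest,
  fetchImpl: ChatFetch,
  authToken?: string, // prototyping only; a proxy should add this server-side
): Promise<unknown> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (authToken) headers.Authorization = `Bearer ${authToken}`;
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers,
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`Chat request failed with HTTP ${res.status}`);
  return res.json();
}
```

In the browser, the transcript from recognition would flow straight in, for example askDocuMind(proxyUrl, buildChatRequest(transcript, true), fetch), where proxyUrl is whatever route your backend exposes.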

Speaking the reply

Use speechSynthesis to read out the plain reply string from the JSON response, or the concatenated stream deltas, so users hear the same text they read on screen.
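A sketch of the read-aloud step. It assumes the non-streaming response carries the answer in a string field named reply (check the API reference for the actual field name) and, for streaming, that you have already collected the text deltas into an array; the stream's wire format is not covered here. Small structural types stand in for the DOM ones so the helpers can run outside a browser; replyText, joinDeltas, and speakReply are hypothetical names.

```typescript
// Minimal stand-ins for SpeechSynthesisUtterance and speechSynthesis.
interface UtteranceLike {
  text: string;
  lang: string;
}
interface SynthLike {
  cancel(): void;
  speak(utterance: UtteranceLike): void;
}
type UtteranceCtor = new (text: string) => UtteranceLike;

// Assumption: the JSON response stores the answer in a string field named "reply".
function replyText(json: unknown): string {
  const reply = typeof json === "object" && json !== null ? (json as { reply?: unknown }).reply : undefined;
  if (typeof reply !== "string") throw new Error('Response has no string "reply" field');
  return reply;
}

// Streaming: join the collected deltas once the stream has ended.
function joinDeltas(deltas: readonly string[]): string {
  return deltas.join("");
}

// Speak the same text shown on screen; cancel() drops anything still queued or playing.
function speakReply(text: string, synth: SynthLike, Utterance: UtteranceCtor, lang = "en-US"): void {
  if (text.trim() === "") return; // nothing to say
  const utterance = new Utterance(text);
  utterance.lang = lang;
  synth.cancel();
  synth.speak(utterance);
}
```

In a page this becomes speakReply(replyText(json), window.speechSynthesis, SpeechSynthesisUtterance). Calling cancel() first keeps a new answer from queueing behind one that is still being read aloud.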

Related: Voice PDF Q&A REST guide · API reference — optional voice