Build a voice assistant for PDF Q&A with the DocuMind REST API

Updated for DocuMind · May 10, 2026

Teams often want users to ask questions aloud about policies, manuals, or contracts stored as PDFs, without turning the product into a generic internet chatbot. DocuMind answers from your uploaded documents via the same RAG pipeline as the web app. You only need an API key and HTTPS calls from your backend or browser.

Recommended flow

1. Capture speech in the client (e.g. the browser's SpeechRecognition interface from the Web Speech API).
2. POST the transcript to /api/v1/chat, or stream the answer with /api/v1/chat/stream, sending the header Authorization: Bearer ….
3. Optional: set "voice": true if you want marked transcript strings for your UI (see API voice mode).
4. Speak the reply with speechSynthesis or your preferred TTS.
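The middle of this flow, sending the transcript to DocuMind, might look roughly like the sketch below. It is not taken from the DocuMind reference: the "message" field name, the base URL you pass in, and the helper names buildChatRequest and askDocuMind are assumptions for illustration. Only the /api/v1/chat path, the Bearer header, and the "voice" flag come from this article.

```typescript
// Hypothetical request body: "message" is an assumed field name.
interface ChatRequestBody {
  message: string;
  voice?: boolean;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Minimal fetch-like signature so the sketch works with the browser's
// fetch, Node's global fetch, or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the POST for /api/v1/chat from a speech transcript.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  transcript: string,
  voice = false,
): PreparedRequest {
  const body: ChatRequestBody = { message: transcript.trim() };
  if (voice) body.voice = true; // omit the flag entirely when not needed
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/chat`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

// Send the prepared request and return the parsed JSON response.
async function askDocuMind(
  req: PreparedRequest,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) throw new Error(`DocuMind request failed: HTTP ${res.status}`);
  return res.json();
}
```

In a browser, the transcript would typically come from a SpeechRecognition result event, and the answer text could be spoken with speechSynthesis.speak(new SpeechSynthesisUtterance(text)). If the call is made directly from the browser, the API key is visible to the user, so routing the request through your backend may be the safer choice.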

Why optional `voice`?

Server integrations that only need JSON text can omit voice entirely. Voice-enabled clients set voice: true when they want message_marked / reply_marked on the response, so that chat bubbles can match DocuMind’s «voice» styling.
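A client that sometimes sends voice: true and sometimes does not can treat the marked strings as optional and fall back to plain text. The sketch below assumes the plain answer arrives in a field called "reply"; that name and the bubbleTexts helper are hypothetical, while message_marked and reply_marked are the fields named above.

```typescript
// Assumed response shape: "reply" is a hypothetical name for the plain
// answer; the *_marked fields appear only when voice: true was sent.
interface ChatResponse {
  reply: string;
  message_marked?: string;
  reply_marked?: string;
}

// Pick the strings to show in the user and assistant chat bubbles,
// preferring the marked variants when the server returned them.
function bubbleTexts(
  userMessage: string,
  res: ChatResponse,
): { user: string; assistant: string } {
  return {
    user: res.message_marked ?? userMessage,
    assistant: res.reply_marked ?? res.reply,
  };
}
```

With this shape, the same rendering code serves both text-only and voice-enabled requests, and nothing breaks if the flag is dropped later.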

Scope your key correctly

Create a global API key for org-wide PDFs, or a chat-scoped key when retrieval should follow one conversation’s uploads. Details are in How it works — HTTP API.

Next: Web Speech API + DocuMind · Technical overview · Try DocuMind
