Build a voice assistant for PDF Q&A with the DocuMind REST API
Teams often want users to ask questions aloud about policies, manuals, or contracts stored as PDFs—without turning the product into a generic internet chatbot. DocuMind answers from your uploaded documents via the same RAG pipeline as the web app; you only need an API key and HTTPS calls from your backend or browser.
Recommended flow
- Capture speech in the client (e.g. browser SpeechRecognition / Web Speech API).
- POST the transcript to /api/v1/chat, or stream with /api/v1/chat/stream, sending the header Authorization: Bearer ….
- Optional: set "voice": true if you want marked transcript strings for your UI (see API voice mode).
- Speak the reply with speechSynthesis or your preferred TTS.
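The request step of the flow above can be sketched in TypeScript roughly as follows. Only the /api/v1/chat path, the Bearer header, and the optional "voice" flag come from this page. The host name, the "message" request field, the "reply" response field, and the helper names buildChatRequest and askDocuMind are hypothetical, so check them against the HTTP API reference before relying on this.

```typescript
// Minimal sketch of the "POST the transcript" step.
// From the page: POST /api/v1/chat, header "Authorization: Bearer <key>",
// optional "voice": true. ASSUMED (hypothetical): the base URL, the request
// field name "message", and the response field name "reply".

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

interface ChatResponse {
  reply: string;           // assumed: plain answer text to display or speak
  message_marked?: string; // only expected when voice: true was sent
  reply_marked?: string;
}

// Just the parts of a fetch Response this helper touches, so tests can fake it.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (url: string, init: ChatRequest["init"]) => Promise<HttpResponseLike>;

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  transcript: string,
  voice = false,
): ChatRequest {
  // Text-only integrations leave "voice" out of the body entirely.
  const body = voice ? { message: transcript, voice: true } : { message: transcript };
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/chat`, // tolerate a trailing slash
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

async function askDocuMind(
  fetchFn: FetchLike,
  baseUrl: string,
  apiKey: string,
  transcript: string,
  voice = false,
): Promise<ChatResponse> {
  const { url, init } = buildChatRequest(baseUrl, apiKey, transcript, voice);
  const res = await fetchFn(url, init);
  if (!res.ok) {
    throw new Error(`DocuMind request failed with HTTP ${res.status}`);
  }
  // Unchecked cast: validate the shape in production code.
  return (await res.json()) as ChatResponse;
}
```

On Node 18+ or in a browser you could pass the global fetch as fetchFn, e.g. askDocuMind(fetch, "https://documind.example.com", apiKey, transcript, true). Injecting fetch keeps the helper testable without network access.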
Why optional voice?
Server integrations that only need JSON text can omit the voice field entirely.
Voice-enabled clients set voice: true when they want message_marked / reply_marked in the response, for chat bubbles that match DocuMind's «voice» styling. This is purely optional.
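A client that sends voice: true might choose its bubble and speech text roughly like this. The sketch assumes the plain answer arrives in a "reply" field (hypothetical) and treats message_marked / reply_marked as display-only strings, since their markup is not described here. The names toBubbleView and BubbleView are illustrative, not part of the DocuMind API.

```typescript
// Sketch: decide what to show and what to speak once a /api/v1/chat reply arrives.
// From the page: message_marked / reply_marked come back when the request set voice: true.
// ASSUMED (hypothetical): the plain answer is in "reply", and TTS should get plain text.

interface ChatResponse {
  reply: string;
  message_marked?: string;
  reply_marked?: string;
}

interface BubbleView {
  userBubble: string;      // text shown in the user's chat bubble
  assistantBubble: string; // text shown in the assistant's chat bubble
  speakText: string;       // text handed to speechSynthesis or another TTS
}

function toBubbleView(transcript: string, res: ChatResponse): BubbleView {
  return {
    // Fall back to the raw transcript / plain reply when voice mode was off.
    userBubble: res.message_marked ?? transcript,
    assistantBubble: res.reply_marked ?? res.reply,
    // Speak the unmarked reply so styling markup is never read aloud.
    speakText: res.reply,
  };
}
```

In the browser, the speakText value could then go to speechSynthesis.speak(new SpeechSynthesisUtterance(view.speakText)).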
Scope your key correctly
Create a global API key for org-wide PDFs, or a chat-scoped key when retrieval should follow one conversation’s uploads. Details are in How it works — HTTP API.