
Build a voice assistant for PDF Q&A with the DocuMind REST API

Teams often want users to ask questions aloud about policies, manuals, or contracts stored as PDFs, without turning the product into a generic internet chatbot. DocuMind answers from your uploaded documents through the same RAG pipeline as the web app; all you need is an API key and HTTPS calls from your backend or browser.

Recommended flow

  1. Capture speech in the client (e.g. browser SpeechRecognition / Web Speech API).
  2. POST the transcript to /api/v1/chat, or stream the reply with /api/v1/chat/stream, sending the header Authorization: Bearer ….
  3. Optional: set "voice": true if you want marked transcript strings for your UI (see API voice mode).
  4. Speak the reply with speechSynthesis or your preferred TTS.
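
As a rough illustration of step 2, here is a minimal TypeScript sketch of how a client might assemble and send the request. Only the two endpoint paths, the Bearer header, and the "voice" flag come from this post. The body field "message", the response field "reply", the baseUrl option, and the helper names buildChatRequest and askDocuMind are assumptions made for the example, so check them against the API reference before relying on them.

```typescript
// Sketch only. "message" (request) and "reply" (response) are assumed field
// names; baseUrl and both helper names are hypothetical.

interface ChatOptions {
  baseUrl: string;   // hypothetical: the host that serves /api/v1/chat
  apiKey: string;    // sent as "Authorization: Bearer <key>"
  voice?: boolean;   // adds "voice": true to the body when set
  stream?: boolean;  // targets /api/v1/chat/stream instead of /api/v1/chat
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Pure helper: turns a speech transcript into a ready-to-send request.
function buildChatRequest(transcript: string, opts: ChatOptions): ChatRequest {
  const path = opts.stream ? "/api/v1/chat/stream" : "/api/v1/chat";
  const payload: Record<string, unknown> = { message: transcript.trim() };
  if (opts.voice) {
    payload.voice = true;
  }
  return {
    url: opts.baseUrl.replace(/\/+$/, "") + path, // drop trailing slashes
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${opts.apiKey}`,
    },
    body: JSON.stringify(payload),
  };
}

// Minimal shape of fetch that this sketch relies on, so a fake can be
// injected in tests; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends a non-streaming request and returns the (assumed) "reply" text.
async function askDocuMind(
  fetchFn: FetchLike,
  transcript: string,
  opts: ChatOptions
): Promise<string> {
  const req = buildChatRequest(transcript, { ...opts, stream: false });
  const res = await fetchFn(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`DocuMind request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { reply?: string };
  return data.reply ?? "";
}
```

In a browser, the transcript would typically come from a SpeechRecognition result event, and the returned text could be passed to speechSynthesis.speak(new SpeechSynthesisUtterance(text)) for step 4. Passing fetch in as an argument keeps the request logic testable without a network.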

Why optional voice?

Server integrations that only need JSON text can omit voice entirely. Voice-enabled clients set "voice": true when they want message_marked and reply_marked on the response, for chat bubbles that match DocuMind’s «voice» styling. The flag is purely optional.
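
Because the marked fields only appear when the flag is set, a client that serves both modes may want a fallback. The TypeScript sketch below shows one way to do that. The field names message_marked and reply_marked come from this post. The plain fields "message" and "reply", and the helper name toBubbles, are assumptions made for the example.

```typescript
// Sketch only. "message" and "reply" are assumed plain-text fields;
// message_marked / reply_marked are expected only when "voice": true was sent.
interface ChatResponse {
  message?: string;
  reply?: string;
  message_marked?: string;
  reply_marked?: string;
}

interface Bubbles {
  user: string;
  assistant: string;
}

// Prefers the marked strings for display and falls back to plain text,
// then to the local transcript (user) or an empty string (assistant).
function toBubbles(res: ChatResponse, transcript: string): Bubbles {
  return {
    user: res.message_marked ?? res.message ?? transcript,
    assistant: res.reply_marked ?? res.reply ?? "",
  };
}
```

With this shape, the same rendering code works whether or not the request set the voice flag.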

Scope your key correctly

Create a global API key for org-wide PDFs, or a chat-scoped key when retrieval should follow one conversation’s uploads. Details are in How it works — HTTP API.

Next: Web Speech API + DocuMind · Technical overview · Try DocuMind