How it works
DocuMind answers from your uploaded PDFs using retrieval (RAG). The web app and the HTTP API share the same backend—use whichever fits your workflow.
Personal Workspace
You sign in with your email (one-time code). Your account is a personal workspace: you create chats, upload PDFs, and ask questions in natural language.
Documents
- Global documents — PDFs you add under “Global Documents” are available across your account. Good for policies, manuals, or reference material you want in every chat context.
- Per-chat documents — When you upload while a chat is open, files can be tied to that chat so answers stay scoped to that conversation’s knowledge.
- Uploads are PDF only right now. Text is extracted (including OCR on sparse pages when needed) and embedded for search.
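The extraction step described in the last bullet can be pictured roughly as follows. This is only a sketch and not DocuMind's actual pipeline: the 40-character threshold, the function name `extract_pages`, and the two callables `extract_text` and `run_ocr` are hypothetical stand-ins for whatever PDF and OCR libraries a deployment uses.

```python
from typing import Callable, List

# Hypothetical cutoff: pages with less extracted text than this count as "sparse".
SPARSE_CHAR_THRESHOLD = 40


def extract_pages(
    pages: List[object],
    extract_text: Callable[[object], str],
    run_ocr: Callable[[object], str],
    threshold: int = SPARSE_CHAR_THRESHOLD,
) -> List[str]:
    """Return one text string per page, falling back to OCR on sparse pages."""
    texts = []
    for page in pages:
        text = extract_text(page).strip()
        if len(text) < threshold:
            # Likely a scanned page: the embedded text layer is empty or tiny.
            ocr_text = run_ocr(page).strip()
            # Keep whichever source produced more text.
            if len(ocr_text) > len(text):
                text = ocr_text
        texts.append(text)
    return texts
```

The resulting per-page strings would then be chunked and embedded for search; that part is not sketched here.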
Chat
Each chat keeps its own message history. The model uses retrieved chunks from the documents that apply to that session (global and/or chat-scoped, depending on what you uploaded and how keys are scoped—see API below).
Company mode
If your organization uses company logins, DocuMind can attach users to a shared company library based on email domain (e.g. you@acme.com).
- HR uploads — Typically only an HR-style address (e.g. hr@yourcompany.com) can upload company PDFs. Those become the knowledge base for employees at the same company.
- Employees — Can read and chat against those documents; they do not replace HR uploads unless your deployment allows it.
- Visibility — Admins may control whether document counts are visible to all employees (see the sidebar options when you are logged in as HR).
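As an illustration of domain-based grouping, the sketch below derives a company from an email address and applies the HR-only upload rule. It assumes the simplest possible policy (the local part must be exactly `hr`); the function names and that rule are hypothetical, and a real deployment may configure this differently.

```python
def email_domain(email: str) -> str:
    """Return the lowercased domain part of an email address."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain.lower()


def can_upload_company_pdfs(email: str, hr_local_part: str = "hr") -> bool:
    """Assumed policy: only the address whose local part is `hr_local_part` uploads."""
    local = email.strip().rpartition("@")[0]
    return local.lower() == hr_local_part


def shares_library(email_a: str, email_b: str) -> bool:
    """Two users share a company library when their email domains match."""
    return email_domain(email_a) == email_domain(email_b)
```

Under these assumptions, hr@acme.com could upload, and you@acme.com would read from the same library.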
HTTP API
Call POST /api/v1/chat for a full JSON reply, or POST /api/v1/chat/stream for SSE token streaming. Send your user-created API key in the header Authorization: Bearer <api_key>.
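If you call both endpoints from one client, a small helper keeps the URL and header logic in one place. This is a minimal sketch: `build_request` is a hypothetical name, and the base URL and key are whatever your deployment gives you.

```python
from typing import Dict, Tuple


def build_request(base_url: str, api_key: str, stream: bool = False) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for the chat endpoint or its streaming variant."""
    path = "/api/v1/chat/stream" if stream else "/api/v1/chat"
    url = base_url.rstrip("/") + path  # tolerate a trailing slash on the base URL
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if stream:
        # Ask for server-sent events explicitly on the streaming endpoint.
        headers["Accept"] = "text/event-stream"
    return url, headers
```

The returned pair can be passed straight to an HTTP client, for example `requests.post(url, headers=headers, json=body)`.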
Create keys (personal only)
- In the sidebar, use API keys (global documents) for a key that only sees global uploads.
- For a key scoped to one chat, create it from the chat flow in the app (scope chat + chat name) so retrieval uses that chat’s document scope.
Request body
JSON object: message (string, required), optional history — an array of {"role":"user"|"assistant","content":"..."} (up to ~120 turns), and optional voice (boolean, default false).
Set "voice": true when your client uses speech (or you want the same «marked» transcript style as the web app’s voice mode); omit it for normal text-only integrations.
Responses are compact for API use; when voice is true, non-streaming JSON may include message_marked and reply_marked alongside reply.
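The request body can be assembled and checked on the client before sending. The sketch below is one way to do it; `build_body` is a hypothetical helper, and treating "~120 turns" as a hard cap of 120 history entries (keeping the most recent) is an assumption, since the exact server limit is not stated.

```python
from typing import Any, Dict, List, Optional

MAX_HISTORY_ENTRIES = 120  # assumption: "~120 turns" read as 120 array entries
ALLOWED_ROLES = {"user", "assistant"}


def build_body(
    message: str,
    history: Optional[List[Dict[str, str]]] = None,
    voice: bool = False,
) -> Dict[str, Any]:
    """Build the JSON body: message (required), history and voice (optional)."""
    if not isinstance(message, str) or not message.strip():
        raise ValueError("message must be a non-empty string")
    entries = []
    for entry in history or []:
        if entry.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {entry.get('role')!r}")
        entries.append({"role": entry["role"], "content": str(entry.get("content", ""))})
    # Keep only the most recent entries so long conversations stay under the cap.
    body: Dict[str, Any] = {"message": message, "history": entries[-MAX_HISTORY_ENTRIES:]}
    if voice:
        body["voice"] = True  # omitted otherwise, matching the default of false
    return body
```

Trimming from the front keeps the latest context; whether the server prefers that or rejects oversized histories is not documented here.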
Non-streaming: POST /api/v1/chat
    # Replace YOUR_BASE_URL and YOUR_API_KEY. Response: {"reply":"..."} or {"error":"..."}.
    curl -sS -X POST "YOUR_BASE_URL/api/v1/chat" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"message":"Summarize my uploaded policy.","history":[]}'

    # PowerShell — same endpoint; escape quotes as shown.
    $body = '{"message":"Summarize my uploaded policy.","history":[]}'
    Invoke-RestMethod -Uri "YOUR_BASE_URL/api/v1/chat" -Method Post `
      -Headers @{ Authorization = "Bearer YOUR_API_KEY" } -Body $body -ContentType "application/json"

    // Browser or Node 18+ (fetch). Use your real origin in production.
    const BASE = "YOUR_BASE_URL";
    const key = "YOUR_API_KEY";
    const res = await fetch(BASE + "/api/v1/chat", {
      method: "POST",
      headers: {
        "Authorization": "Bearer " + key,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        message: "Summarize my uploaded policy.",
        history: [{ role: "user", content: "Hi" }, { role: "assistant", content: "Hello!" }],
      }),
    });
    const data = await res.json();
    console.log(data.reply);

    # pip install requests
    import requests
    r = requests.post(
        "YOUR_BASE_URL/api/v1/chat",
        headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
        json={"message": "Summarize my uploaded policy.", "history": []},
    )
    print(r.json())
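Because the JSON response carries either `reply` or `error`, it helps to turn the error case into an exception instead of printing the raw payload. A minimal sketch, where `ChatApiError` and `extract_reply` are hypothetical names introduced for illustration:

```python
from typing import Any, Dict


class ChatApiError(Exception):
    """Raised when the payload carries an "error" field or no usable reply."""


def extract_reply(payload: Dict[str, Any]) -> str:
    """Return the reply text, or raise ChatApiError for an error payload."""
    if "error" in payload:
        raise ChatApiError(str(payload["error"]))
    reply = payload.get("reply")
    if not isinstance(reply, str):
        raise ChatApiError("response has neither 'reply' nor 'error'")
    return reply
```

With the `requests` example, this would be called as `extract_reply(r.json())`.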
Streaming: POST /api/v1/chat/stream
Same body and Authorization header. The response is text/event-stream. Each SSE data line is JSON: deltas {"t":"d","c":"..."}, then {"t":"done"}, or errors {"t":"e","m":"...","code":...} (still HTTP 200 in many cases—parse the event type).
    # Raw SSE on stdout; use -N so curl disables buffering.
    curl -N -X POST "YOUR_BASE_URL/api/v1/chat/stream" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -H "Accept: text/event-stream" \
      -d '{"message":"Say hello in one sentence.","history":[]}'

    REM Windows Command Prompt (cmd.exe), not PowerShell. curl.exe ships with Windows 10+ and streams SSE with -N.
    curl.exe -N -X POST "YOUR_BASE_URL/api/v1/chat/stream" ^
      -H "Authorization: Bearer YOUR_API_KEY" ^
      -H "Content-Type: application/json" ^
      -d "{\"message\":\"Say hello in one sentence.\",\"history\":[]}"

    // Fetch + ReadableStream: parse lines starting with "data: ".
    const BASE = "YOUR_BASE_URL";
    const res = await fetch(BASE + "/api/v1/chat/stream", {
      method: "POST",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
        Accept: "text/event-stream",
      },
      body: JSON.stringify({ message: "Say hello.", history: [] }),
    });
    const reader = res.body.getReader();
    const dec = new TextDecoder();
    let buf = "", out = "";
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      buf += dec.decode(value, { stream: true });
      const lines = buf.split("\n");
      buf = lines.pop() || "";
      for (const line of lines) {
        const m = line.match(/^data:\s*(.+)/);
        if (!m) continue;
        const j = JSON.parse(m[1]);
        if (j.t === "d" && j.c) out += j.c;
      }
    }
    console.log(out);

    # pip install requests
    import json, requests
    with requests.post(
        "YOUR_BASE_URL/api/v1/chat/stream",
        headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
        json={"message": "Say hello.", "history": []},
        stream=True,
    ) as r:
        # requests falls back to ISO-8859-1 for text/* responses without a charset; decode as UTF-8.
        r.encoding = "utf-8"
        for line in r.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data:"):
                continue
            j = json.loads(line[5:].lstrip())
            if j.get("t") == "d" and j.get("c"):
                print(j["c"], end="", flush=True)
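Since error events can arrive with HTTP 200, a client may want to handle all three event types explicitly. The generator below is a sketch under the event shapes listed above (`d`, `done`, `e`); `StreamError` and `iter_deltas` are hypothetical names, and stopping at the first `done` event is an assumption about how the stream ends.

```python
import json
from typing import Iterable, Iterator, Optional


class StreamError(Exception):
    """Raised for an {"t":"e"} event; keeps the optional error code."""

    def __init__(self, message: str, code: Optional[object] = None):
        super().__init__(message)
        self.code = code


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from decoded SSE lines until done or an error event."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        event = json.loads(line[5:].strip())
        kind = event.get("t")
        if kind == "d":
            if event.get("c"):
                yield event["c"]
        elif kind == "done":
            return
        elif kind == "e":
            raise StreamError(event.get("m", ""), event.get("code"))
```

It accepts any iterable of decoded lines, so it can wrap `r.iter_lines(decode_unicode=True)` from `requests` or a list of strings in a unit test.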
Optional voice mode (embeds & speech UIs)
DocuMind does not run speech-to-text or text-to-speech on the API server for Bearer keys—you implement microphone input and playback in your app (browser Web Speech API, mobile SDKs, or cloud STT/TTS).
Send the transcribed text as message; read aloud the plain reply (or streamed deltas).
When you set "voice": true in the JSON body, the API adds optional fields so your UI can show voice-style transcripts consistent with the main web app:
- POST /api/v1/chat — same reply as always; may also include voice, message_marked, and reply_marked (guillemets «…»).
- POST /api/v1/chat/stream — token deltas unchanged; the final {"t":"done"} event may include voice, message_marked, and reply_marked.
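Because the marked fields are optional, a voice UI should fall back to the plain text when they are missing. The sketch below shows one way to pick what to display and what to speak; `transcript_texts` is a hypothetical helper, and `extras` stands for either the non-streaming JSON response or the final `done` event.

```python
from typing import Any, Dict


def transcript_texts(message: str, reply: str, extras: Dict[str, Any]) -> Dict[str, str]:
    """Choose display and speech strings, preferring the optional marked fields.

    message: the transcribed text you sent.
    reply:   the plain reply (from "reply" or from joined stream deltas).
    extras:  the JSON response or the final done event; may lack marked fields.
    """
    return {
        "user_display": extras.get("message_marked") or message,
        "assistant_display": extras.get("reply_marked") or reply,
        # Text-to-speech should read the plain reply, not the marked variant.
        "speak": reply,
    }
```

The same function works for both endpoints because the plain reply is passed in separately from the optional fields.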
Guides: API voice mode overview, Voice assistant for PDF Q&A, Web Speech API + DocuMind.