How it works

DocuMind answers questions from your uploaded PDFs using retrieval-augmented generation (RAG). The web app and the HTTP API share the same backend, so use whichever fits your workflow.

Personal workspace

You sign in with your email using a one-time code. Your account is a personal workspace where you create chats, upload PDFs, and ask questions in natural language.

Documents

Chat

Each chat keeps its own message history. The model answers from chunks retrieved from the documents that apply to that chat: global documents, chat-scoped documents, or both, depending on what you uploaded and how your keys are scoped (see HTTP API below).

Company mode

If your organization uses company logins, DocuMind can attach users to a shared company library based on email domain (e.g. you@acme.com).
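DocuMind's exact matching rules are not documented here. As an illustration only, the following is a minimal sketch of domain-based assignment, assuming the part after the last "@" is compared case-insensitively against a configured table. `COMPANY_LIBRARIES` and `company_library_for` are hypothetical names, not DocuMind code.

```python
from typing import Optional

# Hypothetical configuration: email domain -> shared company library.
COMPANY_LIBRARIES = {"acme.com": "acme-shared-library"}


def company_library_for(email: str) -> Optional[str]:
    """Return the shared library for the email's domain, or None for personal users."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    # Domains are case-insensitive, so normalize before the lookup.
    return COMPANY_LIBRARIES.get(domain.lower())


print(company_library_for("you@acme.com"))    # acme-shared-library
print(company_library_for("me@example.org"))  # None
```

Users whose domain has no entry stay in their personal workspace in this sketch.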

API keys are available only for personal accounts, not for company users. For company libraries, use the web UI; if you expose DocuMind internally, you can instead integrate through your own backend that calls DocuMind with appropriate authentication.

HTTP API

Call POST /api/v1/chat for a complete JSON reply, or POST /api/v1/chat/stream for token streaming over Server-Sent Events (SSE). Authenticate with an API key you created, sent in the header Authorization: Bearer <api_key>.

Create keys (personal only)

Request body

The body is a JSON object with these fields:

  • message (string, required).
  • history (array, optional): prior turns as {"role":"user"|"assistant","content":"..."}, up to about 120 turns.
  • voice (boolean, optional, default false): set "voice": true when your client uses speech, or when you want the same «marked» transcript style as the web app's voice mode; omit it for normal text-only integrations.

Responses are compact for API use. When voice is true, the non-streaming JSON may include message_marked and reply_marked alongside reply.
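A small helper can assemble and sanity-check this body before it is sent. This is a minimal sketch: `build_chat_body` is a hypothetical name, and counting each history entry as one turn (with 120 as a soft cap that keeps the most recent entries) is an assumption based on the "up to ~120 turns" note above.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: each history entry counts as one turn; 120 is treated as a soft cap.
MAX_HISTORY_TURNS = 120


def build_chat_body(message: str,
                    history: Optional[List[Dict[str, str]]] = None,
                    voice: bool = False) -> Dict[str, Any]:
    """Assemble a request body for /api/v1/chat or /api/v1/chat/stream."""
    if not isinstance(message, str):
        raise TypeError("message must be a string")
    turns = list(history or [])
    for turn in turns:
        if turn.get("role") not in ("user", "assistant"):
            raise ValueError(f"invalid role: {turn.get('role')!r}")
        if not isinstance(turn.get("content"), str):
            raise ValueError("each history entry needs a string 'content'")
    # Keep only the most recent turns when the history exceeds the cap.
    body: Dict[str, Any] = {"message": message, "history": turns[-MAX_HISTORY_TURNS:]}
    if voice:
        body["voice"] = True  # omitted otherwise, as for text-only integrations
    return body


print(json.dumps(build_chat_body("Summarize my uploaded policy.", voice=True)))
# {"message": "Summarize my uploaded policy.", "history": [], "voice": true}
```

Dropping the oldest turns (rather than rejecting the request) is one design choice; a client could also summarize old turns instead.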

Non-streaming: POST /api/v1/chat

# Replace YOUR_BASE_URL and YOUR_API_KEY. Response: {"reply":"..."} or {"error":"..."}.
$ curl -sS -X POST "YOUR_BASE_URL/api/v1/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message":"Summarize my uploaded policy.","history":[]}'
# PowerShell, same endpoint: single quotes keep the JSON's double quotes literal, so no escaping is needed.
PS> $body = '{"message":"Summarize my uploaded policy.","history":[]}'
PS> Invoke-RestMethod -Uri "YOUR_BASE_URL/api/v1/chat" -Method Post `
  -Headers @{ Authorization = "Bearer YOUR_API_KEY" } -Body $body -ContentType "application/json"
// Browser or Node 18+ (fetch). Top-level await needs an ES module; otherwise wrap this in an async function.
const BASE = "YOUR_BASE_URL"; // use your real origin in production
const key = "YOUR_API_KEY";
const res = await fetch(BASE + "/api/v1/chat", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + key,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    message: "Summarize my uploaded policy.",
    history: [{ role: "user", content: "Hi" }, { role: "assistant", content: "Hello!" }],
  }),
});
const data = await res.json();
if (data.error) throw new Error(data.error); // failures come back as {"error":"..."}
console.log(data.reply);
# pip install requests
import requests
r = requests.post(
    "YOUR_BASE_URL/api/v1/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json={"message": "Summarize my uploaded policy.", "history": []},
)
print(r.json())
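Because history travels in each request body, a client that wants multi-turn conversations has to keep the turns itself. The sketch below assumes that model; `ChatSession` is a hypothetical name, and `post` is any callable that sends a body to POST /api/v1/chat and returns the decoded JSON, which keeps the logic testable without a network.

```python
from typing import Callable, Dict, List

# Any callable that POSTs a JSON body to /api/v1/chat and returns the parsed response.
PostFn = Callable[[dict], dict]


class ChatSession:
    """Client-side history so each request carries the prior turns."""

    def __init__(self, post: PostFn) -> None:
        self.post = post
        self.history: List[Dict[str, str]] = []

    def ask(self, message: str) -> str:
        # Send a copy so later appends never alter a body already handed off.
        data = self.post({"message": message, "history": list(self.history)})
        if "error" in data:
            raise RuntimeError(f"DocuMind API error: {data['error']}")
        reply = data["reply"]
        # Record the exchange only after a successful reply.
        self.history.append({"role": "user", "content": message})
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

With requests, `post` could be `lambda body: requests.post(URL, headers=HEADERS, json=body, timeout=60).json()`, where `URL` and `HEADERS` are the values from the example above.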

Streaming: POST /api/v1/chat/stream

This endpoint takes the same body and Authorization header. The response is text/event-stream, and each SSE data line carries a JSON object: text deltas {"t":"d","c":"..."}, then a final {"t":"done"}, or an error {"t":"e","m":"...","code":...}. Errors are often still delivered with HTTP 200, so check the event type rather than relying on the status code.
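The event handling described above can be isolated in a small parser that works on any iterable of text lines. This is a sketch: `iter_deltas` and `StreamError` are hypothetical names, and the only event shapes handled are the three listed above.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an {"t":"e"} event."""

    def __init__(self, message: str, code: object = None) -> None:
        super().__init__(message)
        self.code = code


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas until {"t":"done"}; raise StreamError on {"t":"e"}."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields carry no payload here
        event = json.loads(line[len("data:"):].lstrip())
        kind = event.get("t")
        if kind == "d":
            yield event.get("c", "")
        elif kind == "done":
            return
        elif kind == "e":
            raise StreamError(event.get("m", "stream error"), event.get("code"))
```

With requests, set `r.encoding = "utf-8"` and pass `r.iter_lines(decode_unicode=True)` to `iter_deltas`; joining the yielded pieces gives the full reply.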

# Raw SSE on stdout; use -N so curl disables buffering.
$ curl -N -X POST "YOUR_BASE_URL/api/v1/chat/stream" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"message":"Say hello in one sentence.","history":[]}'
# Windows Command Prompt (cmd.exe): curl.exe ships with Windows 10 and later; ^ continues a line.
C:\> curl.exe -N -X POST "YOUR_BASE_URL/api/v1/chat/stream" ^
  -H "Authorization: Bearer YOUR_API_KEY" ^
  -H "Content-Type: application/json" ^
  -H "Accept: text/event-stream" ^
  -d "{\"message\":\"Say hello in one sentence.\",\"history\":[]}"
// Fetch + ReadableStream: parse lines starting with "data:" (ES module or async function).
const BASE = "YOUR_BASE_URL";
const res = await fetch(BASE + "/api/v1/chat/stream", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    Accept: "text/event-stream",
  },
  body: JSON.stringify({ message: "Say hello.", history: [] }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const reader = res.body.getReader();
const dec = new TextDecoder();
let buf = "", out = "";
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += dec.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop() || ""; // keep a partial trailing line for the next chunk
  for (const line of lines) {
    const m = line.match(/^data:\s*(.+)/);
    if (!m) continue; // skip blank separators and non-data fields
    const j = JSON.parse(m[1]);
    if (j.t === "d" && j.c) out += j.c;
    else if (j.t === "e") throw new Error(`${j.m} (code ${j.code})`);
  }
}
console.log(out);
# pip install requests
import json, requests
with requests.post(
    "YOUR_BASE_URL/api/v1/chat/stream",
    headers={"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"},
    json={"message": "Say hello.", "history": []},
    stream=True,
) as r:
    r.raise_for_status()
    r.encoding = "utf-8"  # requests assumes ISO-8859-1 for text/* without a charset
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        j = json.loads(line[5:].lstrip())
        if j.get("t") == "d" and j.get("c"):
            print(j["c"], end="", flush=True)
        elif j.get("t") == "e":
            raise RuntimeError(f"{j.get('m')} (code {j.get('code')})")
        elif j.get("t") == "done":
            break

Optional voice mode (embeds & speech UIs)

DocuMind does not run speech-to-text or text-to-speech on the API server for Bearer-key requests. You implement microphone input and playback in your app (the browser Web Speech API, mobile SDKs, or cloud STT/TTS services). Send the transcribed text as message, and speak the plain reply (or the streamed deltas) aloud.
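The division of labor above amounts to one round trip per spoken turn. In this minimal sketch, `voice_turn` is a hypothetical name, and `transcribe`, `ask`, and `speak` are placeholders for your own STT engine, a call to POST /api/v1/chat, and your TTS engine.

```python
from typing import Callable


def voice_turn(audio: bytes,
               transcribe: Callable[[bytes], str],
               ask: Callable[[dict], dict],
               speak: Callable[[str], None]) -> str:
    """One spoken exchange: client-side STT, then DocuMind, then client-side TTS."""
    text = transcribe(audio)  # your STT: Web Speech API, mobile SDK, or cloud service
    data = ask({"message": text, "history": [], "voice": True})
    if "error" in data:
        raise RuntimeError(data["error"])
    speak(data["reply"])  # speak the plain reply, not the «marked» display text
    return data["reply"]
```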

When you set "voice": true in the JSON body, the API adds optional fields so your UI can show voice-style transcripts consistent with the main web app:

  • POST /api/v1/chat — same reply as always; may also include voice, message_marked, and reply_marked (guillemets «…»).
  • POST /api/v1/chat/stream — token deltas unchanged; the final {"t":"done"} event may include voice, message_marked, and reply_marked.
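A UI can prefer the marked fields when they are present and fall back to plain text otherwise. This sketch assumes only the field names listed above; `Transcript` and `transcript_from` are hypothetical names, and the «…» values in the usage are illustrative, not captured API output.

```python
from typing import Any, Dict, NamedTuple


class Transcript(NamedTuple):
    user: str       # text to display for the user's turn
    assistant: str  # text to display for the assistant's turn
    spoken: str     # plain text to pass to text-to-speech


def transcript_from(message: str, payload: Dict[str, Any], streamed_reply: str = "") -> Transcript:
    """Build display and speech text from a response or the final {"t":"done"} event.

    `payload` is the non-streaming JSON or the done event; `streamed_reply`
    is the concatenated deltas when streaming (the done event has no reply).
    """
    reply = payload.get("reply", streamed_reply)
    return Transcript(
        user=payload.get("message_marked") or message,
        assistant=payload.get("reply_marked") or reply,
        spoken=reply,
    )


t = transcript_from("hi", {"reply": "Hello", "message_marked": "«hi»", "reply_marked": "«Hello»"})
print(t.assistant, "/", t.spoken)  # «Hello» / Hello
```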

Guides: API voice mode overview, Voice assistant for PDF Q&A, Web Speech API + DocuMind.

CORS: if you call the API from a browser page on another domain, your DocuMind server must allow that origin (for example through FastAPI's CORS middleware or your reverse proxy). Calls from a server-side app are not subject to CORS.