Skip to main content
Version: 8.10 (unreleased)

AI Agent document support

The AI Agent connector can pass Camunda documents to the LLM from two sources:

In both cases, supported documents are resolved and passed to the LLM as native content blocks so the model can interpret them directly.

Supported document types

Because file type support varies by LLM provider and model, you must test your document use case with the provider you are using.

File typeSupportedDescription
TextYesText files (MIME types matching text/*, application/xml, application/json, or application/yaml) are passed as plain text content blocks.
PDFYesPDF files (MIME types matching application/pdf) are passed as base64 encoded content blocks.
ImageYesImage files (MIME types matching image/jpeg, image/png, image/gif, or image/webp) are passed as base64 encoded content blocks.
Audio/video/otherNoAudio and video files are not currently supported, and will result in an error if passed. All other unsupported file types not listed here will also result in an error if passed.
info

To learn more about storing, tracking, and managing documents in Camunda 8, see document handling.

Documents in the user prompt

Use the user prompt Documents field to add a list of document references the agent can interact with. The list is internally resolved and passed to the LLM if the document type is supported.

LLM APIs allow the user prompt to be specified as a list of content blocks. Each supported document reference is resolved to a corresponding content block and passed as part of the user message. For examples of how LLM providers accept document content blocks, refer to the Anthropic and OpenAI documentation.

Documents in tool call results

Tool call responses can contain document references nested anywhere within the result structure.

The agent extracts these documents from the tool call result and passes them to the LLM as native content blocks (plain text for text files, base64 encoded content for PDFs and images). This is the same mechanism used for user prompt documents.

In the conversation, the tool call result itself retains a lightweight document reference (for example, the document ID and store, or an external URL). The resolved document content is delivered in a separate follow-up user message immediately after the tool result, allowing the model to correlate each reference with its content.

For example, a tool can return a document for the LLM to analyze:

  • A REST connector tool with the Store response option enabled downloads a PDF document.
  • A user task tool with a Filepicker form lets a person upload a document as part of a human-in-the-loop workflow.