Version: 8.9 (unreleased)

Document extraction

Extract data from a single type of structured or unstructured document.

About document extraction

Document extraction templates form the basis for using IDP in your end-to-end processes.

  • Create a separate document extraction template for each type of document you want to categorize and extract data from, such as an invoice, a report, or an identity document.
  • Published document extraction templates can then be used to integrate IDP into your processes.

Create document extraction template

To create a new document extraction template:

  1. In your IDP application, click Create extraction project to open the Create new project modal.

    Create an extraction project modal
  2. Select the Extraction method depending on whether your documents contain structured or unstructured data.

    • Unstructured data extraction: Extract data from unstructured documents.
    • Structured form extraction: Extract data from structured documents.
  3. Name: Enter a descriptive name for the type of document, such as “Invoice type A”.

  4. Description: Enter a description to provide more detailed information about the document type.

  5. Provider: Select the cloud provider you want to use for document extraction. The available providers depend on the connector secrets configured for your cluster.

    Provider selection dropdown

    The four supported providers are:

    • AWS: Amazon Web Services with Bedrock and Textract (supports both structured and unstructured extraction)
    • Azure: Microsoft Azure with AI Document Intelligence and AI Foundry (unstructured extraction only)
    • GCP: Google Cloud Platform with Vertex AI and Document AI (supports both structured and unstructured extraction)
    • OpenAI compatible: Any provider that implements the OpenAI /chat/completions API (unstructured extraction only)
    note

    If the connector secrets for a specific provider are missing from your cluster configuration, that provider will be unavailable for selection. To enable additional providers, configure the required connector secrets as described in the IDP configuration guide.

  6. Click Create to create and open the new document extraction template.

  7. Configure and publish the template:

tip

Not sure which extraction method to use? See structured and unstructured documents to determine which type of document you are processing.
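
The provider support listed in step 5 can be summarized as a small lookup table. The following is a minimal sketch, not part of the IDP application: the provider keys, the `Method` enum, and the `selectable_providers` helper are hypothetical names chosen for illustration, and the filtering by extraction method is an assumption based on the support notes in the provider list.

```python
from enum import Enum


class Method(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


# Extraction methods each provider supports, as listed in step 5.
# The keys are illustrative labels, not identifiers used by IDP.
PROVIDER_SUPPORT = {
    "aws": {Method.STRUCTURED, Method.UNSTRUCTURED},
    "azure": {Method.UNSTRUCTURED},
    "gcp": {Method.STRUCTURED, Method.UNSTRUCTURED},
    "openai_compatible": {Method.UNSTRUCTURED},
}


def selectable_providers(method, configured):
    """Return providers that support `method` and have connector secrets configured.

    `configured` is the set of provider keys whose secrets exist on the cluster.
    """
    return sorted(
        name
        for name, methods in PROVIDER_SUPPORT.items()
        if method in methods and name in configured
    )


if __name__ == "__main__":
    # A cluster with secrets for AWS and Azure only.
    print(selectable_providers(Method.STRUCTURED, {"aws", "azure"}))
    print(selectable_providers(Method.UNSTRUCTURED, {"aws", "azure"}))
```

With secrets configured for AWS and Azure only, the sketch offers AWS for structured form extraction, and both AWS and Azure for unstructured data extraction.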