Document extraction
Extract data from a single type of structured or unstructured document.
About document extraction
Document extraction templates form the basis for using IDP in your end-to-end processes.
- Create a separate document extraction template for each type of document you want to categorize and extract data from, such as an invoice, a report, or an identity document.
- Published document extraction templates can then be used to integrate IDP into your processes.
Create document extraction template
To create a new document extraction template:
- In your IDP application, click Create extraction project to open the Create new project modal.
- Select the Extraction method depending on whether your documents contain structured or unstructured data.
  - Unstructured data extraction: Extract data from unstructured documents.
  - Structured form extraction: Extract data from structured documents.
- Name: Enter a descriptive name for the type of document, such as “Invoice type A”.
- Description: Enter a description to provide more detailed information about the document type.
- Provider: Select the cloud provider you want to use for document extraction. The available providers depend on the connector secrets configured for your cluster.
  The four supported providers are:
  - AWS: Amazon Web Services with Bedrock and Textract (supports both structured and unstructured extraction)
  - Azure: Microsoft Azure with AI Document Intelligence and AI Foundry (unstructured extraction only)
  - GCP: Google Cloud Platform with Vertex AI and Document AI (supports both structured and unstructured extraction)
  - OpenAI compatible: Any provider that implements the OpenAI /chat/completions API (unstructured extraction only)
  Note: If the connector secrets for a specific provider are missing from your cluster configuration, that provider will be unavailable for selection. To enable additional providers, configure the required connector secrets as described in the IDP configuration guide.
- Click Create to create and open the new document extraction template.
- Configure and publish the template:
  - Extract unstructured data: Configure and publish an unstructured data extraction template.
  - Extract structured data: Configure and publish a structured data extraction template.
Not sure which extraction method to use? See structured and unstructured documents to help determine what type of documents you will be processing.
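The provider and extraction method choices described in the steps above can be summarized as a small lookup. The following is only an illustrative sketch and not part of the product: the provider keys, the `providers_for` helper, and the `TemplateSpec` record are hypothetical names, and the support matrix simply restates the provider list on this page.

```python
from dataclasses import dataclass

# Extraction methods named in the steps above.
UNSTRUCTURED = "unstructured"
STRUCTURED = "structured"

# Provider support as listed on this page.
# The keys are hypothetical labels, not identifiers used by the product.
PROVIDER_METHODS = {
    "aws": {UNSTRUCTURED, STRUCTURED},
    "azure": {UNSTRUCTURED},
    "gcp": {UNSTRUCTURED, STRUCTURED},
    "openai-compatible": {UNSTRUCTURED},
}


def providers_for(method, configured):
    """Return the configured providers that support the given extraction method."""
    if method not in (UNSTRUCTURED, STRUCTURED):
        raise ValueError(f"unknown extraction method: {method!r}")
    return sorted(p for p in configured if method in PROVIDER_METHODS.get(p, set()))


@dataclass(frozen=True)
class TemplateSpec:
    """Hypothetical record of the values entered in the Create new project modal."""

    name: str
    description: str
    method: str
    provider: str

    def validate(self, configured):
        """Check the name and the provider/method combination; return self if valid."""
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.provider not in providers_for(self.method, configured):
            raise ValueError(
                f"provider {self.provider!r} is not available for {self.method} extraction"
            )
        return self


if __name__ == "__main__":
    # Assumption: connector secrets exist only for AWS and Azure on this cluster.
    configured = {"aws", "azure"}
    print(providers_for(STRUCTURED, configured))    # ['aws']
    print(providers_for(UNSTRUCTURED, configured))  # ['aws', 'azure']
```

In this sketch, a provider without configured connector secrets never appears in the result, which mirrors how an unconfigured provider is unavailable for selection in the modal.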