Version: 8.9 (unreleased)

Amazon Textract connector

Integrate Amazon Textract to automatically extract document text and data in your BPMN service.

About this connector

Use this connector to orchestrate Amazon Textract-powered extraction as part of business processes that rely on documents. Using machine learning allows you to read and process any type of document, reducing manual work and increasing accuracy in document-centric processes.

The Amazon Textract machine learning (ML) service can automatically extract text, handwriting, layout elements, and data from scanned documents.

Camunda marketplace

The Amazon Textract connector is available in the Camunda marketplace.

Prerequisites

The following prerequisites are required to use this connector:

Prerequisite: Amazon Web Services (AWS) IAM user and permissions

Description:
  • A valid AWS Identity and Access Management (IAM) user with permissions configured to allow access to Amazon Textract (and Amazon S3 if used), such as:

    • AmazonTextractFullAccess: Required

    • AmazonS3ReadOnlyAccess: Required if using Amazon S3 as the document source

    • AmazonS3FullAccess: Optional if using Amazon S3 as the output location for asynchronous execution

  • The access key pair (access key and secret access key) for this IAM user. This is required for connector authentication.

Info: For Amazon Textract setup instructions, refer to the Amazon Textract Developer Guide.

Use this connector

New to using an outbound connector? Learn how to add and use this type of connector, apply element templates, use connector secrets, handle results and errors, and more.

Authentication

Select an authentication type from the Authentication dropdown.

Credentials

Use AWS authentication.

| Property | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Access Key | String | Yes | AWS access key for Textract. | AKIAIOSFODNN37 |
| Secret Key | String | Yes | AWS secret key for Textract. | wJalrXUtnFEgfMIK7MDENGbPxRfiCY |

Note: Requires your AWS access key and secret access key (see prerequisites).

Default Credentials Chain (hybrid/Self-Managed only)

Use this authentication type if your system relies on implicit authentication (for example, IAM roles, environment variables, or credentials files). The connector then uses the AWS Default Credential Provider Chain to resolve credentials.

Configuration

Region

Configure the AWS region for this connector.

| Property | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Region | String | Yes | Specify the AWS region where the Textract service and your S3 buckets are hosted. | us-east-1 |

Operations

Analyze Document

Analyze documents using Textract. Different input parameters are available depending on the Execution type you select.

Input parameters

| Property | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Execution type | Dropdown | Yes | Specify the inference endpoint type: Real-time, Polling, or Asynchronous (see the options listed after this table). | document |
| Document location | Dropdown | Yes | Where the document to be analyzed is stored. S3 is recommended for most use cases. | S3 |
| Document bucket | String | Yes for S3 source | Name of the S3 bucket containing the document. Ensure proper permissions for Textract access. | automation-test |
| Document name | String | Yes for S3 source | Full path from the bucket root to the document. | my-document.pdf |
| Document version | String | No | Specify if you need to process a specific document version. If not set, the latest version is used. | 5 |
| Camunda document | String | Yes for Camunda source | Select the document from the Camunda document store. Only PNG and JPEG formats are supported. Real-time execution only. | document |
| Output S3 Bucket | String | Yes for Asynchronous | Name of the S3 bucket where the analysis output is stored. | automation-output |

Execution type options:

  • Real-time: For small files requiring immediate text extraction. Only single-page PDFs are supported when using S3. For multi-page PDFs, use Polling or Asynchronous.

  • Polling: Starts analysis and polls every five seconds until the result is available. Best for larger documents where blocking execution is acceptable.

  • Asynchronous: For large or complex documents processed in the background.
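The constraints described for these input parameters can be summarized as a small pre-flight check. This is only a minimal sketch, not the connector's own validation: the function name `validate_document_input`, its parameter names, and the location label `"Camunda"` are hypothetical and chosen for illustration.

```python
from typing import List, Optional

EXECUTION_TYPES = {"Real-time", "Polling", "Asynchronous"}


def validate_document_input(
    execution_type: str,
    location: str,
    *,
    bucket: Optional[str] = None,
    name: Optional[str] = None,
    camunda_document: Optional[str] = None,
    output_bucket: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the combination looks valid.

    Hypothetical helper: it only encodes the rules stated in the table above.
    """
    errors: List[str] = []
    if execution_type not in EXECUTION_TYPES:
        errors.append(f"Unknown execution type: {execution_type}")

    if location == "S3":
        # Bucket and document name are required for an S3 source.
        if not bucket:
            errors.append("Document bucket is required for an S3 source")
        if not name:
            errors.append("Document name is required for an S3 source")
    elif location == "Camunda":  # assumed label for the Camunda document store
        if not camunda_document:
            errors.append("Camunda document is required for a Camunda source")
        if execution_type != "Real-time":
            errors.append("Camunda documents support Real-time execution only")
    else:
        errors.append(f"Unknown document location: {location}")

    # Asynchronous execution needs an output bucket.
    if execution_type == "Asynchronous" and not output_bucket:
        errors.append("Output S3 Bucket is required for Asynchronous execution")
    return errors
```

For example, calling it with `"Asynchronous"` and an S3 source but no `output_bucket` returns a single error naming the missing output bucket.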

You must select at least one feature type. Combining multiple options can produce richer extraction results.

| Property | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Analyze form | Boolean | No | Select this to return information about detected form data. | |
| Analyze signatures | Boolean | No | Select this to return the locations of detected signatures. | |
| Analyze layout | Boolean | No | Select this to return information about the layout of the document. | |
| Analyze queries | Boolean | No | Select this to return an answer to a query. | |
| Query | String | Yes, if Analyze queries is true | The query to be applied to the document. | What is the IBAN in the invoice? |
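As an illustration of how these checkboxes might translate into the Textract request, the sketch below maps them to the `FeatureTypes` values and `QueriesConfig` structure defined by the Amazon Textract API. The mapping of each checkbox to a feature type and the function name `build_feature_config` are assumptions; the connector's internal mapping is not documented here.

```python
from typing import Any, Dict, List, Optional


def build_feature_config(
    analyze_form: bool = False,
    analyze_signatures: bool = False,
    analyze_layout: bool = False,
    analyze_queries: bool = False,
    query: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the FeatureTypes (and optional QueriesConfig) request fragment.

    Hypothetical helper: the checkbox-to-feature mapping is an assumption.
    """
    feature_types: List[str] = []
    if analyze_form:
        feature_types.append("FORMS")
    if analyze_signatures:
        feature_types.append("SIGNATURES")
    if analyze_layout:
        feature_types.append("LAYOUT")
    if analyze_queries:
        feature_types.append("QUERIES")

    # At least one feature type must be selected.
    if not feature_types:
        raise ValueError("Select at least one feature type")

    config: Dict[str, Any] = {"FeatureTypes": feature_types}
    if analyze_queries:
        # A query text is required when Analyze queries is selected.
        if not query:
            raise ValueError("Query is required when Analyze queries is true")
        config["QueriesConfig"] = {"Queries": [{"Text": query}]}
    return config
```

Calling `build_feature_config(analyze_form=True, analyze_queries=True, query="What is the IBAN in the invoice?")` yields both `FORMS` and `QUERIES` plus a single query entry.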

Additional optional parameters for advanced configuration:

| Property | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Client Request Token | String | No | The idempotent token that you use to identify the start request. | |
| Job Tag | String | No | An identifier that you specify that's included in the completion notification published to the Amazon SNS topic. | |
| KMS Key ID | String | No | The KMS key used to encrypt the inference results. | |
| Notification Channel Role ARN | String | No | The ARN of the IAM role that gives Amazon Textract permission to publish to the Amazon SNS topic. | |
| Notification Channel SNS Topic ARN | String | No | The Amazon SNS topic ARN that you want Amazon Textract to publish the completion status of the operation to. | |
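These optional parameters correspond to fields of the Textract `StartDocumentAnalysis` request. The sketch below shows one way such a request body could be assembled. The field names (`DocumentLocation`, `OutputConfig`, `ClientRequestToken`, `JobTag`, `KMSKeyId`, `NotificationChannel`) come from the Textract API, while the function `build_start_analysis_request` and its parameters are hypothetical, and how the connector builds its request internally is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_start_analysis_request(
    bucket: str,
    name: str,
    feature_types: List[str],
    output_bucket: str,
    *,
    version: Optional[str] = None,
    client_request_token: Optional[str] = None,
    job_tag: Optional[str] = None,
    kms_key_id: Optional[str] = None,
    sns_topic_arn: Optional[str] = None,
    role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a StartDocumentAnalysis-style request dict (illustrative only)."""
    s3_object: Dict[str, str] = {"Bucket": bucket, "Name": name}
    if version:
        s3_object["Version"] = version

    request: Dict[str, Any] = {
        "DocumentLocation": {"S3Object": s3_object},
        "FeatureTypes": list(feature_types),
        "OutputConfig": {"S3Bucket": output_bucket},
    }

    # Optional fields are added only when a value was provided.
    if client_request_token:
        request["ClientRequestToken"] = client_request_token
    if job_tag:
        request["JobTag"] = job_tag
    if kms_key_id:
        request["KMSKeyId"] = kms_key_id

    # The notification channel needs both the SNS topic ARN and the role ARN.
    if sns_topic_arn or role_arn:
        if not (sns_topic_arn and role_arn):
            raise ValueError("Provide both the SNS topic ARN and the role ARN")
        request["NotificationChannel"] = {
            "SNSTopicArn": sns_topic_arn,
            "RoleArn": role_arn,
        }
    return request
```

If you call Textract directly with boto3 rather than through the connector, a dict of this shape could be passed as keyword arguments to the client's `start_document_analysis` method.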

Output

The connector response mirrors the Amazon Textract API response and depends on the execution type.

To get the answer to the query when using the Analyze queries feature:

= {"answer": response.blocks[item.blockType = "QUERY_RESULT"][1].text}
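The expression above filters the response blocks by type and takes the first match (FEEL lists are 1-based, so `[1]` selects the first element). If you post-process the result outside of FEEL, the same lookup could be sketched in Python as follows. The function name `first_query_answer` and the sample response are made up for illustration; the key names `blocks`, `blockType`, and `text` are taken from the expression above.

```python
from typing import Any, Dict, Optional


def first_query_answer(response: Dict[str, Any]) -> Optional[str]:
    """Return the text of the first QUERY_RESULT block, or None if there is none."""
    for block in response.get("blocks", []):
        if block.get("blockType") == "QUERY_RESULT":
            return block.get("text")
    return None


# Made-up response fragment shaped like the connector output used above.
sample_response = {
    "blocks": [
        {"blockType": "PAGE"},
        {"blockType": "QUERY", "text": "What is the IBAN in the invoice?"},
        {"blockType": "QUERY_RESULT", "text": "EXAMPLE-IBAN-VALUE"},
    ]
}

answer = {"answer": first_query_answer(sample_response)}
```

Unlike the FEEL expression, this sketch returns `None` explicitly when no `QUERY_RESULT` block is present, which makes the missing-answer case easy to branch on.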

To get the response when using asynchronous execution, you can, for example, use a timer event to wait and then retrieve the result with the S3 connector.

Example process using asynchronous execution

Troubleshooting

To learn about general error handling in Camunda, see BPMN errors and failing jobs.

Further Resources