When we think about business data, we usually picture tidy tables and dashboards neatly populated with structured relational data. But in reality, much of an organisation’s most valuable information lives in unstructured formats—scanned invoices, PDFs, handwritten notes, and contracts. This data is often locked away in silos, disconnected from the wider analytical ecosystem.
Oracle Analytics’ AI Document Understanding feature changes that. It enables organisations to automatically extract structured data from documents stored in OCI Object Storage using pretrained AI models—all without needing a data science team. With this capability, you can enrich dashboards with data that would previously be too costly or complex to access. In this post, we’ll walk through:
What Is Oracle Analytics AI Document Understanding?
At its core, the AI Document Understanding capability in Oracle Analytics leverages AI models (deployed within Oracle Cloud Infrastructure) to parse and extract fields of interest from documents stored in OCI Object Storage. This is particularly powerful for automating workflows that currently depend on manual data entry or semi-structured file formats. It supports a range of document types and layouts, including:
IAM Policies
To enable Oracle Analytics to securely access documents stored in OCI Object Storage and to invoke AI services like Document Understanding, specific IAM policies must be in place. Without these policies, your OAC instance won’t have the necessary permissions to read documents or trigger AI model processing. In this section, we’ll walk through the exact tenancy- and compartment-level policies required, ensuring your setup is both functional and secure. You can find more information here.
The following IAM policies grant Oracle Analytics the necessary permissions to read from your Object Storage bucket and to invoke the AI Document Understanding service.
Compartment level IAM Policy
Notes
Next policy needs to be defined at the root compartment level
Root level IAM Policy
These policies are necessary to enable Oracle Analytics to access the OCI AI Document Understanding model. Without these policies correctly setup, you will encounter errors when you attempt to run your data flow in Oracle Analytics.
With the IAM policies configured, you can now proceed with setting up the connection and registering the model within Oracle Analytics.
You do this by creating an Oracle Analytics connection to your Oracle Cloud Infrastructure tenancy that will enable you to gain access to your OCI Object Storage Bucket.
Register a pre-trained Document Key Value Extraction model with your Oracle Analytics instance ensuring that the bucket created previously is selected.
This completes all prerequisites and the next step is to run the newly registered pre-trained model in Oracle Analytics by creating a data flow.
The next step is to create a create a "dataset" which is used as an input to the data flow. This dataset is a CSV file that contains the OCI object storage bucket URL where the documents have been uploaded to. This CSV file can either contain a row for each document with a URL for each document that you intend to process or a single row with a URL for the bucket itself. This way every document within this bucket will be processed. Personally, it's a no brainer for me to use the second option. As mentioned earlier in this article, you need to derive the bucket URL by logging on to the OCI console's bucket details page and copying the URL from your browser. You can see a sample below that has 2 tabs; the 1st tab is what you would use for option 1 where you list out your documents with their corresponding URLs. The 2nd tab has a single row and this is what you would use to instruct the data flow to process all documents within the specified bucket. ![]()
Follow the instructions here to create your data flow.
Using the Apply AI Model step, you make a call the the registered pretrained AI Document Understanding model. You then add a Save Data step in which you specify the output dataset. In my example below, I have a few Transform Column steps which are being used to execute some transformations to some columns.
Once the data flow has been saved, it can be run to generate the output dataset. You can see a sample data visualisation workbook below based on the output dataset with some insights of the information derived from the invoices.
Tips and tricks for working with unstructured data in Oracle Analytics
Working with unstructured documents—especially at scale—introduces its own set of quirks. Here are some practical insights to help you get the most out of the AI Document Understanding feature in Oracle Analytics: Use Document Batching Strategically Oracle Analytics currently imposes a 10,000-row processing limit per run. If you’re working with high volumes:
Reuse and Schedule Data FlowsOnce you’ve built a data flow that works, save it and schedule it to run regularly:
Start Small, Then Scale Try a proof-of-concept with 10–20 documents first:
Gotchas, Limits and Tips
1. Bucket URL Must Be Copied from Browser The most confusing part of this setup is finding the correct OCI Object Storage bucket URL. It’s not visible anywhere in the console UI—you must copy it from the bucket’s detail page URL in your browser. 2. 10,000 Document Row Limit There’s a hard limit of 10,000 document rows per data flow run. If your use case involves large volumes of documents, you’ll need to split your data or automate batch runs accordingly. Note that this limit is even less when a custom model is used. The limit in this scenario is 2,000 documents. 3. Document Layouts Matter The AI model is pre-trained for certain layouts (e.g. invoices, forms). Custom layouts may yield mixed results, and you may need to experiment with field mappings to improve outcomes. 4. Use Tags for Traceability Tag your buckets and policies in OCI with labels like oac-ai-docs so they’re easier to audit and maintain. Conclusion Oracle Analytics’ AI Document Understanding feature bridges a crucial gap between unstructured documents and visual analytics. With a few setup steps—bucket creation, IAM policy configuration, model registration, and a simple data flow—you can surface hidden insights from documents that would otherwise sit untouched. It’s a powerful tool, but one with nuances—such as the hidden bucket URL and processing limits—that are worth planning for. Still, for anyone looking to extend their analytics to the edges of their data estate, this capability opens the door. Oracle Analytics now makes it possible to integrate scanned documents, invoices, and other unstructured data sources directly into your dashboards—unlocking insights that were previously out of reach.
0 Comments
Your comment will be posted after it is approved.
Leave a Reply. |
AuthorA bit about me. I am an Oracle ACE Pro, Oracle Cloud Infrastructure 2023 Enterprise Analytics Professional, Oracle Cloud Fusion Analytics Warehouse 2023 Certified Implementation Professional, Oracle Cloud Platform Enterprise Analytics 2022 Certified Professional, Oracle Cloud Platform Enterprise Analytics 2019 Certified Associate and a certified OBIEE 11g implementation specialist. Archives
February 2025
Categories
All
|