ELFFAR ANALYTICS
  • Home
  • Blog

Elffar Analytics Blog

by Joel Acha

Insights from Unstructured Data: Oracle Analytics and AI Document Understanding

8/6/2025

0 Comments

 
Picture
When we think about business data, we usually picture tidy tables and dashboards neatly populated with structured relational data. But in reality, much of an organisation’s most valuable information lives in unstructured formats—scanned invoices, PDFs, handwritten notes, and contracts. This data is often locked away in silos, disconnected from the wider analytical ecosystem.

Oracle Analytics’ AI Document Understanding feature changes that. It enables organisations to automatically extract structured data from documents stored in OCI Object Storage using pretrained AI models—all without needing a data science team. With this capability, you can enrich dashboards with data that would previously be too costly or complex to access.

In this post, we’ll walk through:

  • What the AI Document Understanding feature is
  • Practical business scenarios where it is applicable
  • Step-by-step setup and configuration, including OCI policy management and model registration
  • Tips and tricks for working with unstructured data in Oracle Analytics
What Is Oracle Analytics AI Document Understanding?

At its core, the AI Document Understanding capability in Oracle Analytics leverages AI models (deployed within Oracle Cloud Infrastructure) to parse and extract fields of interest from documents stored in OCI Object Storage. This is particularly powerful for automating workflows that currently depend on manual data entry or semi-structured file formats.

It supports a range of document types and layouts, including:

  • PDF invoices from suppliers
  • Utility bills or receipts
  • Financial statements
  • Handwritten or printed forms (subject to OCR accuracy)
The output is a structured dataset—columnar data ready for blending, analysis, and visualisation in Oracle Analytics.
Picture
IAM Policies
​
To enable Oracle Analytics to securely access documents stored in OCI Object Storage and to invoke AI services like Document Understanding, specific IAM policies must be in place. Without these policies, your OAC instance won’t have the necessary permissions to read documents or trigger AI model processing. In this section, we’ll walk through the exact tenancy- and compartment-level policies required, ensuring your setup is both functional and secure. You can find more information here.
The following IAM policies grant Oracle Analytics the necessary permissions to read from your Object Storage bucket and to invoke the AI Document Understanding service.
Compartment level IAM Policy

Notes

  1. Replace <GROUP NAME> with your OCI IAM Group name
  2. Replace <COMPARTMENT NAME> with your OCI Compartment name where you Object Storage bucket is located in your OCI tenancy
  3. Replace <BUCKET NAME> with your Object Storage Bucket name where your files have been uploaded to. Note that the Bucket Name is in single quotes.​
​
​Next policy needs to be defined at the root compartment level 
Root level IAM Policy

These policies are necessary to enable Oracle Analytics to access the OCI AI Document Understanding model. Without these policies correctly setup, you will encounter errors when you attempt to run your data flow in Oracle Analytics. 
Picture
With the IAM policies configured, you can now proceed with setting up the connection and registering the model within Oracle Analytics.

You do this by creating an Oracle Analytics connection to your Oracle Cloud Infrastructure tenancy that will enable you to gain access to your OCI Object Storage Bucket.
Register a pre-trained Document Key Value Extraction model with your Oracle Analytics instance ensuring that the bucket created previously is  selected.
Picture
This completes all prerequisites and the next step is to run the newly registered pre-trained model in Oracle Analytics by creating a data flow.

The next step is to create a create a "dataset" which is used as an input to the data flow. This dataset is a CSV file that contains the OCI object storage bucket URL where the documents have been uploaded to. 

This CSV file can either contain a row for each document with a URL for each document that you intend to process or a single row with a URL for the bucket itself. This way every document within this bucket will be processed. Personally, it's a no brainer for me to use the second option.  As mentioned earlier in this article, you need to derive the bucket URL by logging on to the OCI console's bucket details page and copying the URL from your browser. You can see a sample below that has 2 tabs; the 1st tab is what you would use for option 1 where you list out your documents with their corresponding URLs. The 2nd tab has a single row and this is what you would use to instruct the data flow to process all documents within the specified bucket.
Template Bucket URL dataset
File Size: 10 kb
File Type: xlsx
Download File

Picture
Picture
Follow the instructions here to create your data flow.
Picture
Using the Apply AI Model step, you make a call the the registered pretrained AI Document Understanding model. You then add a Save Data step in which you specify the output dataset. In my example below, I have a few Transform Column steps which are being used to execute some transformations to some columns.
Picture
Once the data flow has been saved, it can be run to generate the output dataset. You can see a sample data visualisation workbook below based on the output dataset with some insights of the information derived from the invoices.
Picture
Tips and tricks for working with unstructured data in Oracle Analytics

Working with unstructured documents—especially at scale—introduces its own set of quirks. Here are some practical insights to help you get the most out of the AI Document Understanding feature in Oracle Analytics:

Use Document Batching Strategically
Oracle Analytics currently imposes a 10,000-row processing limit per run. If you’re working with high volumes:

  • Break documents into batches and use multiple data flows.
  • Consider filtering by date or document type.​
 
​Reuse and Schedule Data Flows
Once you’ve built a data flow that works, save it and schedule it to run regularly:

  • Use the built-in scheduler in Oracle Analytics.

​Start Small, Then Scale
Try a proof-of-concept with 10–20 documents first:

  • Test how accurate the model is.
  • Check how the data appears in the dataset.
  • Adjust your approach based on results—especially if OCR noise is high.
Resources

​Official Documentation
Some YouTube videos from the Oracle Analytics channel:
Gotchas, Limits and Tips

1. Bucket URL Must Be Copied from Browser
The most confusing part of this setup is finding the correct OCI Object Storage bucket URL. It’s not visible anywhere in the console UI—you must copy it from the bucket’s detail page URL in your browser.
2. 10,000 Document Row Limit
There’s a hard limit of 10,000 document rows per data flow run. If your use case involves large volumes of documents, you’ll need to split your data or automate batch runs accordingly. Note that this limit is even less when a custom model is used. The limit in this scenario is 2,000 documents. 
3. Document Layouts Matter
The AI model is pre-trained for certain layouts (e.g. invoices, forms). Custom layouts may yield mixed results, and you may need to experiment with field mappings to improve outcomes.
4. Use Tags for Traceability
Tag your buckets and policies in OCI with labels like oac-ai-docs so they’re easier to audit and maintain.

Conclusion

Oracle Analytics’ AI Document Understanding feature bridges a crucial gap between unstructured documents and visual analytics. With a few setup steps—bucket creation, IAM policy configuration, model registration, and a simple data flow—you can surface hidden insights from documents that would otherwise sit untouched.

It’s a powerful tool, but one with nuances—such as the hidden bucket URL and processing limits—that are worth planning for. Still, for anyone looking to extend their analytics to the edges of their data estate, this capability opens the door. Oracle Analytics now makes it possible to integrate scanned documents, invoices, and other unstructured data sources directly into your dashboards—unlocking insights that were previously out of reach.​
0 Comments

Your comment will be posted after it is approved.


Leave a Reply.

    Author

    A bit about me. I am an Oracle ACE Pro, Oracle Cloud Infrastructure 2023 Enterprise Analytics Professional, Oracle Cloud Fusion Analytics Warehouse 2023 Certified Implementation Professional, Oracle Cloud Platform Enterprise Analytics 2022 Certified Professional, Oracle Cloud Platform Enterprise Analytics 2019 Certified Associate and a certified OBIEE 11g implementation specialist.

    Archives

    February 2025
    January 2025
    December 2024
    November 2024
    September 2024
    July 2024
    May 2024
    April 2024
    March 2024
    January 2024
    December 2023
    November 2023
    September 2023
    August 2023
    July 2023
    September 2022
    December 2020
    November 2020
    July 2020
    May 2020
    March 2020
    February 2020
    December 2019
    August 2019
    June 2019
    February 2019
    January 2019
    December 2018
    August 2018
    May 2018
    December 2017
    November 2016
    December 2015
    November 2015
    October 2015

    Categories

    All
    AI
    OAC
    OAS
    OBIEE
    OBIEE 12c

    RSS Feed

    View my profile on LinkedIn
Powered by Create your own unique website with customizable templates.
  • Home
  • Blog