ELFFAR ANALYTICS
  • Home
  • Blog

Elffar Analytics Blog

by Joel Acha

OCI Document Understanding transforms unstructured data into actionable insights

16/11/2024

In today’s data-driven world, the ability to transform unstructured data into actionable insights is critical for organisations. Oracle Analytics Cloud (OAC) integrates OCI Document Understanding, which helps businesses unlock the value hidden in their documents. With the November 2024 release of OAC, that integration has been extended to allow users to register custom models. You can find out more information on how to create a custom model here.

In this blog post, we look at document understanding, how it fits into analytics, and how this feature supports analytics workflows.

From Unstructured to Actionable: The Role of Text Extraction

Many essential business processes rely on unstructured documents such as contracts, invoices, shipping manifests, and feedback forms. These documents often contain vital data, but their formats - PDFs, scanned images, or handwritten forms - make extracting and analysing this data manually a time-consuming and error-prone process.
Key Benefits of Text Extraction for Analytics

Text extraction is often viewed as a preliminary step rather than an intrinsic part of analytics, but this perspective underestimates its impact on modern data workflows. In today’s organisations, vast amounts of critical information remain trapped in unstructured formats - documents, emails, contracts, and scanned images. Without the ability to extract and structure this data, analytics initiatives risk missing valuable insights hidden in plain sight. By integrating text extraction directly into analytics workflows, businesses not only bridge the gap between unstructured and structured data but also enhance the scope and accuracy of their insights.

While it may seem that text extraction belongs solely to the domain of data preparation, its seamless integration into analytics platforms changes the game. By enabling users to work directly with previously inaccessible information, text extraction ensures that analytics becomes truly comprehensive. This convergence eliminates the need for siloed processes, accelerates decision-making, and empowers users to leverage their data assets fully. As the lines between data preparation and analytics blur, text extraction proves itself not as a separate utility but as an essential enabler of meaningful, end-to-end analytics workflows.

Some of the benefits of integrating text extraction with analytics are:

1. Streamlined Data Preparation

Extracted text is ready for analysis without requiring extensive manual intervention. For example, a retail company can process thousands of supplier invoices, extracting line-item details such as product names, prices, and quantities. This structured data feeds into Oracle Analytics for further preparation and enrichment, such as cleansing inconsistent naming conventions or enriching data with external sources.
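To make the cleansing step concrete, here is a minimal Python sketch of what normalising inconsistent product names might look like before the data is aggregated. It is an illustration only, not how Oracle Analytics performs this internally: the `line_items` records, their field names, and the `normalise_name` and `total_spend` helpers are all hypothetical.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Hypothetical line items, as they might look after extraction from invoices.
# The field names are assumptions made for this example.
line_items = [
    {"product": "  Widget-A ", "price": "4.50", "quantity": "10"},
    {"product": "widget a", "price": "4.50", "quantity": "5"},
    {"product": "WIDGET_A", "price": "4.75", "quantity": "2"},
    {"product": "Gadget B", "price": "12.00", "quantity": "1"},
]


def normalise_name(raw: str) -> str:
    """Lower-case the name and collapse hyphens, underscores and spaces."""
    return re.sub(r"[-_\s]+", " ", raw.strip().lower())


def total_spend(items):
    """Sum price * quantity per normalised product name."""
    totals = defaultdict(Decimal)
    for item in items:
        key = normalise_name(item["product"])
        totals[key] += Decimal(item["price"]) * int(item["quantity"])
    return dict(totals)


spend = total_spend(line_items)
for product, amount in spend.items():
    print(product, amount)
```

Without the normalisation step, the three spellings of the same product would be reported as three separate products, which is exactly the kind of inconsistency that data preparation is meant to remove.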

2. Improved Decision-Making

By leveraging the extracted text, users can create dashboards that provide actionable insights. A logistics company, for example, might track delivery times and costs across suppliers, identifying inefficiencies and opportunities to renegotiate contracts.

3. Cross-Document Analysis

OCI Document Understanding enables businesses to analyse trends across a corpus of documents. A financial institution can aggregate key metrics from thousands of contracts, such as interest rates or repayment terms, to assess portfolio risk and optimise lending strategies.
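As a rough illustration of cross-document analysis, the Python sketch below groups values extracted from several contracts and averages them. The `contracts` records, their field names, and the `average_rate_by_term` helper are hypothetical; in practice this kind of aggregation would typically be built as a visualisation or data flow in Oracle Analytics.

```python
from statistics import mean

# Hypothetical fields extracted from four contracts (one record per document).
contracts = [
    {"contract_id": "C-001", "interest_rate_pct": 4.0, "term_months": 36},
    {"contract_id": "C-002", "interest_rate_pct": 6.0, "term_months": 60},
    {"contract_id": "C-003", "interest_rate_pct": 8.0, "term_months": 60},
    {"contract_id": "C-004", "interest_rate_pct": 5.0, "term_months": 36},
]


def average_rate_by_term(rows):
    """Group contracts by repayment term and average their interest rates."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["term_months"], []).append(row["interest_rate_pct"])
    return {term: mean(rates) for term, rates in grouped.items()}


summary = average_rate_by_term(contracts)
for term, rate in sorted(summary.items()):
    print(f"{term} months: average rate {rate}%")
```

The point is that once each contract has been reduced to a row of fields, questions that span the whole corpus become ordinary group-and-aggregate operations.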

4. Advanced Search and Contextual Insights

Once text is extracted, it can be indexed and searched, enabling users to locate specific terms or patterns across document sets. For instance, legal teams can identify clauses that might expose the organisation to risk, while sales teams can quickly review terms in customer contracts to tailor offers.
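The idea of indexing extracted text can be sketched with a small inverted index in Python. This is a simplified illustration under stated assumptions, not a description of how any Oracle product implements search: the `documents` dictionary and the `build_index` and `search` helpers are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical extracted text, keyed by document name.
documents = {
    "contract_001.pdf": "The supplier accepts unlimited liability for late delivery.",
    "contract_002.pdf": "Liability is capped at the total contract value.",
    "contract_003.pdf": "Payment is due within 30 days of delivery.",
}


def build_index(docs):
    """Map each lower-cased word to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(name)
    return index


def search(index, *terms):
    """Return the sorted names of documents that contain every term."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches)) if matches else []


index = build_index(documents)
print(search(index, "liability"))
print(search(index, "liability", "delivery"))
```

A legal team could use a lookup like this to shortlist the contracts that mention a given term before reviewing the clauses in full.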
Registering a Pre-trained Document Key Value Extraction Model in Oracle Analytics Cloud

Oracle Analytics Cloud provides access to some pre-trained OCI Document Understanding models. Registering one of these models allows you to leverage the AI capabilities of OCI Document Understanding within OAC to automatically extract key data points from your documents. Here are the detailed steps involved:

Access the Model Registration Function: Begin by navigating to the OAC Home Page. In the top right corner, locate the three-dot menu (ellipsis) and select "Register Model/Function." From the options presented, choose "OCI Document Understanding Models."

Establish the OCI Connection: Next, you'll need to select your OCI connection. If you haven't already established a connection between OAC and OCI, you'll be prompted to create one. This connection is crucial as it enables OAC to interact with the OCI Document Understanding service. 

Select the Desired Model Type: Once the OCI connection is established, a "Select a Model" window will appear. Choose "Pretrained Document Key Value Extraction" as the model type. This specific model is designed to identify and extract key data from documents, such as merchant names, addresses, and total prices.

Specify the OCI Bucket and Document Type: In the right-side panel of the "Select a Model" window, you'll need to provide two crucial pieces of information:
  1. OCI Bucket: Select the OCI bucket where you have stored the document images you want to analyse. Ensure this bucket is located within the same tenancy as your OAC instance.
  2. Document Type: Choose the specific document type that corresponds to the images in your bucket. For instance, if you are analysing receipts, select "Receipt." This helps the model accurately identify and extract the relevant key values for that particular document type.

Provide a Model Name and Register: Finally, give your model a descriptive name for easy identification within OAC. Click "Register" to complete the process. You can view your registered model under the "Models" tab in the Machine Learning page of OAC.

By following these steps, you register a pre-trained document key value extraction model in OAC, setting the stage for streamlined data preparation and enhanced data analysis. You can then create data flows within OAC to apply this registered model to your documents, extract the desired key values, and use this structured data to generate valuable insights.

If the pre-trained models do not fit your specific use case, you can also create your own custom model and register it for use in Oracle Analytics Cloud. This is the new aspect of the feature added in the November 2024 update.

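To show what "using the extracted key values" can mean in practice, here is a minimal Python sketch that turns one document's key-value results into a flat row and drops low-confidence fields. The structure of `extraction`, the field labels, the 0.8 threshold, and the `to_row` helper are all assumptions made for illustration; the actual output of the model in an OAC data flow may be shaped differently.

```python
# Hypothetical key-value extraction output for a single receipt image.
# Labels, values and confidence scores are invented for this example.
extraction = {
    "document": "receipt_0001.jpg",
    "fields": [
        {"label": "MerchantName", "value": "Corner Cafe", "confidence": 0.98},
        {"label": "MerchantAddress", "value": "1 High Street", "confidence": 0.61},
        {"label": "Total", "value": "12.40", "confidence": 0.95},
    ],
}


def to_row(result, min_confidence=0.8):
    """Flatten extracted fields into one row, skipping low-confidence values."""
    row = {"document": result["document"]}
    for field in result["fields"]:
        if field["confidence"] >= min_confidence:
            row[field["label"]] = field["value"]
    return row


row = to_row(extraction)
print(row)
```

Filtering on confidence is a design choice rather than a requirement: depending on the use case, you may prefer to keep every value and flag the uncertain ones for manual review.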
Summary


In conclusion, the integration of text extraction capabilities within analytics workflows represents a pivotal advancement for organisations striving to unlock the full potential of their data. By transforming unstructured content into actionable insights, tools like OCI Document Understanding within Oracle Analytics Cloud bridge the gap between data preparation and analysis, enabling faster, more accurate decision-making. While debates may persist about whether text extraction is a standalone process or part of analytics, its value in delivering comprehensive, data-driven outcomes is undeniable. As businesses continue to navigate an increasingly data-rich landscape, embracing these capabilities will be key to maintaining a competitive edge.

    Author

    A bit about me. I am an Oracle ACE Pro, Oracle Cloud Infrastructure 2023 Enterprise Analytics Professional, Oracle Cloud Fusion Analytics Warehouse 2023 Certified Implementation Professional, Oracle Cloud Platform Enterprise Analytics 2022 Certified Professional, Oracle Cloud Platform Enterprise Analytics 2019 Certified Associate and a certified OBIEE 11g implementation specialist.
