Invoices and AI’s Intelligent Document Processing 

In Brief

  • AI (Artificial Intelligence) can allow companies to dynamically gather data through Intelligent Document Processing (IDP). 
  • IDP does have limitations in how it analyzes and extracts data. 
  • AI has tools to incorporate OCR into IDP to expand data analysis. 

AI's full potential is still being worked out, but parts of this new technology can already benefit a company's everyday processes and tasks. 

AI models and programs are still being developed, yet some companies already use AI for Intelligent Document Processing (IDP). But what does a company do when its data quality is poor? Normally, a company can use Optical Character Recognition (OCR) technology to convert document images into digitized text. That process is isolated, though: it can only analyze one document at a time. With the assistance of AI tools, OCR data can be captured and used for IDP, making it more consumable for the company.  

IDP Example 

IDP works by extracting information from sources across an organization: invoices, receipts, text found in photos and PDFs, and other documents.  

Many of these items live in applications that are themselves limited in their ability to process data. Adobe Acrobat, for example, hosts PDFs but does not engage in any data processing activities. Microsoft's Power Automate can process and load data into tables and models, but it needs access granted by the IT department. 

Once set up, however, Power Automate can take both good and bad invoice documents, perform IDP, and extract the invoice data correctly. After completing the task, it also reports a confidence score, because the AI is making an attempt to pull the correct fields and information in the way the user requested. The higher the confidence score, the more reliable the IDP result. 
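One way a team might act on a confidence score is to accept high-confidence fields automatically and send the rest to a person for review. The Python sketch below illustrates that idea only; it does not call Power Automate, and the `ExtractedField` type, the field names, the sample values, and the 0.80 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """One field pulled from an invoice (hypothetical structure)."""
    name: str
    value: str
    confidence: float  # assumed to range from 0.0 to 1.0


def split_by_confidence(fields, threshold=0.80):
    """Separate fields that meet the threshold from those needing review."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


# Made-up sample output from an extraction run.
fields = [
    ExtractedField("invoice_number", "INV-1001", 0.97),
    ExtractedField("invoice_date", "2024-03-15", 0.91),
    ExtractedField("total_amount", "1,250.00", 0.62),
]

accepted, needs_review = split_by_confidence(fields)
print("Accepted:", [f.name for f in accepted])
print("Needs review:", [f.name for f in needs_review])
```

The right threshold is a judgment call: a lower value saves review time but lets more extraction errors through, so it is worth tuning against a sample of invoices that have already been checked by hand.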

This is not just a guess by the AI model. When creating the model, you set the fields and mark where each field should appear within the document. The IDP component of Power Automate then attempts to match the user's training material against the document it is given and tries its best to produce the correct output.  

Challenges with IDP 

IDP is capable up to a point, yet even with a confidence score there will be times when finding and connecting data is too difficult: too much volume (data size), too much velocity (rate of change), or too much variety (breadth of sources). In these cases data connectivity is fragmented, and data often needs to be reshaped before it can be consumed. That reshaping tends to be one-off and not repeatable. 

So there are limitations to IDP, as it may never be 100% confident, especially when the source document is custom and complex, whether because it is hard to read or because it presents too much information. For other documents, it may make sense to spend time cleaning them up so that AI programs can read them more easily. Correcting documents is not a one-size-fits-all change: some documents, especially scanned ones, have portions where the data quality is poor, which makes IDP difficult.  

It would take an enormous amount of effort and non-value-added time to convert files, and sometimes boxes of invoices, into consumable data for detailed analysis. One goal might be to extract the data from all of those invoices into a table.  
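As a rough illustration of that goal, the Python sketch below collects already-extracted invoice records into a single CSV table. It assumes the extraction step has produced one dictionary per invoice; the column names and sample records are hypothetical, and a field the extraction could not find is left blank rather than guessed.

```python
import csv
import io

# Hypothetical column layout for the combined invoice table.
COLUMNS = ["vendor", "invoice_number", "invoice_date", "total_amount"]


def invoices_to_csv(records, columns=COLUMNS):
    """Write one row per invoice; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # fill fields absent from a record with blanks
        extrasaction="ignore",  # drop keys that are not table columns
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Made-up extraction results; the second invoice has no readable date.
records = [
    {"vendor": "Acme Supply", "invoice_number": "INV-1001",
     "invoice_date": "2024-03-15", "total_amount": "1250.00"},
    {"vendor": "Northwind", "invoice_number": "INV-1002",
     "total_amount": "310.40"},
]

table = invoices_to_csv(records)
print(table)
```

Leaving gaps visible in the table, instead of dropping incomplete invoices, makes it easier to see which documents may need cleanup or manual entry.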

Using AI with OCR to Convert PDFs 

There are AI tools available that can convert these files into data that is more consumable, but it is important to evaluate each tool against the specific technology requirements at your organization. Here is a simple list of areas to review when deciding on an OCR conversion tool. 

  • Does it connect to other tax processes? 
  • Will my IT department allow access? 
  • Can it extract data from a table on the PDF/invoice/agreement? 
  • Can I use it as a model to train for other types of documents? 
  • Are there any firewall/security issues with the data being extracted? 
  • Can I easily drag and drop multiple files for mass data extraction? 
  • Processing time? 
  • Cost structure? 

One of the most notable features of using AI to extract data is whether the model can be trained to learn the specific types of invoices (vendors, locations, etc.) a company might produce. Whatever model or tool is selected, it is important that it allows the tax department to easily train the model to pick the data off the document.  

IRS Circular 230 Required Notice: IRS regulations require that we inform you that to the extent this communication contains any statement regarding federal taxes, that statement was not written or intended to be used, and it cannot be used, by any person (i) for the purpose of avoiding federal tax penalties that may be imposed on that person, or (ii) to promote, market or recommend to another party any transaction or matter addressed herein.