Ultimate Guide to AI-enabled Document Processing OCR: Increase Speed, Accuracy, and Efficiency

Posted by: Jay Madheswaran on June 29, 2022

What is Document Processing?

Document processing is the work of converting forms, documents, and pictures into usable information that a business consumes to provide value to its customers. Today, millions of people around the world manually read documents and pull information out of them so that a business process can continue. The challenges and improvements in this process keep changing; below we'll explore them and discuss how they can be solved.

What are the Common Use Cases of Document Processing?

Document processing is required every day by most businesses. It is often seen in Accounts Payable teams, which process invoices and purchase orders to keep the business functioning. Other use cases include receipt processing, ID processing, and anything else that requires looking at a PDF or an image.

What are the Common Approaches to Document Processing?

There are many different approaches to document processing. Some businesses opt for a more manual or time-consuming approach, while others use more automation and software. It depends on the specifics of the business, its needs, and the available technology. Some common approaches include:

  • Completely manual
  • Manual Assisted with Technology
  • Automated with offline human review or auditing

Completely Manual

In a completely manual process, a human being is required for every document-driven business process to move forward. Sometimes this is the only economical way to perform document processing. Cases where a completely manual approach makes sense today include:

  • The cost of getting information wrong is so significant that it is worth spending more time and money to get it right
  • The processes themselves are so nebulous and ever-changing that they are difficult to proceduralize and automate
  • The upfront financial cost or time investment required to automate is too high

 

Manual Assisted with Technology

In this approach, people use technology to speed up the time they spend processing documents. Common tools such as PDF readers, OCR technology, and various ERP tools help them extract information from documents more productively. This is usually a marginal improvement over a completely manual process and requires a similar amount of time and cost.

Automated with Offline Human Review Or Auditing

In this approach, automation goes further. Usually, the business processes are first codified into a few predictable paths, and then one of the following technologies is used:

  • Raw OCR
  • RPA with Template based OCR
  • AI Driven OCR solutions
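
To make the idea of "a few predictable paths" with offline review more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the route labels, and the audit rate are hypothetical and would depend on your own business process.

```python
import hashlib

# Hypothetical required fields; a real process would define its own.
REQUIRED_FIELDS = ("invoice_number", "total")


def selected_for_audit(document_id: str, audit_rate: float) -> bool:
    """Deterministically pick roughly `audit_rate` of documents for offline audit."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < audit_rate


def route(document_id: str, fields: dict, audit_rate: float = 0.05) -> str:
    """Choose one of a few predictable paths for an extracted document."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        # Extraction is incomplete, so a person has to look at it first.
        return "manual_review"
    if selected_for_audit(document_id, audit_rate):
        # Processed automatically, but also queued for a later human audit.
        return "auto_with_audit"
    return "auto"
```

Hashing the document ID, rather than drawing a random number, means the same document is always routed the same way, which makes audits easier to reproduce.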

Challenges with Automating with OCR

OCR technology today falls into a few different buckets, each with its own set of challenges that can be a barrier to adoption or to achieving the end result you want.

Raw OCR

OCR, or Optical Character Recognition, takes an image and extracts raw pieces of text from it. It is commonly used on scanned documents to make the text in the resulting PDFs selectable and copyable. Oftentimes, raw OCR is used only as a tool to save human reviewers a bit of time, and high levels of automation are difficult to achieve. However, if the business process only needs a relatively small amount of structured information that can be identified by some common pattern (for example, always look for a text starting with INV-XXX), raw OCR can be quite effective.
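
As a rough illustration of the pattern-based approach, the sketch below searches raw OCR text for invoice numbers. It assumes the OCR step has already produced a plain string, and that an invoice number is "INV-" followed by digits; both the pattern and the sample text are made up for this example.

```python
import re
from typing import List

# Assumed pattern: "INV-" followed by one or more digits.
INVOICE_PATTERN = re.compile(r"\bINV-\d+\b")


def find_invoice_numbers(ocr_text: str) -> List[str]:
    """Return every invoice-number-like token found in raw OCR text."""
    return INVOICE_PATTERN.findall(ocr_text)


# Made-up OCR output for a scanned invoice.
sample_text = """ACME Supplies
Invoice No: INV-20391
Total due: $1,250.00"""

print(find_invoice_numbers(sample_text))              # ['INV-20391']
# If OCR misreads the capital "I" as a lowercase "l", the pattern misses it.
print(find_invoice_numbers("Invoice No: lNV-20391"))  # []
```

The second call shows why this approach tends to be fragile: a single misread character is enough for the pattern to find nothing.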

Options for raw OCR include cloud providers' OCR APIs (Google, Amazon, Microsoft), PDF software with OCR built in (some printer/scanner software, Adobe, and many others), and open source tooling like Tesseract. In our research, though, we've found that Google currently holds the edge in how accurate its OCR results are under various conditions.

Pros:
  • Low Cost
  • Multiple ways of bringing this into your business process without many changes
Cons:
  • Achieves lower levels of accuracy
  • Tends to be fragile to maintain
  • Full automation often requires ongoing maintenance from engineering

 

RPA and Template based OCR 

Robotic Process Automation (RPA) is a technology that automates business processes using software robots, or bots. RPA can be used to automate tasks such as data entry, customer service, and document processing. In effect, RPA uses a script running on a Windows machine to automate the clicks a human being would make to perform a certain task. If a business process can be accurately broken down into a specific list of clicks, RPA can be an effective approach. The challenge is maintaining the RPA bots, as they tend to be fragile over time. You'll need to invest in a team capable of maintaining this service.

Options for RPA include players like UiPath, Automation Anywhere, and Microsoft. Challenges invariably include upfront cost and time. Building RPA bots well takes time and knowledge, and often requires a lot of coordination between IT, business users, and leaders to agree on a new process. In addition, it is often impractical to embed RPA processes natively in your applications because of the response times and fragility of these bots.

RPA players also often incorporate some form of OCR into their services, and they allow users to draw boxes on the areas of the PDF or image to extract. This is often better than raw OCR, but it still runs into trouble when minor changes to the document break the bot completely.
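
To show why box-based templates break so easily, here is a small sketch in Python. It does not use any RPA vendor's API. It assumes a hypothetical OCR step that returns each word with the pixel coordinates of its top-left corner, and a template that is simply a rectangle drawn over the place where the invoice total usually appears.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word, in pixels
    y: float  # top edge of the word, in pixels


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def extract_region(words: List[Word], box: Box) -> str:
    """Join the words whose top-left corner falls inside the template box."""
    inside = [
        w for w in words
        if box.left <= w.x <= box.right and box.top <= w.y <= box.bottom
    ]
    inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
    return " ".join(w.text for w in inside)


# Hypothetical box drawn over the area where the total usually appears.
TOTAL_BOX = Box(left=400, top=700, right=550, bottom=730)

original = [Word("Total:", 320, 710), Word("$1,250.00", 420, 710)]
# Same invoice with extra line items, which pushes the total 50 pixels down.
shifted = [Word("Total:", 320, 760), Word("$1,250.00", 420, 760)]

print(repr(extract_region(original, TOTAL_BOX)))  # '$1,250.00'
print(repr(extract_region(shifted, TOTAL_BOX)))   # ''
```

Nothing about the document's meaning changed between the two examples, yet the template returns an empty result for the second one.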

 

Pros:
  • Has a path for fully automating a business process
  • Makes it possible to automate applications that have no API access
Cons:
  • Can get expensive as you pay per bot. Given the pricing structure used by most RPA vendors, it may not make sense for a large number of use cases
  • While it is "easy" to build a bot, in practice it takes quite a bit of effort to build and maintain a bot that doesn't crash. Depending on the process, automating it may not be worth this effort
  • Bots break down often. Because bots work by moving the mouse and clicking on buttons, they're affected by software updates and other things that neither you nor the RPA vendor can control. This in turn puts unexpected pressure on the contingencies you had in place, which can lead to expensive and undesirable business outcomes
  • RPA bots are made even more fragile by OCR technology that is not fully AI-based, leading to even more breakages

 

AI Driven OCR Solutions aiming to do Intelligent Document Processing

Users who try the approaches above often realize that they could solve the problems they encounter themselves. Usually this means fixing the shortcomings of raw OCR and integrating natively into their application via APIs to enable full and fast automation. For example, this is how TurboTax is able to scan a W-2 and automatically enter the information in its software, without waiting for a person to review it or for an RPA bot to complete. The challenge with building this yourself lies not only in improving on raw OCR results, but also in building the intelligence to extract just the information you want, regardless of how and where it appears.

This is where AI-driven OCR APIs are useful. This service is usually offered by newer players and startups, because it is based on recent state-of-the-art improvements in AI that incorporate both visual and textual information. The issue is that these models are often difficult to train and tweak. This is where services like Butler can help. Butler lets any developer customize highly accurate models using state-of-the-art AI in under 10 minutes. Give it a try!

Alternatives to using Butler include building your own document processing AI model using a combination of code, NLP models, and CV models. We'll go into depth in a later blog post on how we can do that.

 

Pros:
  • Low Cost
  • Typically achieves a much higher level of accuracy than all of the solutions above
Cons:
  • Achieving full automation requires embedding AI APIs into your application
  • Often requires restructuring your document processing pipeline to include a fully automated path

 

Summary 

We hope this gives you a good overview of what document processing is, the challenges of automating it, and a few solutions you can try to improve your own process. Our mission at Butler is to enable any developer to customize and embed AI into their applications. We want to see a world where all the applications we interact with every day are intelligent. We hope you'll help us make it happen!
