Document processing is the work of converting forms, documents, and pictures into usable pieces of information that a business consumes to provide value to its customers. Today, millions of people around the world manually read documents and pull information out of them so that a business process can continue. The challenges and improvements in this process keep changing; we'll explore them further and discuss how the challenges can be solved.
Most businesses process documents every day. It is common in Accounts Payable teams, which process invoices and purchase orders to keep the business functioning. Other use cases include receipt processing, ID processing, and anything else that requires looking at a PDF or an image.
There are many different approaches to document processing. Some businesses opt for a more manual or time-consuming approach, while others use more automation and software. It depends on the specifics of the business, its needs, and the available technology. Some common approaches include:
In a completely manual approach, every document-driven business process needs a human being to move it forward. Sometimes this is the only economical way to perform document processing. Cases where a completely manual approach makes sense today include:
This is a process in which people use technology to speed up the time they spend processing documents. Common tools such as PDF readers, OCR technology, and various ERP tools help them extract information from documents more productively. This is usually a marginal improvement over being completely manual and requires a similar amount of time and cost.
This is a process in which further automation takes place. To achieve it, business processes are usually codified further into a few predictable paths, and one of the following technologies is used:
OCR technology today falls into a few broad buckets, each with its own challenges that pose a barrier to adoption or to achieving the end result you want.
OCR, or Optical Character Recognition, takes an image and extracts raw pieces of text from it. It is commonly used on scanned documents to make the text in the resulting PDFs copyable. Raw OCR is often used only as a tool to save human reviewers a bit of time, and high levels of automation are difficult to achieve with it. However, if the business process needs only a relatively small amount of structured information that can be identified by common patterns (for example, always looking for text starting with INV-XXX), raw OCR can be quite effective.
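To make the INV-XXX example concrete, here is a minimal sketch of pattern matching over raw OCR output. It assumes the OCR step has already produced plain text and that invoice numbers look like `INV-` followed by digits; the pattern, the function name, and the sample text are illustrative assumptions, not part of any particular OCR product.

```python
import re
from typing import List

# Assumed format: "INV-" followed by one or more digits.
# Adjust the pattern to whatever your own documents actually use.
INVOICE_PATTERN = re.compile(r"\bINV-\d+\b")


def find_invoice_numbers(ocr_text: str) -> List[str]:
    """Return every invoice-number-like token found in raw OCR text."""
    return INVOICE_PATTERN.findall(ocr_text)


# Hypothetical OCR output for a scanned invoice.
sample_text = """ACME Supplies
Invoice No: INV-00123
Total due: $450.00
Ref: INV-00456"""

print(find_invoice_numbers(sample_text))
```

This works only as long as the OCR engine reads the characters correctly and the documents keep to the assumed pattern, which is why the approach suits small, predictable extractions.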
Options for raw OCR include cloud providers' OCR APIs (Google, Amazon, Microsoft), various PDF software with OCR built in (some printer/scanner software, Adobe, and many others), and open source tooling like Tesseract. From our research, we've found that Google currently holds the edge in how accurate its OCR results are under various conditions.
Pros:
Robotic Process Automation (RPA) is a technology that automates business processes using software robots. RPA can be used to automate tasks such as data entry, customer service, and document processing. In practice, RPA uses a script running on a Windows machine to automate the clicks a human being would make to perform a certain task. If a business process can be accurately broken down into a specific list of clicks, RPA can be an effective approach. The main challenge is maintaining RPA bots, as they tend to be fragile over time. You'll need to invest in a team capable of maintaining this service.
Options for RPA include players like UiPath, Automation Anywhere, and Microsoft. Challenges invariably include upfront cost and time. Building RPA bots well requires time and knowledge, and often a lot of coordination between IT, business users, and leaders to agree on a new process. In addition, it is often impractical to embed RPA processes natively in your applications because of the response times and fragility of these bots.
RPA players also often incorporate some form of OCR into their services, and they allow users to draw boxes around the areas of a PDF or image to extract. This is often better than raw OCR, but it still runs into trouble when minor changes to the document break the bot completely.
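The fragility of box-based extraction is easy to see in a small sketch. The code below is not how any specific RPA product works; it assumes a hypothetical OCR result made of words with pixel coordinates and a user-drawn `Zone`, and keeps only the words whose center falls inside that zone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One OCR word with its bounding box in pixels (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Zone:
    """A box drawn by a user over the area to extract."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if its center point lies inside the box.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return self.x <= cx <= self.x + self.w and self.y <= cy <= self.y + self.h


def extract_zone(words: List[Word], zone: Zone) -> str:
    """Join the words inside the zone in reading order (top to bottom, left to right)."""
    inside = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


total_zone = Zone(x=400, y=700, w=150, h=30)

# Layout the zone was drawn for.
original = [Word("Total:", 330, 705, 60, 20), Word("$450.00", 410, 705, 80, 20)]
# Same document after an extra line item pushed the total 60 pixels down.
shifted = [Word("Total:", 330, 765, 60, 20), Word("$450.00", 410, 765, 80, 20)]

print(repr(extract_zone(original, total_zone)))
print(repr(extract_zone(shifted, total_zone)))
```

With the original layout the zone captures the total; once the total moves down the page, the same zone returns an empty string even though the value is still on the document.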
Users who have tried all of the above often realize that they can try to solve the problems they encounter themselves. Usually this means fixing issues with raw OCR and integrating natively into their application via APIs to enable full and fast automation. For example, this is how TurboTax is able to scan a W-2 and automatically enter the information into its software natively, without waiting for a person to review it or for an RPA bot to complete. The challenge with building this yourself lies in improving on raw OCR results and building the intelligence to extract just the information you want, regardless of how and where it appears.
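As a rough illustration of what "regardless of where it appears" means, the sketch below anchors on a label instead of a fixed position. It is a simple heuristic, not an AI model, and it assumes a hypothetical OCR result of words with pixel coordinates; the `Word` structure, function name, and tolerance are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR word with the top-left corner of its box (hypothetical structure)."""
    text: str
    x: float
    y: float


def value_right_of_label(
    words: List[Word], label: str, line_tolerance: float = 5.0
) -> Optional[str]:
    """Find the label anywhere on the page and return the nearest word to its right."""
    anchors = [w for w in words if w.text.rstrip(":").lower() == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    # Words on roughly the same line as the label and to the right of it.
    candidates = [
        w for w in words
        if abs(w.y - anchor.y) <= line_tolerance and w.x > anchor.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x).text


bottom_of_page = [Word("Total:", 330, 705), Word("$450.00", 410, 705)]
top_of_page = [Word("Invoice", 50, 40), Word("Total:", 50, 120), Word("$450.00", 130, 122)]

print(value_right_of_label(bottom_of_page, "total"))
print(value_right_of_label(top_of_page, "total"))
```

This survives the total moving around the page, but it still fails as soon as the label is worded differently ("Amount due"), the value sits below the label, or the OCR misreads a character, which is the gap that a hand-built heuristic struggles to close.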
This is where AI-driven OCR APIs are useful. It's usually newer players and startups that offer this service, as it's based on recent state-of-the-art improvements in AI that incorporate both visual and textual information. The issue is that these models are often difficult to train and tweak. This is where services like Butler can help. Butler lets any developer customize highly accurate models using state-of-the-art AI in under 10 minutes. Give it a try!
Alternatives to using Butler include building your own document processing AI model using a combination of code, NLP models, and CV models. We'll go into depth in a later blog post on how we can do that.
We hope this gives you a good overview of what document processing is, the challenges of automating it, and a few solutions you can try to improve it. Our mission at Butler is to enable any developer to customize and embed AI into their applications. We want to see a world where all the applications we interact with every day are intelligent. We hope you'll help us make it happen!