Machine Learning for Data Extraction and Classification
We are ramping up our team for the Microsoft Inspire Conference (Booth 1237) in Washington, DC in a few weeks (July 9-13), and I thought I would put together some ideas on the power of Ephesoft technology when combined with Microsoft technologies. We have been working with several Microsoft teams (Azure, SharePoint, Flow) to bring solutions to market and to provide extensive document-centric solutions to their partner and customer ecosystem. So how do we fit? I will outline a quick primer.
Ephesoft was founded in 2010 by leaders from the document capture industry who wanted to drive innovation and disrupt the legacy document automation space. The company has shown explosive growth through its unique perspective on taming unstructured content using patented complex analytics and machine learning. Its technology has garnered broad interest and investment from top-tier firms like Fujitsu and In-Q-Tel.
At the heart of Ephesoft technology is an engine that provides automated document classification and data extraction. Feed it documents from any source (fax, scanners, copiers, folders, legacy ECM systems, mobile devices, repositories) and it will do all the heavy lifting – sorting, separating, classifying and getting you the data you need to drive efficiency, productivity, automation and decision-making with minimal end-user intervention. Offered as SaaS and PaaS solutions, and available on premises or in the cloud, the Ephesoft platform can provide great value to an organization of any size. Ephesoft has two products:
Ephesoft Transact – a transaction document capture platform for day-to-day document processing.
Ephesoft Insight – a document analytics platform for ingesting large volumes of existing unstructured content and extracting meaning.
Think of Ephesoft as an added intelligent document automation layer that can be placed on top of other technologies as a catalyst for automation. Below is a list of core technologies from Microsoft, and how Ephesoft can fit from a business perspective.
With SharePoint, Ephesoft Transact can be an intelligent on-ramp for documents into SharePoint libraries. As a front-end loader, Transact can auto-identify and route documents from just about any source, and make sure they wind up in the right library, as a searchable PDF, with all the important metadata extracted. It provides a standardized, repeatable process for adding any type of document to Microsoft SharePoint.
With Ephesoft Insight, SharePoint libraries can now be consumed and leveraged for Document Analytics. Insight provides the “document side” of the analytics equation.
You can get more information here:
Utilizing Ephesoft Web Services in the cloud, you can add intelligence to any Microsoft Flow workflow. Using the classification or extraction services, you can use Ephesoft Transact technology to “open up” documents mid-process, and make workflow branching decisions based on what you find. An example of a Flow use case here:
ERP and accounting systems can leverage the power of Ephesoft in many different ways. As a processing engine, Ephesoft Transact can extract information from critical documents, like invoices or sales orders, and pass the information on to Dynamics. No longer will employees have to hand-key information and waste precious time. Along with time savings, data entry errors can now be eliminated through Ephesoft Transact’s validation and exception processing capabilities. More info:
Document capture and automation is a great fit for the cloud. Ephesoft’s web-based technology and RESTful APIs are cloud ready, and are available in Microsoft Azure. As a Cloud Infrastructure partner, Ephesoft has worked diligently to ensure compatibility with Azure, and also to take advantage of all the cloud has to offer from a scalability and availability perspective. Read more on Ephesoft’s cloud platform:
This is just a short list of possibilities. Ephesoft’s products are built for partners, and have an open architecture to facilitate the building of portable solutions to add value and drive revenue. Come see us at Inspire (Booth 1237), or reach out to us directly for more information: Contact Us.
I have been working with several of our MFP/Copier partners, and wanted to put together a video demo on how to use copiers to train Ephesoft's machine learning engine. This demo shows how you can use our document analytics engine and train it on HR documents.
One of our regional reps produced this video to help show how we differ from other document capture and analytics platforms on the market. This is a great expansion on one of my earlier posts – Analytics and Document Capture – Why it Matters. The video gives a great overview of the many dimensions of a document, and how Ephesoft leverages its patented technology to enhance accuracy, analyze large volumes of documentation, and process unstructured information.
In the world of document capture and analytics, our typical value proposition is around efficiency, reduction in required headcount and the reduction in turnaround time. Of course, there is true value and cost savings for any organization processing a significant volume of documents if you focus on these value points. Lately, we have been having some great conversations both internally and externally on the true cost of errors in data entry, and I wanted to dig deep into my past, and present a key topic for discussion.
Back in my Navy days, I found myself in the center of a focus on quality, and we had morphed Deming’s Total Quality Management (TQM) into a flavor that would serve us well. In a nutshell, it was an effort to increase quality through a systematic analysis of process, and a continuous improvement cycle. It focused on reducing “Defects” in process, with the ultimate goal of eliminating them altogether. Defects impose a high cost on the organization, and can lead to failures across the board. Today, all these concepts can be applied to the processing of documents and their associated data. What is the true value of preventing defects in data?
In my education in this topic, I remember a core concept on quality, and defects: the 1-10-100 rule.
The rule gives us a graphic representation of the escalating cost of errors (or failures), with prevention costing $1, correction $10 and failure $100. So, in terms of data:
So, an ounce of prevention is worth a pound of cure. In this case, failing to prevent data errors up front can cost the business 100x what it would have cost to prevent them with an automated technology.
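To make the arithmetic concrete, here is a minimal sketch of the rule applied to a batch of records. The record count and the function name are hypothetical, and the $1/$10/$100 figures are the rule's illustrative ratios rather than measured costs.

```python
# Illustrative per-record costs taken from the 1-10-100 rule.
# These are ratios for discussion, not measured figures.
COST_PER_RECORD = {"prevention": 1, "correction": 10, "failure": 100}


def error_cost(records: int, stage: str) -> int:
    """Return the total cost of dealing with `records` records at a given stage."""
    return records * COST_PER_RECORD[stage]


if __name__ == "__main__":
    records_at_risk = 500  # hypothetical batch size
    for stage in COST_PER_RECORD:
        print(f"{stage:<10} ${error_cost(records_at_risk, stage):,}")
```

The same 500 records cost 100 times more to deal with once they have caused a failure than they would have cost to get right at capture time.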
In document capture today, we focus on the top rung of the pyramid, and in prevention. Below are the core benefits of an intelligent capture platform:
See more features for ensuring high-quality document processing and data extraction here: Ephesoft Document Capture and Data Extraction.
Just some thoughts…more to come on this topic.
Ephesoft has just released version 4.1 of our advanced capture platform, with a ton of new features. Below is just a quick list; you can watch the video below for more details:
Video overview of features:
This is part II in a series of videos showing Ephesoft’s Document & Process Machine Learning capabilities (Part I here: Machine Learning and Data Extraction). In this video, I will show how you can add intelligence to any document capture process through learning external data tables. This allows for leveraging pre-existing ERP and financial system information to make the Ephesoft System smarter.
In my previous post, I outlined some of the premises of machine learning in document capture, and how it can drive unseen levels of efficiency and productivity (See here: Rise of the Machines: Machine Learning and Capture). I always like to follow up with a video. This is the first of two videos focusing on Ephesoft’s Machine Learning. This one is focused on end-user input driving document understanding for improved data extraction and less setup time.
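As a rough illustration of why an external table helps, the sketch below checks an extracted vendor name against a small vendor list, the kind of data an ERP system already holds. This is not Ephesoft's implementation: it swaps in simple fuzzy string matching from Python's standard library, and the table contents, the vendor IDs, and the `resolve_vendor` name are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical vendor table, standing in for data exported from an ERP system.
VENDOR_TABLE = {
    "Acme Supply Co.": "V-1001",
    "Northwind Traders": "V-1002",
    "Contoso Ltd.": "V-1003",
}


def resolve_vendor(extracted_name, cutoff=0.8):
    """Map a possibly misread vendor name to a known vendor, or None if nothing is close."""
    matches = get_close_matches(extracted_name, list(VENDOR_TABLE), n=1, cutoff=cutoff)
    if not matches:
        return None  # no confident match: route to manual review
    name = matches[0]
    return name, VENDOR_TABLE[name]


if __name__ == "__main__":
    print(resolve_vendor("Acme Suply Co"))  # OCR dropped a letter and the period
```

A name that matches nothing in the table comes back as `None`, which is a natural trigger for exception handling rather than silently accepting bad data.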
I remember when I first started out in the document capture and ECM world, I was sitting across from a CIO, presenting our technology, and he started asking pointed questions about configuration and services. We talked for about 15 minutes, and he stopped, and I could see the gears were turning. He looked at me and said: “Why do you guys make it so damn hard?” I looked at him and said, “What do you mean?” He responded with: “Why all the configuration and setup time? Why can’t it just understand my documents, and what I am trying to accomplish? I know that current technology is capable.” At that time, the trend in the industry was a heavy reliance on regular expressions, basically a pattern matching language that originated in 1956, born through mathematical theory. So essentially, the CIO hit the nail on the head: We were using 1950s math theory to provide automation and value, but it came with a deep cost in the form of expertise and services. So here we are 10 years later, and the majority of the industry still uses that same method to analyze, classify and extract data.
In the document automation space, we typically present a magic world to the end users, one where they just hit the button, or upload their document, and stuff just happens “automagically”. But in reality, behind the scenes, there was a lot of work to get to that point, with the burden placed on IT in the form of education, configuration, service costs and testing. Machine learning strives to eliminate that burden through simple efforts to train the system, and I think the goal, although lofty, is to reduce or eliminate configuration to a point that any user can create a workable system.
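For readers who have not seen regular-expression-based extraction, here is a minimal sketch of what that configuration work looks like. The sample text, field names, and patterns are invented for illustration and are not taken from any product.

```python
import re

# Hypothetical OCR output from one invoice layout.
SAMPLE_TEXT = """ACME Supply Co.
Invoice No: INV-20431
Invoice Date: 07/02/2017
Total Due: $1,284.50
"""

# One hand-written pattern per field; each must be authored and tested by someone.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No[.:]?\s*([A-Z]+-\d+)"),
    "invoice_date": re.compile(r"Invoice\s+Date[.:]?\s*(\d{2}/\d{2}/\d{4})"),
    "total_due": re.compile(r"Total\s+Due[.:]?\s*\$([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Return the first match for each field, or None when the pattern does not fit."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
    # A vendor that labels the same field differently defeats the pattern.
    print(extract_fields("Inv #: 20431"))
```

Every new layout or label variation means another pattern to write and maintain, which is exactly the expertise and services cost the CIO was pointing at.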
So, in document capture, what can machine learning provide? In modern document automation technologies, like Ephesoft’s Capture Platform, machine learning can be leveraged in several ways:
These are just a few examples of machine learning, and what it brings to the document capture industry. More to come when we release our new version, 4.1 at our conference next week.