Infor + Ephesoft: AP Invoice Processing Solution

Infor AP Automation for Invoices

Invoice Scanning and Capture Software for Infor

I thought I would share this quick video overview of Ephesoft working with Infor M3 and Xi.  It shows an invoice scanning and capture solution, with 3-way PO matching in the Infor M3 product.



Digital Transformation Platform: Accelerating Transformation with Web Services

OCR API Web Services

The Value of Document Capture Web Service APIs

The Ephesoft Web Service APIs provide a valuable set of document capture, classification, extraction and OCR micro-services to “capture enable” any application.  Need a searchable PDF?  Want to identify a type of document?  Want to extract or validate metadata?

We are seeing a movement in the market where customers want to leverage these functions to provide an invisible automation layer that just happens, and is transparent to users.  You can read more on that phenomenon here: The Benefits of Invisible Capture

Below are two videos by our professional services and tech teams that show the capabilities.


Healthcare and Fax Automation: Prior Authorizations

Healthcare scanning and capture

Automating Fax Processing in Healthcare

We recently built out a whole set of Healthcare Capture and Scanning Solutions for our partners.  Below is my contribution, and I focused on automating the Prior Authorization fax problem.


In the coming weeks I will post more of these capture, scanning and OCR solutions here.  Subscribe to get notified.




Machine Learning and Distributed Document Capture and Scanning

Machine Learning for Copier Scanning

Using Copiers to Machine Learn Documents

I have been working with several of our MFP/Copier partners, and wanted to put together a video demo on how to use copiers to train Ephesoft when it comes to our machine learning engine.  This demo shows how you can use our document analytics engine and train HR documents.



CIO Insight: How to Anticipate Your Digital Transformation Roadblocks

Digital Transformation

Digital Transformation (DX) Roadblocks: The Survey

This past week I was at Alfresco’s Sales Kickoff Event in Chicago, and there were some great presentations that tied into my recent theme of Digital Transformation.  In my recent post, CIO’s Digital Transformation Playbook, I outlined the 6 keys to a successful digital transformation.  We all know that road is plagued with pitfalls and detours.  Wouldn’t it be nice to tackle and plan for those roadblocks prior to starting your DX journey?  Here are the top pain points in order (as outlined by the Alfresco/Forbes survey):

  1. Technology (legacy issues/lack of integration) – As CIOs prepare their vision for DX, they have to figure out their legacy impediments.  Are existing systems cloud ready and cloud adaptive?  How do you replace the laggard technology to enable transformation?
  2. Talent/capabilities/skills gaps – the lack of quality IT talent on the street is a definite issue when it comes to transforming business.  How do you find that Azure or AWS expert?  If you decide to move from an on-prem ERP, can you find an experienced project manager to move to your chosen SaaS vendor?
  3. Lack of collaboration between IT and LoB – ah, the same old story, the struggle between IT and the Business Users.  Can’t we all just get along?  DX can be a painful and scary trip for the business, and moving to new technology or a new system can wreak havoc on operations.  Poor collaboration between tech and business can slow down the move, and may even bring it to a halt.
  4. Organizational Silos – digital transformation is broad and all-inclusive, and breaking down silos to enable a true digital “flow” between departments can be a heavy challenge.  This ties into #3, and now IT becomes a facilitator between departments internally.
  5. Disjointed processes across departments – recreating broken processes in the digital realm is a recipe for failure and challenges.  Process improvement efforts during DX can add layers of complexity, and also slow down efforts.
  6. Company Culture – I’ll never forget a conversation I had with a CEO in my past.  We were discussing paperless technology, and he said, “I can see all the benefits and the cost savings, but my employees will never go for that.”  What?!?!  Company culture and attitudes can be a massive inhibitor when it comes to transformative technology.

Alfresco has some great posts on Digital Transformation here:

Digital Transformation: It’s all about the flow

Digital Transformation Isn’t a Goal.  It’s A Journey.

Salesforce and Ephesoft: In App Document Classification and Extraction

Salesforce OCR and Automation

Web Services for OCR, Data Extraction and Document Classification

Continuing on my themes of open, web service enabled document capture and analytics, as well as this notion of “In App” Document Capture through APIs, I thought I would share a demo one of our fantastic SEs built to show the automation capabilities within Salesforce.  This shows background document classification and extraction, all initiated through a file upload in Salesforce.  This leverages OCR technology and our machine learning algorithms to auto-populate data in Salesforce.


Document Analytics: Machine Learning and Document Dimensions Video

Document Analytics and Machine Learning

Document Capture Through Intelligent Learning and Analytics

One of our regional reps produced this video to help show how we differ from other document capture and analytics platforms on the market.  This is a great expansion to one of my earlier posts – Analytics and Document Capture – Why it Matters.  The video gives a great overview of the many dimensions of a document, and how Ephesoft leverages its patented technology to enhance accuracy, analyze large volumes of documentation, and process unstructured information.

Ephesoft Transact 4.1: New Document Capture Features

OCR, Scanning and Capture Features

Intelligent Document Capture and Scanning

Ephesoft has just released version 4.1 of our advanced capture platform, with a ton of new features.  Below is just a quick list; you can watch the video below for more details:

Accuracy in Capture Enhancements

  • Enhanced Interactive Machine Learning
  • Paragraph Data Extraction
  • Multi-dimensional Classification
  • Enhanced Table Extraction
  • Cross Section Data Extraction
  • Progressive Barcode Reader

Productivity in Capture Enhancements

  • Auto-regex Creation
  • Line Item Matching (ERP integration)
  • Fuzzy Database Redesign
  • Format Extraction

Connectivity in Capture Enhancements

Security in Capture Enhancements

  • HTML 5 Web Scanner Service
  • Cluster Configuration Enhancements
  • Data Encryption in Linux
  • Single Sign On – SAML v2
  • PIV/CAC Authentication

Video overview of features:


Document Analytics and You: Why It Matters in Capture

The Many Dimensions of Your Documents

I had dinner the other night with our CTO and our conversation was focused on why our technology was different, and what our Document Analytics patent brought to the table.  Having been in the document capture industry for a long while, I was familiar with most of the technological “advances”.  Things like automated extraction, pattern matching and classification through a variety of methods.  I also knew that, with the acquisition craze, much of the technology was, for all intents and purposes, completely stagnant, with no true innovation for quite a while.  In steps Document Analytics (DA).

First off, document analytics is part of Ephesoft’s core learning engine technology.  So no matter your use case, the engine’s learning, multi-dimensional analysis and gleaned information are all available for use either through our end-user document capture application, platform APIs or our document analytics platform.  So let’s see what DA truly means.

Traditional document capture applications take digital documents and, in the case of scanned images, convert the image to text through Optical Character Recognition (OCR).  This raw text can then be examined, and usually information is extracted based on simple location or a pattern match.  This technique is great for simple data extraction, but when it comes to unstructured documents, without additional information, it can lead to erroneous data, and it can be quite limiting if you want a deeper understanding of your documents and data.
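
To make the traditional approach concrete, here is a minimal sketch of pattern-match extraction over raw OCR text.  The sample text, the field labels and the function names are all assumptions made up for illustration; none of this is Ephesoft code.  It shows the strength (a clean value is found right away) and the weakness (a single misread character makes the strict pattern miss the value entirely).

```python
import re

# Assumed sample of raw OCR output; invented for this illustration.
OCR_TEXT = """Patient Name: JANE DOE
SSN: 123-45-6789
Zip Code: 9H010
"""

# Pure pattern match: any NNN-NN-NNNN string is treated as an SSN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Location plus pattern: exactly five digits right after the label.
ZIP_PATTERN = re.compile(r"Zip Code:\s*(\d{5})\b")


def extract_ssn(text):
    """Return the first SSN-shaped string in the text, or None."""
    match = SSN_PATTERN.search(text)
    return match.group(0) if match else None


def extract_zip(text):
    """Return the five-digit zip after the label, or None if the pattern fails."""
    match = ZIP_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_ssn(OCR_TEXT))  # the SSN is clean, so the pattern finds it
print(extract_zip(OCR_TEXT))  # the zip was misread as 9H010, so nothing is found
```

The strict pattern has no way to say "this is probably a zip code with one bad character"; it either matches or it does not.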

With document analytics, the goal is to gather multiple dimensions on not only your documents, but also what lies within.  As you feed the learning system more documents, it learns continuously, and begins to gain understanding and predict where key information is located.  So let’s look at all the dimensions a true document analytics engine can gather.  Hold on, we are going full geek here.

Document Analytics Learns Document Dimensions
  1. Root-stemming – most technology in the market looks at individual words on a page.  True meaning comes in groups of words, and the analysis of their roots.  Take for example an analysis of mortgage documents.  The term borrower becomes extremely important, but unfortunately, when extracting data or searching for core data, you may encounter multiple forms: borrower, name of borrower, borrowed by, borrowing party, etc.  Being able to identify a core root, borrow, and being tolerant of variations becomes extremely important, as does the ability to assign a confidence level to identified groups.
  2. Relative Positions – gathering relative position information about words within a document can provide great insight.  As we continue our borrower example, knowing that in the phrase “Borrowing Name” that name follows borrowing gives us insight, and helps in our quest for valuable data.  Once again, this adds to our confidence that our data is being collected correctly.
  3. Typographical Characteristics – understanding font, font size and other characteristics of words, can help us understand document structure.  For example, a fillable PDF form we download from our medical insurance company will have a font for all the anchors: Patient Name, SSN, Address, etc.  When we fill this form out, we enter our information with another font, perhaps in all capitals.  This minor difference can provide meaning, and a better understanding of the structure of a document.
  4. Value Identity – when analyzing documents, knowing conventions in data can aid in dimensional analysis.  Take for example the social security number standard: NNN-NN-NNNN.  Knowing this pattern, and using other dimensions, like position, can help us “learn” about documents.  How?  So, when we find this pattern on the page, we can look before it and above it to understand how it is identified.  It would be prefaced by SSN, SSN:, Social:, SS Number, etc.  Once we understand how SSNs are anchored, now we can understand how other data may be anchored as well.
  5. Imprecision and Fuzziness – people are not perfect and neither is technology, and DA requires adaptation to “imperfect” information.  Take a zip code that is entered or read as 9H010.  Well, we know that this data was prefaced by “Zip Code”, and we know zip codes should be 5 numerical digits.  We also know that an OCR engine can sometimes confuse a 4 and an H depending on font type.  Getting the drift here?  By taking all our dimensions into account, we can say: this was located after “Zip Code:”, it is 5 characters, and I know 4s and Hs can sometimes be interchanged for this font type.  Therefore I can say with 90% confidence this is in fact a zip code that had been misread or mis-entered.
  6. Value Quantization – in gathering data, we know that certain words are most likely data that we will find interesting.  Numbers  (whole, 69010, or character delimited, 123-45-6789), dates (01/12/2001), and so forth are likely to be interesting data values that we need to extract, or will be required in our analysis.  Taking this into account can help our confidence and accuracy.
  7. Page Zones – in examining a document, certain areas, or zones, of a page usually contain important information.  For instance, a contract will almost always have key participants in the top quarter of the first page.  An invoice will have its total amount in the bottom half.  Using this area analysis can help us identify key information, and add to our confidence in data extraction.
  8. Page Numbers – As a human, I know that much of the important information will be on specific pages in documents I access on a day-to-day basis.  But maybe a certain type of application has key information on page 3.  Understanding and identifying core pages with critical data will provide added insight and aid in analytics.
  9. Fixed Value Location – one key in learning documents is to examine and define text blocks of interest.  Once these page areas are defined, the system can better understand layout and the design of a document, and help predict where key data may be located.
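
For the geeks who want to see a couple of these dimensions in action, below is a toy sketch of root matching (dimension 1) and of the zip code reasoning from dimension 5.  Everything in it is an assumption for illustration: the confusion table, the label pattern, the penalty weights and the function names are invented, and simple prefix matching stands in for real root-stemming.  It is not how the Ephesoft engine or the patent implements any of this.

```python
import re

# Assumed OCR confusion pairs (misread character -> likely digit).
OCR_CONFUSIONS = {"H": "4", "O": "0", "I": "1", "S": "5", "B": "8"}

# Hypothetical label variants that anchor a zip code value.
ZIP_ANCHOR = re.compile(r"zip\s*(?:code)?\s*:?\s*$", re.IGNORECASE)


def mentions_root(phrase, root):
    """Crude stand-in for root-stemming: does any word start with the root?"""
    words = re.findall(r"[a-z]+", phrase.lower())
    return any(word.startswith(root) for word in words)


def repair_zip(label, value):
    """Return (repaired_zip, confidence), or (None, 0.0) if it cannot be a zip."""
    if len(value) != 5:
        return None, 0.0
    upper = value.upper()
    repaired = "".join(OCR_CONFUSIONS.get(ch, ch) for ch in upper)
    if not repaired.isdigit():
        return None, 0.0
    confidence = 1.0
    # Arbitrary penalty for each character we had to swap.
    substitutions = sum(1 for a, b in zip(upper, repaired) if a != b)
    confidence -= 0.1 * substitutions
    # Arbitrary penalty when the value is not anchored by a zip label.
    if not ZIP_ANCHOR.search(label):
        confidence -= 0.3
    return repaired, round(confidence, 2)


for phrase in ["borrower", "name of borrower", "borrowed by", "borrowing party"]:
    print(phrase, mentions_root(phrase, "borrow"))

print(repair_zip("Zip Code:", "9H010"))
```

The weights are picked by hand here only so the numbers are easy to follow.  The point is that the label, the length and the known OCR confusions each contribute to one confidence score, instead of a single pattern deciding pass or fail.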

This is just an overview of how we can make sense of unstructured information through advanced learning and analytics.  If you want to go deeper, you can read the patent here:

Analytic systems, methods, and computer-readable media for structured, semi-structured, and unstructured documents