Appian and Intelligent Capture: Onramping to Workflow

OCR and Extraction for BPM

OCR, Classification and Extraction for Appian

I spent this week at the AppianWorld Conference with one of our great partners, GxP Partners.  GxP has done some really cool things with Ephesoft Transact, leveraging our open capture platform and its OCR & Extraction web services to create ApnCapture.  The solution ties closely to a previous article I wrote, Document Capture + BPM, and below is a quick summary of the value of using intelligent document capture with any workflow tool:

With the rise of Capture as a Platform (CaaP), there is an enormous opportunity for organizations to leverage the power of capture as an intelligent document automation component in any business process or workflow solution.  Here are the core areas of use for document capture and automation with any Business Process Management System (BPMS):

1. The “Pre” – The logical fit is to use document capture software to “feed the beast”, or in other words, as a front-end processor for inbound documents destined for workflows.  You might ask, “Why?  My BPM/Workflow solution has the capability to import documents.”  Modern capture platforms add another dimension of automation through features like separation, document classification and automatic data extraction.  Imagine a mortgage banking process where an inbound PDF houses 12 different document types in a single file.  The power of capture is to auto-split the PDF, classify each document, extract information and then pass all of that in a neatly formatted package to the workflow engine.  Now the workflow has a second dimension of intelligence, and it can use that to branch, route and execute.  Platforms like Ephesoft Enterprise can ingest documents from email, folders, legacy document management systems, fax and legacy capture systems (like Kofax and Captiva).

2. Mid-stream – What about activities during the workflow, ones that are necessary mid-process?  This is where the true power of a “platform” comes into play, and it requires a web services API (see other requirements of a capture platform in this article: 6 Key Components of a Document Capture Platform).  Some examples of activities that can be accomplished through a capture platform API in workflow:

  • Value Extraction – pass the engine a document and return extracted information.
  • Barcode Reading – pass the engine an image and return the value of any barcode.
  • Document Classification – pass the engine a document and identify what it is.
  • OCR Creation – pass the engine a non-searchable PDF and return a searchable file.

As you can imagine, this enables deep customization in any process that requires document automation: it can reduce end-user input, create added efficiency, and once again add that second dimension of intelligence after the workflow has begun.  You can see an extensive list of API operations here: Document Capture API Guide
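As a concrete illustration, calling one of these operations from a workflow step usually amounts to posting the document to a REST endpoint.  The sketch below builds a request body for a hypothetical classify-document call; the field names are illustrative only, not Ephesoft's actual API schema:

```python
import base64
import json

def build_classify_request(doc_bytes: bytes, batch_class: str) -> str:
    """Build a JSON body for a hypothetical classify-document web-service
    call.  Field names here are illustrative, not an actual API schema."""
    return json.dumps({
        "batchClassIdentifier": batch_class,
        # Binary document content is base64-encoded for JSON transport
        "documentContent": base64.b64encode(doc_bytes).decode("ascii"),
    })
```

A workflow activity would POST a body like this to the capture server's web-service URL, then branch or route on the document type returned.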

3. The “Post” – Depending on the process and requirements, a “post-process” capture step may be in order.  Most capture platforms have extensive integrations with third-party ECM systems like SharePoint, FileBound and OnBase, and can be leveraged as an integration point to these systems.  In addition, there is a new wave in the big data and analytics world focused on the data contained within documents.  Routing documents and data to analytics repositories can help organizations glean important insight into their operations.  If you choose a capture platform with a tied-in document analytics component, this can be accomplished automatically.
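To make the “Pre” step above concrete, here is a minimal sketch of the kind of neatly formatted package a capture front end might hand to a workflow engine after splitting, classifying and extracting an inbound PDF.  The structure and field names are illustrative, not any vendor's actual format:

```python
from dataclasses import dataclass

@dataclass
class CapturedDocument:
    doc_type: str   # assigned by classification, e.g. "W-2"
    pages: list     # page numbers from the original combined PDF
    fields: dict    # extracted index data

def build_workflow_package(batch_id, documents):
    """Bundle split/classified/extracted documents into a single
    package for hand-off to a workflow engine."""
    return {
        "batchId": batch_id,
        "documentCount": len(documents),
        "documents": [vars(d) for d in documents],
    }
```

The workflow engine can then branch on `doc_type` or route on any extracted field without ever re-opening the source PDF.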

ApnCapture: Capture and OCR for Appian

So, how did GxP implement ApnCapture and integrate it with Appian?  Below is a series of screenshots providing an overview from start to finish:

The capture process is initiated from any document source Ephesoft supports:

  • Web-browser scanning
  • Copiers/MFPs
  • Network folders
  • Email + Attachments
  • Mobile (Through our mobile client SnapDoc: Mobile OCR and Capture)
  • CMIS Based Repositories
  • Custom Code

Those documents are processed, and if there are no confidence issues, they pass right through the process.  If there are issues that require end-user correction or validation, users can access document batches through the ApnCapture Batch Report.

Document Batches Are Queued for End User Review

Clicking any line in the Appian-produced form interacts with Ephesoft Web Services to open a validation and review screen.

Documents Can Be Reviewed Through the ApnCapture Web Interface

Extracted data is then sent into Appian and can be used for all types of purposes: adding intelligence to workflows, enhancing business rules with data, and leveraging documents for approval and review.

Extracted Invoice Data is Presented in Appian for Approval

Finally, all the extracted data can provide a deeper view of any process that is capture enabled.

Captured Invoice Data is Presented for a Deeper Look Into the AP Process

To find out more about Intelligent Capture and OCR for Appian, contact GxP Partners.


CIO Insight: How to Anticipate Your Digital Transformation Roadblocks


Digital Transformation (DX) Roadblocks: The Survey

This past week I was at Alfresco’s Sales Kickoff Event in Chicago, and there were some great presentations that tied into my recent theme of Digital Transformation.  In my recent post, CIO’s Digital Transformation Playbook, I outlined the 6 keys to a successful digital transformation.  We all know that road is plagued with pitfalls and detours.  Wouldn’t it be nice to plan for those roadblocks prior to starting your DX journey?  Here are the top pain points, in order (as outlined by the Alfresco/Forbes survey):

  1. Technology (legacy issues/lack of integration) – As CIOs prepare their vision for DX, they have to figure out their legacy impediments.  Are existing systems cloud-ready and cloud-adaptive?  How do you replace the laggard technology to enable transformation?
  2. Talent/capabilities/skills gaps – the lack of quality IT talent on the street is a definite issue when it comes to transforming business.  How do you find that Azure or AWS expert?  If you decide to move from an on-prem ERP, can you find an experienced project manager to move to your chosen SaaS vendor?
  3. Lack of collaboration between IT and LoB – ah, the same old story: the struggle between IT and the business users.  Can’t we all just get along?  DX can be a painful and scary trip for the business, and moving to new technology or a new system can wreak havoc on operations.  A lack of collaboration between tech and business can slow down the move, and may even bring it to a halt.
  4. Organizational Silos – digital transformation is broad and all-inclusive, and breaking down silos to enable a true digital “flow” between departments can be a heavy challenge.  This ties into #3, and now IT becomes a facilitator between departments internally.
  5. Disjointed processes across departments – recreating broken processes in the digital realm is a recipe for failure and challenges.  Process improvement efforts during DX can add layers of complexity, and also slow down efforts.
  6. Company Culture – I’ll never forget a conversation I had with a CEO years ago.  We were discussing paperless technology, and he said, “I can see all the benefits and the cost savings, but my employees will never go for that.”  What?!?!  Company culture and attitudes can be a massive inhibitor when it comes to transformative technology.

Alfresco has some great posts on Digital Transformation here:

Digital Transformation: It’s all about the flow

Digital Transformation Isn’t a Goal.  It’s A Journey.

CIO Playbook: 6 Keys to Digital Transformation Success


6 Core Keys for Digital Transformation Journey Success

I finally built out my Digital Transformation presentation deck, and recorded the video overview below.   This is an extension of my initial post:  BofA and Digital Transformation


GDPR and Compliance: Are Documents the Enterprise Minefield?

What is your privacy strategy for documents and content repositories?

The new General Data Protection Regulation (GDPR) is set to replace the older Data Protection Directive in the EU on May 25, 2018.  This rollout of privacy protections for EU nations has broad and expansive implications for any company operating within the EU, as well as any that processes EU citizens’ information and data.  Here is a summary of the major changes:

  • GDPR jurisdiction now applies to all organizations that process EU subjects’ personal data, regardless of the organization’s location.
  • Breaches of GDPR can be fined up to 4% of global turnover or 20M Euros (whichever is greater).
  • Consent requests for personal information must be clear and easy to understand.

There are a set of core subject rights that apply, and below is a quick summary:

  • Breach Notification – any data breach requires notification within 72 hours.
  • Right to Access – subjects can request an electronic copy of all private data at any time.
  • Right to be Forgotten – aka Data Erasure; a subject can at any time request to have all private data removed from a controlling organization’s systems.
  • Data Portability – subjects can request to have their information transferred to another organization at any time.  This will go hand in hand with the “right to be forgotten”.
  • Privacy by Design – now a legal requirement, organizations must show proof of “…appropriate technical and organizational measures…” within any system or process.
  • Data Protection Officers (DPOs) – organizations will now require DPOs.  This individual will be responsible for interfacing with EU nations and authorities, and will carry the heavy burden of responsibility for all data protection efforts.

So, with that quick outline, imagine the implications of millions of application documents with personal information being breached.  What about the accidental scan of medical records to an insecure document sync folder?  Or a directory of millions of scanned documents, a few of which contain private information?

Organizations need a two-pronged approach to clear the document minefield.  To get this under control and mitigate risk, there are really two types of technology that need to work hand in hand.

GDPR Document Strategy: Transactional and Analytical

First, a document and content capture technology that serves as an ingestion point for new content and existing document-centric processes.  This form of enterprise input management can be deployed as a non-invasive automation layer to flag and identify suspect content, and to provide reporting capabilities around private information for compliance.  Once again, this is focused on day-forward transactions.

Second, a solution to crawl existing repositories to classify, extract and identify documents that pose a risk.  This technology works hand in hand with the transactional layer to build machine learning profiles and establish analytical libraries of document and data profiles, so the analytical side can become proactive and preemptive.  This can be a critical step in identifying legacy documents that house private information subject to GDPR fines.
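As a simple illustration of the transactional flagging layer, here is a minimal sketch that scans a document's OCR text for personal-data patterns.  A real GDPR scanner would cover many more identifier types (national IDs, IBANs, passport numbers) and lean on the analytics engine rather than two regexes; this is just the idea:

```python
import re

# Illustrative patterns only -- far from exhaustive for GDPR purposes
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def flag_private_data(ocr_text: str) -> dict:
    """Return the PII categories (and matched values) found in OCR text."""
    return {name: pat.findall(ocr_text)
            for name, pat in PII_PATTERNS.items()
            if pat.search(ocr_text)}
```

Documents that come back with any flags can be routed to a compliance queue instead of passing straight through to the repository.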

So, where does Ephesoft fit?  We have two products that span the transactional and analytical requirements to help organizations capture, classify, identify and visualize their documents in a broad sense, and comply with GDPR privacy rules.

Ephesoft Provides GDPR and Privacy Solutions for Documents

For the day-to-day, we have Ephesoft Transact, and for deep analytics, we have Ephesoft Insight.   If you need further information, you can contact us here: Ephesoft GDPR Solution Information.


The Borderless Enterprise, The Cloud and Capture 2.0


Capture is the New Intelligent Document Transport Layer


As Enterprise infrastructure gets more and more complex, especially with the move to cloud content and line-of-business systems, organizations struggle to create what I will call an “Intelligent Document Transport” layer.  The ability to move documents from system to system while maintaining data integrity and standardization is paramount to driving organizational efficiency.  With 72% of larger organizations having 3 or more repositories, and 25% having 5 or more, a seamless interchange of documents and data seems more like a dream than a reality.  In addition, legacy, in-place capture systems lack the modern web-service-oriented architecture needed for the adaptability and flexibility required to work with modern cloud infrastructure.  These “fat” client applications are often laden with complex, host-based SDKs and legacy code, requiring extensive development cycles and specialized skill sets to extend and integrate.  Here are some of the core challenges in organizations lacking this transport/integration layer:

  • Lack of Document “Intelligence” – many organizations move documents throughout their systems as a closed entity.  They may know it is a PDF or a Word document, and that it came from accounting, but beyond that it is a digital mystery.  They have limited data or information, which usually requires human intervention or hand-keying of info.
  • Lost in Translation – As documents move from department to department, person to person and system to system, things get lost in translation.  Information may be misinterpreted or lost, or interpreted differently along the way.
  • Lack of Standardization and Normalization – Without a standardized transport layer, problems begin to arise.  Take this simple example: differences in file naming.  One department calls it W4, another W_4, and a third W-4.  As documents flow back and forth between systems, think of the headaches this minor difference creates in reporting, workflow and overall system operation.
  • Unified Security – the ability for users and integrations to span the on-premise world and the cloud relies on complex authentication and authorization.  In this day and age, centralized reporting and audit capability on document transactions can be critical, and single sign-on capability is required.
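As a tiny illustration of the normalization problem above, a transport layer could canonicalize document-type names so that W4, W_4 and W-4 all resolve to the same value.  A sketch, not production code:

```python
import re

def normalize_doc_type(name: str) -> str:
    """Collapse naming variants (W4, W_4, W-4, ' w4 ') to one
    canonical form by stripping separators and uppercasing."""
    return re.sub(r"[\s_\-]+", "", name.strip().upper())
```

With every system receiving the same canonical name, reporting and workflow rules no longer break on cosmetic differences.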

So, what is required to eliminate these challenges and create an efficient document transport layer connecting people, departments and systems?  As I outlined in my previous post about the new keys to digital transformation, companies are realizing the benefits of new-age application architecture: open modular platforms, cloud-adaptive technology, scale-up and scale-down, and rapid deployment.  New-age document capture and analytics platforms, like Ephesoft Transact, encompass these modern traits, and help create a smooth and efficient document transport layer through the following:

  • Bundling both the document and metadata in an intelligent “suitcase”.  When documents enter the capture layer, they are immediately classified into document types and appropriate data is extracted.  All of this information travels with the document until it reaches its destination, where the document and data are translated into the required format.
  • Breaking down the barriers that exist between on-premise and cloud systems.  With a platform built for cloud adaptation, there are no longer barriers between systems in corporate data centers and cloud-based services.  New-age capture platforms can reside anywhere and interoperate with all types of repositories and applications.
Document Transport Enables a “Borderless” Enterprise
  • Creation of standard processing workflows and business rules.  Repeatable processes that are standardized regardless of the user, device or system reduce errors and streamline operations.  Document processing becomes predictable, more efficient and agile when the need for change arises.
  • Security is enforced and an audit trail created.  With a single system at the epicenter of document traffic, all transactions can be tracked and logged.  With authentication that spans all systems (through single sign-on), access can be granted only to documents and systems within a user’s security realm.
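For the audit-trail point above, here is a minimal sketch of what a centralized transaction-log entry might look like; the field names are illustrative, not any product's actual log schema:

```python
import json
import time

def audit_record(user: str, action: str, doc_id: str) -> str:
    """Build one centralized audit-trail entry for a document
    transaction (illustrative field names)."""
    return json.dumps({
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": user,
        "action": action,       # e.g. "classify", "export", "view"
        "documentId": doc_id,
    })
```

Because every document passes through the transport layer, emitting one of these per transaction gives compliance teams a single place to answer “who touched what, and when”.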

New-age capture technologies break down the barriers that exist, create a “borderless” Enterprise, and allow the exchange of documents and their associated data, enabling improved efficiency and productivity.  Thoughts?

SharePoint and OCR 2.0: Out with The Old


Using Adaptive OCR Technology & Analytics to Drive SharePoint Efficiency and Adoption

Optical Character Recognition technology, or OCR, has been around for quite some time.  It really became mainstream back in the ’70s when a man named Ray Kurzweil developed a technology to help the visually impaired.    He quickly realized the broad commercial implications of his invention, and so did Xerox, who purchased his company.   From there, OCR experienced broad adoption across all types of use cases.

At its simplest, OCR is a means to take an image and convert recognized characters to text.  In the Enterprise Content Management (ECM) world, it’s this technology that provides a broad range of metadata and content collection methods as documents are scanned and processed.  Here are the basic legacy forms of OCR that can be leveraged with SharePoint:

  • Full Text OCR – converts the entire document image to text, allowing full text search capabilities.  Using this OCR with SharePoint, documents are typically converted to an Image+Text PDF, which can be crawled, and the content made fully searchable.
  • Zone OCR – Zoning provides the ability to extract text from a specific location on the page.  In this form of “templated” processing, specific OCR metadata can be extracted and mapped to a SharePoint column.  This method is appropriate for structured documents that have the data in the same location.
  • Pattern Matching OCR – pattern matching is purely a method to filter, or match patterns within OCR text.  This technique can provide some capabilities when it comes to extracting data from unstructured, or non-homogeneous documents.  For example, you could extract a Social Security Number pattern (XXX-XX-XXXX) from the OCR text and map it to a SharePoint column.

These forms of OCR are deemed legacy methods of extraction, and although they can provide some value in any document process that involves SharePoint, they operate purely at the text level.
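Pattern Matching OCR from the list above is easy to sketch: once full-page OCR text exists, a value shaped like a Social Security Number can be pulled out and mapped to a SharePoint column.  A minimal illustration (the column mapping itself is left out):

```python
import re

# SSN-shaped pattern: XXX-XX-XXXX
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def extract_ssn(ocr_text: str):
    """Pattern-matching OCR: pull the first SSN-shaped value out of
    full-page OCR text, or None if no match.  The returned value could
    then be written to a SharePoint metadata column."""
    match = SSN_PATTERN.search(ocr_text)
    return match.group(0) if match else None
```

Note that this finds anything SSN-shaped, which is exactly the limitation of legacy pattern matching: it has no idea whether the match is actually an SSN.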

In steps OCR 2.0.  Today, innovators like Ephesoft leverage OCR as the very bottom of their document analytics and intelligence stack.   The OCR text is now pushed through algorithms that create meaning out of all types of dimensions: location, size, font, patterns, values, zones, numbers, and more (You can read about this patented technology here: Document Analytics and Why It Matters in Capture and OCR ).  So rather than just being completely data-centric, or functioning at the text layer, we now create a high-functioning intelligence layer that can be used beyond just text searching and metadata.  And the best part?  This technology has been extended to non-scanned files like Office documents.   Examples?  See below:

  • Multi-dimensional Classification – using that analysis capability (with OCR as algorithm input), and all the collected dimensions of the document, document type or content type can now be accurately identified.  As documents are fed into SharePoint, they can be intelligently classified, and that information is now actionable with workflows, retention policies, security restrictions and more.  You can see more on this topic in this video on Multi-dimensional Classification Technology: Machine Learning and Classification of Documents
  • Machine Learning – legacy OCR technology provided no means or method to “get smarter” as documents were processed.  Just looking at pure text, it either recognized it, or not.  With a machine learning layer, you now have a system that gets more efficient the more you use it.   The key here is that learned intelligence must span documents, it cannot be tied to any one item.  It’s this added efficiency that can drive SharePoint usage and adoption through ease of use.  You can see more on machine learning in the videos below:

Machine Learning and OCR Data Extraction

Machine Learning and External Data

  • Document Analytics, Accuracy and Extraction – with legacy OCR, extracting the information you need can be problematic at best.  How do you raise confidence that the information you have is accurate?  With an analysis engine, we look not just at the text, but at where it sits, what surrounds it, and known patterns or libraries.  This added layer provides the ability to express higher confidence in data extraction, and makes sure you are putting the right data into SharePoint.

This was just a quick overview of the benefits of moving away from legacy OCR and embracing OCR 2.0 for SharePoint. Thoughts?


Salesforce and Ephesoft: In App Document Classification and Extraction


Web Services for OCR, Data Extraction and Document Classification

Continuing my themes of open, web-service-enabled document capture and analytics, as well as the notion of “In App” document capture through APIs, I thought I would share a demo one of our fantastic SEs built to show the automation capabilities within Salesforce.  It shows background document classification and extraction, all initiated through a file upload in Salesforce, leveraging OCR technology and our machine learning algorithms to auto-populate data in Salesforce.


Document Analytics: Machine Learning and Document Dimensions Video


Document Capture Through Intelligent Learning and Analytics

One of our regional reps produced this video to help show how we differ from other document capture and analytics platforms on the market.  This is a great expansion of one of my earlier posts, Analytics and Document Capture – Why it Matters.  The video gives a great overview of the many dimensions of a document, and how Ephesoft leverages its patented technology to enhance accuracy, analyze large volumes of documentation, and process unstructured information.

Capture, Data Quality and the 1-10-100 Rule

The True Cost of Bad Data

In the world of document capture and analytics, our typical value proposition is around efficiency, reduction in required headcount and the reduction in turnaround time.  Of course, there is true value and cost savings for any organization processing a significant volume of documents if you focus on these value points.  Lately, we have been having some great conversations both internally and externally on the true cost of errors in data entry, and I wanted to dig deep into my past, and present a key topic for discussion.

Back in my Navy days, I found myself in the center of a focus on quality, and we had morphed Deming’s Total Quality Management (TQM) into a flavor that served us well.  In a nutshell, it was an effort to increase quality through systematic analysis of process and a continuous improvement cycle.  It focused on reducing “defects” in process, with the ultimate goal of eliminating them altogether.  Defects impose a high cost on the organization, and can lead to failures across the board.  Today, all of these concepts can be applied to the processing of documents and their associated data.  What is the true value of preventing defects in data?

In my education in this topic, I remember a core concept on quality, and defects: the 1-10-100 rule.


The rule illustrates the escalating cost of errors (or failures): prevention costs $1, correction $10 and failure $100.  So, in terms of data:

  • Prevention Cost – Preventing an error in data at the point of extraction costs $1.
  • Correction Cost – Having someone correct an error post-extraction costs $10.
  • Failure Cost – Letting bad data run through a process to its final resting place costs $100.

So, an ounce of prevention is worth a pound of cure.  In this case, lacking technology to prevent data errors will cost the business 100x the cost of acquiring automated technology that prevents those errors in the first place.
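The arithmetic of the 1-10-100 rule is easy to put into code.  The sketch below estimates the total error cost for a batch of documents, given an error rate and what fraction of errors is caught at each stage; the rates used in the example are made-up inputs, not benchmarks:

```python
def processing_cost(n_docs, error_rate, caught_at_extraction, caught_in_review):
    """Apply the 1-10-100 rule to a batch of documents.

    error_rate: fraction of documents whose data would be wrong.
    caught_at_extraction / caught_in_review: fractions of those errors
    caught at each stage; the remainder become downstream failures.
    Per-error costs ($1 / $10 / $100) follow the rule in the text.
    """
    errors = n_docs * error_rate
    prevented = errors * caught_at_extraction    # caught at $1 each
    corrected = errors * caught_in_review        # caught at $10 each
    failed = errors - prevented - corrected      # slip through at $100 each
    return prevented * 1 + corrected * 10 + failed * 100
```

For example, 1,000 documents with a 5% error rate, 80% of errors prevented at extraction and 15% corrected in review, leaves 5% as downstream failures: the cost pyramid makes those few failures the dominant term.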

In document capture today, we focus on the top rung of the pyramid: prevention.  Below are the core benefits of an intelligent capture platform:

  • Errors in data can be prevented through the automated extraction of data, setting of business rules, and the elimination of hand-keying data.
  • Existing data sources in the organization can be used to enhance prevention and ensure data validation through the use of Fuzzy DB technology.
  • Review and validation capabilities prevent bad data from slipping through the process and contaminating your data store.  This is invaluable, stopping the ripple effect bad information can have on a process and the organization as a whole.
  • With machine learning technology, if correction is required, the system learns, and can prevent future corrections, reducing costs.

See more features for ensuring high-quality document processing and data extraction here:  Ephesoft Document Capture and Data Extraction.

Just some thoughts…more to come on this topic.