Contract Management: Ephesoft and SharePoint Online

SharePoint Contract Management

Capturing Contract Data for Analysis

We have had several requests recently to show how we can help with processing contracts and extracting metadata.  The video below uses Ephesoft Transact in two ways to process contracts:

  1.  Extracting historical contract data for analysis.  In example one, we use Ephesoft to import contract PDFs, classify them, and then extract pertinent data for routing to a SharePoint Contract library.
  2.  Routing and archiving new, inbound contracts.  This example brings in contracts from email, folders and other sources, classifies them, and then places the contracts and data within a SharePoint Contract library.

Here is the overall Contract Management Solution:

 

 


Unstructured Data and the Cloud: The Benefits of Capture as a Service

Document Capture & Analytics in the Cloud

We launched the first fully functional Capture as a Service (CaaS) offering in Microsoft Azure this week at the Microsoft Inspire Partner Conference.  We were helped along the way by one of our larger partners, which had high demand for Capture as a Service, and we have been seeing more and more requests for the intelligent processing of unstructured data in the Cloud.  Below are some core benefits of our cloud offering:


Time to Value – On-premise software implementations can be a long-term journey and require additional budget for hardware, IT resource time and long budget cycles for capital expenditure approvals.  With the Ephesoft Transact Cloud, your time to value is minimized and your intelligent capture platform can be up and running in a fraction of the time.  Want to see how you can calculate a quick ROI on CaaS?  See our recorded webinar here: The Ephesoft Effect

 


Cost of Ownership – Software as a Service (SaaS) reduces the overall solution cost of ownership by including support, and eliminating the need for hardware, backups, monitoring, dedicated administration and overall management.  By including these costs in one recurring fee, complexity and overhead are reduced, and IT spend becomes more predictable.

 

Competitiveness – No longer is intelligent document capture only for large organizations with an army of IT folks.  Now smaller organizations can have access to enterprise-class technology, and glean all the advantages and efficiency needed to stay competitive and challenge their larger rivals.

 

Gartner estimates that the annual cost of owning and managing software applications can be as much as four times the cost of the initial purchase.

 

Access to Innovation – with SaaS, access to the latest and greatest software is included.  As Ephesoft improves its processing engine, you can immediately take advantage of the added efficiency.  Your subscription provides continuous value, and appreciates over time as more features and functionality are added.

 

Scalability and Agility – the Ephesoft Transact Cloud is built for maximum scalability and agility.  You can easily add more cores, features and processing power, depending on your requirements and needs.  You can start small, and grow with our flexible subscription model.

 

Capture Anywhere – the Ephesoft Cloud provides intelligent document capture anywhere, on any device.  You can automate document processes with a browser, a smart phone or a tablet.  This allows today’s distributed workforce to access all the benefits Ephesoft can provide.

 

Read more on our offering in the Cloud here:  Unlocking Unstructured Data: Document Capture, OCR and Scanning in the Cloud

 

Using Ephesoft to Add Intelligent Automation to Microsoft Technologies

OCR and Automation for SharePoint

Ephesoft Automation for SharePoint, Azure, BI, Flow and Dynamics

We are ramping up our team for the Microsoft Inspire Conference (Booth 1237) in Washington, DC in a few weeks (July 9-13), and I thought I would put together some ideas on the power of Ephesoft technology when combined with Microsoft technologies.  We have been working with several Microsoft teams (Azure, SharePoint, Flow) to bring solutions to market, and provide extensive document-centric solutions to their partner and customer ecosystem.  So how do we fit?  I will outline a quick primer.

 

Just Who Is Ephesoft?

Ephesoft was founded in 2010 by leaders from the document capture industry that wanted to drive innovation and disrupt the legacy document automation space.  The company has shown explosive growth through its unique perspective on taming unstructured content using patented complex analytics and machine learning.   Its technology has garnered broad interest, and investment from top-tier firms like Fujitsu and In-Q-Tel.

Just What Does Ephesoft Technology Do?

At the heart of Ephesoft Technology is an engine that provides automated document classification and data extraction.  Feed it documents from any source (fax, scanners, copiers, folders, legacy ECM systems, mobile devices, repositories) and it will do all the heavy lifting –  sorting, separating, classifying and getting you the data you need to drive efficiency, productivity, automation and decision-making with minimal end-user intervention.  Providing SaaS and PaaS solutions, and available on premise or in the cloud, the Ephesoft platform can provide great value to any size organization.  Ephesoft has two products:

Ephesoft Transact – a transaction document capture platform for day-to-day document processing.

Ephesoft Insight – a document analytics platform for ingesting large volumes of existing unstructured content and extracting meaning.

How Does Ephesoft Fit With Microsoft?

Think of Ephesoft as an added intelligent document automation layer that can be placed on top of other technologies as a catalyst for automation.  Below is a list of core technologies from Microsoft, and how Ephesoft can fit from a business perspective.

Microsoft SharePoint and Ephesoft

With SharePoint, Ephesoft Transact can be an intelligent on-ramp for documents into SharePoint libraries.   As a front-end loader, Transact can auto-identify and route documents from just about any source, and make sure they wind up in the right library, as a searchable PDF, with all the important metadata extracted.   It provides a standardized, repeatable process for adding any type of document to Microsoft SharePoint.

With Ephesoft Insight, SharePoint libraries can now be consumed and leveraged for Document Analytics.  Insight provides the “document side” of the analytics equation.

You can get more information here:

Ephesoft/SharePoint Integration

Email Classification with SharePoint

Microsoft Flow and Ephesoft

Utilizing Ephesoft Web Services in the cloud, you can add intelligence to any Microsoft Flow workflow.   Using the classification or extraction services, you can use Ephesoft Transact technology to “open up” documents mid-process, and make workflow branching decisions based on what you find.   An example of a Flow use case here:

Ephesoft and Microsoft Flow with SharePoint Online

Scanning to Microsoft Flow
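
The "open up the document and branch" idea from the Flow section above can be sketched in a few lines of Python.  This is only an illustration of the decision logic, not the Ephesoft Web Services or Flow API: the `ClassificationResult` shape, the `route_document` helper, the 0.80 threshold and the library names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical shape of a classification response (not the real Ephesoft schema)."""
    doc_type: str
    confidence: float  # 0.0 to 1.0


# Hypothetical mapping of document types to SharePoint library names.
LIBRARY_BY_TYPE = {
    "Contract": "Contracts",
    "Invoice": "Accounts Payable",
}


def route_document(result: ClassificationResult, threshold: float = 0.80) -> str:
    """Pick a workflow branch based on what the classifier found in the document."""
    if result.confidence < threshold:
        # Low confidence: send to a person instead of guessing.
        return "Manual Review"
    # Unknown document types also fall back to a person.
    return LIBRARY_BY_TYPE.get(result.doc_type, "Manual Review")


print(route_document(ClassificationResult("Contract", 0.93)))
print(route_document(ClassificationResult("Invoice", 0.41)))
```

In a real workflow the same branch would be a condition step that reads the document type and confidence returned by the classification call.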

Microsoft Dynamics and Ephesoft

ERP and Accounting systems can leverage the power of Ephesoft in many different ways.  As a processing engine, Ephesoft Transact can extract information from critical documents, like invoices or sales orders, and pass the information on to Dynamics.  No longer will employees have to hand key information, and waste precious time.  Along with time savings, data entry errors can now be eliminated through Ephesoft Transact’s validation and exception processing capabilities.  More info:

Ephesoft Accounting ERP Solutions

Microsoft Azure and Ephesoft

Document capture and automation are a great fit for the cloud.  Ephesoft’s web-based technology and RESTful APIs are cloud ready, and are available in Microsoft Azure.  As a Cloud Infrastructure partner, Ephesoft has worked diligently to ensure compatibility with Azure, and also to take advantage of all the cloud has to offer from a scalability and availability perspective.  Read more on Ephesoft’s cloud platform:

Ephesoft Capture in the Cloud

This is just a short list of possibilities.  Ephesoft’s products are built for partners, and have an open architecture to facilitate the building of portable solutions to add value and drive revenue.  Come see  us at Inspire (Booth 1237), or reach out to us directly for more information: Contact Us.

 

Machine Learning and Distributed Document Capture and Scanning

Machine Learning for Copier Scanning

Using Copiers to Machine Learn Documents

I have been working with several of our MFP/Copier partners, and wanted to put together a video demo on how to use copiers to train Ephesoft’s machine learning engine.  This demo shows how you can use our document analytics engine and train it on HR documents.

 

Thoughts?

GDPR and Compliance: Are Documents the Enterprise Minefield?

What is your privacy strategy for documents and content repositories?

The new General Data Protection Regulation (GDPR) is set to replace the older Data Protection Directive in the EU on May 25, 2018.  This new roll out of privacy protections for EU nations has broad and expansive implications for any company within the realm of the EU, or those that process EU citizen information and data.   Here is a summary of the major changes:

  • GDPR jurisdiction now applies to all organizations that process EU subject personal data, regardless of the organization’s location.
  • Breaches of GDPR can draw fines of up to 4% of global turnover or 20M Euros (whichever is larger).
  • Consent when providing personal information must be clear and easy to understand.
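
To make the fine ceiling in the list above concrete, here is a small worked example.  The function name and the sample turnover figures are made up for illustration; the rule itself (the larger of 4% of global turnover or 20M Euros) is the one stated above.

```python
def max_gdpr_fine(global_turnover_eur: float) -> float:
    """Upper bound of the fine: the larger of 4% of turnover or EUR 20 million."""
    four_percent = global_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000)


# Hypothetical companies: one mid-sized, one large.
for turnover in (100_000_000, 2_000_000_000):
    print(f"{turnover:,} EUR turnover: fine of up to {max_gdpr_fine(turnover):,.0f} EUR")
```

For the smaller company the 20M Euro floor applies, since 4% of its turnover is only 4M Euros.  For the larger one the percentage dominates.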

There is a set of core subject rights that apply, and below is a quick summary:

  • Breach Notification – any data breach requires notification within 72 hours.
  • Right to Access – subjects can request an electronic copy of all private data at any time.
  • Right to be Forgotten – aka Data Erasure, a subject at any time can request to have all private data removed from a controlling organization’s systems.
  • Data Portability – subjects can request to have their information transferred to another organization at any time.  This will go hand in hand with the “right to be forgotten”.
  • Privacy by Design – now a legal requirement, organizations must show proof of “…appropriate technical and organizational measures…” within any system or process.
  • Data Protection Officers (DPOs) – organizations will now require DPOs.  This individual will be responsible for interfacing with EU nations and authorities, and will carry the heavy burden of responsibility for all data protection efforts.

So, with that quick outline, imagine the implications of  millions of application documents with personal information that are breached.    What about the accidental scan of medical records to an insecure document sync folder?  Or the directory of millions of scanned documents that have a few documents with private information?

Organizations need a two-pronged approach to defuse the document minefield.  To get this under control and mitigate risk, two types of technology need to work hand in hand.

GDPR Document Strategy: Transactional and Analytical

First, a document and content capture technology that works as an ingestion point for new content and existing document-centric processes.  This form of enterprise input management can be placed as a non-invasive automation layer to flag/identify suspect content and provide reporting capabilities around private information for compliance.  Once again, the focus here is on day-forward transactions.

Second, a solution to crawl existing repositories to classify, extract and identify documents that pose a risk.  This technology can work hand in hand with the transactional layer to build machine learning profiles, and establish analytical libraries of document and data profiles so the analytical side can become proactive and preemptive.  This can be a critical step in identifying possible legacy documents that house private information that could be subject to GDPR fines.

So, where does Ephesoft fit?  We have two products that span the transactional and analytical requirements to help organizations capture, classify, identify and visualize their documents in a broad sense, and comply with GDPR privacy rules.

Ephesoft Provides GDPR and Privacy Solutions for Documents

For the day-to-day, we have Ephesoft Transact, and for deep analytics, we have Ephesoft Insight.   If you need further information, you can contact us here: Ephesoft GDPR Solution Information.

 

Document Analytics: Machine Learning and Document Dimensions Video

Document Analytics and Machine Learning

Document Capture Through Intelligent Learning and Analytics

One of our regional reps produced this video to help show how we differ from other document capture and analytics platforms on the market.  This is a great expansion on one of my earlier posts: Analytics and Document Capture – Why it Matters.  The video gives a great overview of the many dimensions of a document, and how Ephesoft leverages its patented technology to enhance accuracy, analyze large volumes of documentation, and process unstructured information.

Breaking Down Document Silos

Opening Repository Silos with Document Capture and OCR

Leveraging Intelligent Capture to Break Down Repository Silos

Every organization has them in both their technical realm and organizational/departmental structure: the Old Silo.  But the elephant in the room is usually the document repository.  That terabyte nightmare no one wants to address for fear of what lies within.   Compounding the issue is the fact that most organizations have numerous document silos, usually the result of years of acquisitions, changing technical staffs with new ideas, or new line of business systems that house their own documents.   Repository silos usually take the form of one of the below:

  • The File Share – when was the last time someone looked at that behemoth?  Usually laden with layer upon layer of departmental and personal folder structures, a complete lack of file naming standards, and a plethora (my $2 word 😉 ) of file types.   They continue to be backed up, and most IT departments that initiate projects for cleaning these up find a minefield, and departments that are fearful to purge anything.
  • The Legacy ECM System – Hey, who manages Documentum/FileNet now?  Do I continue to put items in X?  How do I change the metadata in Y?  As time goes on, legacy Enterprise Content Management becomes a huge burden on IT staff, and imposes a massive cost burden for maintenance, support and development.   Many of these systems were put in place a decade or so ago, and the file tagging and metadata needs have changed, with users struggling to find what they need through standard search.  Some of these systems have just become expensive file shares, due to lack of required functionality or non-supported features.
  • The Line of Business Repository – just about every system nowadays has a “Document Management” plugin:  the Accounting System that stores invoices, the Human Resource Info System that houses employee documents, or the Contracts Management system in legal that maintains contracts.  These “niche” systems have created document sprawl within organizations, and a major headache for IT staff.
  • The SharePoint Library – The SharePoint phenomenon hit pretty hard over the last 8 or so years, and most organizations jumped on the train.  Although most organizations we see in the field did not truly standardize on SharePoint as their sole repository, many started using it for niche solutions, focused on departmental document needs.  Now, many years into their usage, they have massive content databases housed on expensive storage.
  • The New Kids on the Block – now enter the new kids: Alfresco, Dropbox, Box, OneDrive and Google Drive.  New is a relative term here, but organizations now have broad and extensive content on cloud-based, file-sync technologies.  Spanning personal and business accounts, these technologies have created new silos and management challenges for many organizations.

So, how can we leverage intelligent document capture and analytics to break down silos and make life easier?  Here are some core “silo breaking” uses:

  • Data Capture and Extraction – for projects where you want to “peer” into that document repository and extract and/or analyze the content, there are two solutions.  Intelligent capture applications, like Ephesoft Transact, can consume repository content, classify document types, and extract pertinent data.  Transact has a whole set of extraction technologies that can pull out valuable unstructured data:
    • Key Value Extraction – this method can parse document contents for information.  Take for example a repository of patient records where you want to glean patient name and date of birth.  This technology will look for patterns, and pull out required data.
    • Paragraph Extraction – let’s say you want to find a specific paragraph, perhaps in a lease document, and then extract important information.   You can easily identify paragraphs of interest across differing documents, and get what you need.
    • Cross-section Extraction – say you want to process 10 years of annual reports and pull a specific bit of data from a table, such as the liabilities number from the financial section.  You can specify the row and column header, and pluck just what you need.
    • Table Extraction – what if you want tables of data within a repository of documents?  Take for example lab results from a set of medical records.  You can extract the entire table and export it to a DB across thousands of reports.
Extracting Lab Data with Capture
  • Managed Migration from X to Y – we are seeing the desire to consolidate repositories and drive “scrubbed” content to a new, central repository.  Through advanced document capture, you can consume content from any of the above sources, reclassify, extract new metadata and confirm legacy data as you migrate to a new location.
  • Single Unified Capture Platform – providing a single, unified platform that can tie into all your existing repositories can save money, and add a layer of automation to older, legacy capture and scanning technology.  This repository “spanning” strategy provides a single path for documents which enhances reporting, provides powerful audit capabilities, and minimizes support costs and IT management burden.
  • Advanced Document Analytics (DA) – with the advent of document analytics platforms, like Ephesoft Insight, you can make that repository useful from a big data perspective through supervised machine learning.   These platforms take the integration of capture and analytics to the next level, and provide extensive near real-time processing.  DA is focused on processing large volumes of documents and extracting meaning, where there seems to be absolutely no structure.  So you can point, consume and analyze any repository for a wide variety of purposes.  You can read some great use cases for this technology here: Notes From the Field: What’s Hiding in Your Documents.
Connecting the Dots with Document Analytics
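
To give a feel for two of the extraction methods in the list above, here is a minimal Python sketch of key value extraction and cross-section extraction.  It is not how Transact implements them: the regular expression, the sample text, the pre-parsed table and both helper functions are assumptions chosen to keep the example small.

```python
import re

# Hypothetical OCR text from a patient record.
PAGE_TEXT = """
Patient Name: Jane Doe
Date of Birth: 04/12/1975
"""


def extract_key_values(text: str, keys: list[str]) -> dict[str, str]:
    """Key value extraction: find 'Key: value' lines for the requested keys."""
    found = {}
    for key in keys:
        pattern = rf"^{re.escape(key)}\s*:\s*(.+)$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            found[key] = match.group(1).strip()
    return found


# Hypothetical table from an annual report, already split into rows and cells.
TABLE = [
    ["Item", "2015", "2016"],
    ["Total Assets", "1,200", "1,350"],
    ["Total Liabilities", "800", "790"],
]


def cross_section(table, row_header, column_header):
    """Cross-section extraction: return the cell where a row and column meet."""
    header = table[0]
    if column_header not in header:
        return None
    col = header.index(column_header)
    for row in table[1:]:
        if row[0] == row_header:
            return row[col]
    return None


print(extract_key_values(PAGE_TEXT, ["Patient Name", "Date of Birth"]))
print(cross_section(TABLE, "Total Liabilities", "2016"))
```

Real documents are far messier than this sample, which is why the classification step matters: the rules only make sense once you know what kind of document you are looking at.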

Just a quick brain dump on breaking down silos with intelligent document capture and analytics.  Thoughts?  Did I miss anything?

 

 

Notes From The Field: What’s Hiding in Your Documents?

Document Capture OCR & Scanning

Advanced Capture for Document Mining, Extraction and Analytics

I wanted to write a post about some trends we are seeing within the market, mostly focused on leveraging intelligent document capture (Ephesoft) to mine existing document repositories.   So what constitutes a repository?  Well, it could be 100,000 scanned TIFFs in a network folder.   It could be a legacy document management system like Documentum housing terabytes of documents.    Or like many larger organizations, it could be a massive set of 10 separate repositories that span acquisitions, offices and countries.   With content growing exponentially, organizations are quickly realizing that this information can be a treasure trove, or it can be hiding something sinister that needs to be identified.

So what are the key use cases and industries?  Here are two below:

Financial Services – Anti-money Laundering (AML) – There have been a number of regulations passed that govern how financial institutions detect and report the flow of “dirty money” in and out of their institutions.  The Bank Secrecy Act has been around since the 1970s, but has been amended with some key requirements through the Patriot Act, with a focus on terrorism and funding.  The onus is on financial firms to quickly identify, track and report suspicious transactions or face massive fines.  Much of this data is based in documents, and finding and extracting this critical information can be impossible without the right technology.  How do you tie new account ID information to another account opened and closed 3 years ago when all you have is a scan of a passport/ID and the original new account form in scanned PDF?  It gets more complex with trade-based money laundering, and there are several red flags that require evaluation of documents, such as:

  • Payments to a vendor by unrelated third parties
  • False reporting, such as commodity mis-classification, commodity over- or under-valuation
  • Repeated importation and exportation of the same high-value commodity, known as carousel transactions
  • Commodities being traded that do not match the business involved
  • Unusual shipping routes or transshipment points
  • Packaging inconsistent with the commodity or shipping method
  • Double-invoicing (list from ACAMS.org)
An Example of Trade-based Money Laundering (Image from CTTS Office)

As you can imagine, you need all the components of an advanced capture and classification engine to identify key documents, extract core data, and place that information into an analytics engine for processing.
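
Once the data is out of the documents, some of the red flags become simple checks.  Below is a minimal sketch of one of them, using a deliberately simplistic reading of double-invoicing: the same vendor and invoice number showing up in more than one scanned document.  The record layout, file names and `find_double_invoices` helper are hypothetical, and a real AML analytics engine would look at far more than this.

```python
from collections import defaultdict

# Hypothetical records, as they might look after extraction from scanned invoices.
invoices = [
    {"file": "scan_001.pdf", "vendor": "Acme Trading", "invoice_no": "INV-1001", "amount": 25000.00},
    {"file": "scan_002.pdf", "vendor": "Acme Trading", "invoice_no": "INV-1002", "amount": 8000.00},
    {"file": "scan_107.pdf", "vendor": "acme trading ", "invoice_no": "inv-1001", "amount": 25000.00},
]


def find_double_invoices(records):
    """Group documents by normalized (vendor, invoice number) and keep repeats."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["vendor"].strip().lower(), rec["invoice_no"].strip().upper())
        groups[key].append(rec["file"])
    return {key: files for key, files in groups.items() if len(files) > 1}


print(find_double_invoices(invoices))
```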

Healthcare – The Quest for a Cure – Imagine the value of being able to go back and consume 30 years of cancer patient lab reports.  Size of tumors, type of treatment, type of cancer, and all the metabolic information.  The challenge lies in the fact that the majority of patient records still exist in paper format, or at least those that were created prior to the rise of the Electronic Medical Record (EMR).   These labs are buried in a deep mess of the typical medical record.   What if you could process it all, automatically identify all the lab reports and pull out everything you need to map trends and results?   What if you could easily identify and extract the typical lab report table?

What if the paper lab report became data?

We have some customers today processing records for this very purpose.

 

The list can go on and on:

  • Oil and Gas Land Leases
  • Invoice Analysis for Identifying Trends
  • Claims Analysis for Fraud Identification

Anything I missed?  Thoughts?

 

Rise of the Machines: Machine Learning, Capture and Analytics

Document Automation and Machine Learning

Teaching Machines to Understand Documents

I remember when I first started out in the document capture and ECM world, I was sitting across from a CIO, presenting our technology, and he started asking pointed questions about configuration and services.  We talked for about 15 minutes, and he stopped, and I could see the gears were turning.  He looked at me and said: “Why do you guys make it so damn hard?”  I looked at him and said, “What do you mean?”  He responded with: “Why all the configuration and setup time?  Why can’t it just understand my documents, and what I am trying to accomplish?  I know that current technology is capable.”  At that time, the trend in the industry was a heavy reliance on regular expressions, basically a pattern matching language that originated in 1956, born through mathematical theory.  So essentially, the CIO hit the nail on the head: We were using 1950s math theory to provide automation and value, but it came with a deep cost in the form of expertise and services.  So here we are 10 years later, and the majority of the industry still uses that same method to analyze, classify and extract data.

Rise of the Machines

In the document automation space, we typically present a magic world to the end users, one where they just hit the button, or upload their document, and stuff just happens “automagically”.  But in reality, behind the scenes, there was a lot of work to get to that point, with the burden placed on IT in the form of education, configuration, service costs and testing.  Machine learning strives to eliminate that burden through simple efforts to train the system, and I think the goal, although lofty, is to reduce or eliminate configuration to the point that any user can create a workable system.

So, in document capture, what can machine learning provide?  In modern document automation technologies, like Ephesoft’s Capture Platform, machine learning can be leveraged in several ways:

  • Classifying Documents – If I had Ephesoft back in the day, I could have really made an impression on the CIO.  With Ephesoft’s training interface, I can take my different types of documents and train the system.  As I drag and drop new types of documents into the system, it “learns” all the nuances of the document.  It understands the structure, the words, their proximity, typeface and other information and uses that as key identifiers in the process.  For more on the extent of our Document Analytics/Analysis engine, see this post: Document Analysis, Analytics and Capture.
  • Intelligence & Confidence – Just like people, good machines know when to ask questions or admit when they are wrong.  In a machine learning environment, having a mechanism to ask questions is key.  In document capture, this comes in the form of an established confidence level, and voting algorithm that can call attention to documents or data in question.  When these questions are answered, the machine gets smarter, and learns.
Confidence Flagging Aids in Understanding
  • Gathering Information – just as we learned through experience growing up, the machine needs to learn from every interaction.  Any form of human input needs to add to understanding, and overall document intelligence.  Click on a missed piece of data, and now the system knows its location, and its format.  It also knows the proximity of other words, and has an enhanced understanding of new dimensions of that document.
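
The confidence level and voting idea from the list above can be illustrated with a short Python sketch.  This is not Ephesoft’s algorithm: the `vote` function, the way the score is averaged and the 0.75 threshold are assumptions for the example, meant only to show how disagreement between extractors can turn into a question for a human.

```python
from collections import Counter


def vote(candidates, threshold=0.75):
    """Combine (value, confidence) guesses from several extractors.

    Returns (winning_value, score, needs_review). The score is the winner's
    summed confidence divided by the number of candidates, so disagreement
    between extractors pulls it down.
    """
    if not candidates:
        return None, 0.0, True
    totals = Counter()
    for value, confidence in candidates:
        totals[value] += confidence
    winner, weight = totals.most_common(1)[0]
    score = weight / len(candidates)
    return winner, score, score < threshold


# Three hypothetical extractors agree: the value is accepted.
print(vote([("2016-03-01", 0.90), ("2016-03-01", 0.95), ("2016-03-01", 0.85)]))

# The extractors disagree: the field is flagged for an operator to confirm.
print(vote([("2016-03-01", 0.90), ("2016-08-01", 0.60), ("2016-03-07", 0.50)]))
```

The operator’s answer to a flagged field is exactly the kind of input a learning system can feed back into training.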

These are just a few examples of machine learning, and what it brings to the document capture industry.  More to come when we release our new version, 4.1, at our conference next week.

Resuscitate Your Capture: Bringing New Life to Document Automation

Document Capture Automation

Adding The Next Generation of Document Capture Automation

Over the past decade, the document capture industry has become quite stagnant and ripe for disruption.   The acquisition of just about every capture company by larger, behemoth organizations has created a stagnation in innovation and a lack of modernization.   IT executives are yearning for a refresh to their legacy capture solutions, and they expect standards of the modern tech world:

  • Service/Platform based architecture
  • Web/browser-based user interfaces
  • Web services APIs
  • Cloud-enabled technologies

With that said, many organizations have made massive investments in document capture technology, and a “rip and replace” strategy comes with a serious impact to business operations.   But there can be exponential benefits to a modernization of document automation and capture technologies.  This comes from key new developments from innovative capture startups:

  • Machine Learning – in the legacy capture world, long expensive services engagements are the norm, with deep custom development and configuration.  Isn’t it 2016?  Aren’t computers supposed to take that pain away with intelligence?  In steps machine learning.  The modern capture platform provides a core learning engine that understands your documents, their layouts and data.  As you use the system, it gets smarter, improving accuracy and reducing user intervention, with a true end goal of autonomous operations.
  • Capture Web Services – providing capture functionality to any application in the organization can be a huge boost to efficiency and productivity.  Want a customer document upload page to validate that the uploaded documents are of the correct type?  Need to check the date of a document, or that it has been signed?  Document capture services can give your development teams a tool set they have never had in the past.
  • Document Analytics and Analysis – taking a holistic view of the whole document capture process is essential to the modern capture platform.  Seeing the document as pure words will not further understanding, nor provide additional benefit.  With a true Analytics/Analysis frame of mind, every single characteristic of the document becomes important: font, font size, location, surrounding words and overall layout (for a deep look at the facets of document analytics, see my previous post: Document Analytics and Capture ).
  • Open Architecture – In my mind, having a capture platform that has been built from the ground up with openness and extensibility is absolutely critical.  Adding this as an afterthought creates a clunky, difficult environment for developers, and leads to workarounds and a lack of desired functionality.

The great benefit here is that, without a “rip and replace” event, modern capture platforms can be added as a non-disruptive, transparent automation and efficiency layer.

Modern Capture Adding Efficiency and Automation At the Epicenter

By adding a centralized capture engine, you can glean the following benefits:

  • Any scanning device becomes an input device
  • BPM and Workflow systems can take advantage of capture with minimal dev (See an example here: Notes From the Field)
  • Services like fax and email can easily be designated as a source for capture
  • Legacy capture processes with bar code sheets and manual data entry can be automated
  • Mobile devices can now leverage mobile capture SDKs and the centralized automation engine
  • Legacy ECM systems now have a new automation dimension
  • Cloud-based enterprise services can be capture-enabled

So a tech refresh on the capture front becomes a viable initial project, and current capture components can be left in place.  In this case, Ephesoft becomes a new layer of automation and a catalyst for process improvement and efficiency.

This has been a consistent theme in our experience out in the market, with existing legacy capture customers and new prospects looking for a minimal impact refresh for their ailing and aged capture infrastructures.  Thoughts?  Comments?