GDPR and Privacy

GDPR and Documents: Defining a Strategy

Posted on September 6, 2017 by sboals

A Document-centric Strategy for GDPR Compliance

With the effective date for the new General Data Protection Regulation (GDPR) fast approaching, now is the time to put a solid strategy in place for documents and images. Organizations not only need to implement processes and procedures for handling private information, but also need a firm evaluation of their “current state” to understand the high risk areas of their business and their exposure. Below are the four key steps, as outlined by Microsoft’s GDPR Strategy, and how you can incorporate a document-centric view within your plan:

Discover

Discovery will probably be the most challenging step when it comes to documents and GDPR. The vast majority of enterprises have a large number of document repositories. Just think of the modern workplace, and all the locations where documents reside:

Network folders
Local folders
Sync technologies like Box, OneDrive, Dropbox, Google Drive
Corporate Enterprise Content Management (ECM) and Document Management (DM) systems
Line of Business systems that house documents
Email & attachments

The ability to crawl and identify high risk entities within these locations is critical for compliance. Here is a checklist of required functionality when it comes to a technical solution:

Two-phase Identification – most of the technologies on the market just use pattern matching to identify personal information within documents. This can be problematic: it burdens staff with false positives that take an immense amount of time to validate. With two-phase identification systems (like Ephesoft), documents are first classified as a certain type: agreement, application, correspondence, etc. This classification can be configured for an organization’s specific document requirements, and can immediately ID a document as high risk. The second phase of risk identification is pattern matching, fuzzy DB correlation and key value searching. This two-phase approach is absolutely required for accuracy and high confidence.
Optical Character Recognition (OCR) – images can be a very high risk type of document. In order to properly evaluate an image for risk, there needs to be a text conversion process. It goes much further than that: the application also needs a voting and confidence engine. Images vary in quality, and a fax or “copy of a copy” can be problematic. With a confidence flag on both the overall document and the identified private information, images can be graded on overall quality and on quality of data.
Open Architecture – proprietary systems cannot meet all the requirements that will be necessary for GDPR Discovery, and most organizations will need ultimate flexibility to modify and customize software for their unique needs and requirements. Using modular and open platforms gives you the best chance of a solution that fits your needs.
Machine Learning – using a system that gets smarter with each day of use is required in today’s modern world. A GDPR Machine Learning system can learn new high risk documents, and evolve as an organization changes.
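
To make the two-phase idea concrete, here is a minimal Python sketch. It is not how Ephesoft or any other product works internally: the keyword lists, the set of high risk types, and the two regular expressions are all assumptions made up for illustration, and a real system would use a trained classifier plus fuzzy database correlation rather than keyword counting.

```python
import re
from dataclasses import dataclass, field

# Phase one stand-in: hypothetical keywords per document type.
DOC_TYPE_KEYWORDS = {
    "application": ["applicant", "date of birth", "signature"],
    "agreement": ["hereby", "party", "terms"],
    "correspondence": ["dear", "regards", "sincerely"],
}

# Document types treated as high risk in this sketch (an assumption).
HIGH_RISK_TYPES = {"application"}

# Phase two stand-in: illustrative patterns only, far from exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


@dataclass
class RiskResult:
    doc_type: str
    high_risk_type: bool
    matches: dict = field(default_factory=dict)

    @property
    def needs_review(self) -> bool:
        # Flag when either phase raises a concern.
        return self.high_risk_type or bool(self.matches)


def classify(text: str) -> str:
    """Phase one: pick the document type whose keywords appear most often."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in DOC_TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def identify(text: str) -> RiskResult:
    """Run phase one (classification), then phase two (pattern matching)."""
    doc_type = classify(text)
    matches = {}
    for name, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            matches[name] = found
    return RiskResult(doc_type, doc_type in HIGH_RISK_TYPES, matches)
```

Calling identify() on the text of an application form returns a result flagged by document type even when no pattern fires, which is the point of classifying first: the patterns then add detail instead of carrying the whole decision.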

Manage

Once a GDPR document inventory is complete, and an organization understands their areas of document risk and exposure, a plan can be put in place to manage and govern the assets of their data subjects. This phase or step within your GDPR document strategy can include the following:

Migrating high risk documents to a managed repository – if high risk documents exist outside of a governed and managed repository, the same tool that can help in discovery can also help with migration. As documents are classified, metadata can also be extracted, and the document moved into a new or existing system of record. You can see an example of contract migration to SharePoint Online here: Migrating Contracts and Data to SharePoint.
Implementing an intelligent document transport layer – creating a repeatable, standardized process for document ingestion and processing can flag new documents as they enter an organization’s digital realm. This ensures proper governance and placement of high risk assets.
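
As a rough illustration of the transport layer idea, the Python sketch below routes each newly ingested document to a destination based on a risk flag. The IncomingDocument fields and the two repository paths are hypothetical names invented for this example, not part of any product.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IncomingDocument:
    name: str
    doc_type: str
    contains_personal_data: bool


# Hypothetical destinations; substitute your own systems of record.
MANAGED_ROOT = PurePosixPath("/repository/managed")
GENERAL_ROOT = PurePosixPath("/repository/general")


def route(doc: IncomingDocument) -> PurePosixPath:
    """Send high risk documents to the governed repository, the rest elsewhere."""
    root = MANAGED_ROOT if doc.contains_personal_data else GENERAL_ROOT
    return root / doc.doc_type / doc.name
```

Because every document passes through the same function, the routing decision is repeatable and easy to log, which is what makes the placement of high risk assets auditable later.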

Protect

In the protection step, organizations need to put security controls on all documents deemed as high risk. But the protection step also requires thought on future documents, and protecting new private assets. As outlined in “Manage”, an effective document transport technology will identify and route newly ingested documents to a protected resting place. Organizations also need to implement real-time controls for high risk identification and classification. Here are some examples:

Constantly discover – you can protect those documents that are in your managed repository, but what about newly generated personal data? As new policies and procedures are implemented, organizations need to use their discovery technology to constantly monitor and find new high risk entities.
Embed classification technology – enabling detection in your everyday applications can reduce risk and ensure compliance. Modern classification platforms expose web services in both cloud and on-premises solutions to help. You can see an example here: Real-time GDPR Scanning and Detection in SharePoint.

Report

The new GDPR standard is all about accurate record keeping, which provides transparency and overall accountability. Knowing all the document types that can be classified as having personal information, and the processes around them, is critical to ensure compliance. An audit of policies and procedures is sure to require records of when a document was created or ingested, how it was handled, and where it was ultimately placed under management. All of the technologies mentioned in this article have broad reporting and analytics capabilities.

With the complexities of GDPR, standard reporting won’t suffice in most cases, and the ability to perform deep analytics to track and identify key data and documents will be a requirement.
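
For a sense of what a record of creation, handling and placement might look like, here is a small Python sketch that emits one audit entry as a JSON line. The field names and event labels are assumptions chosen for illustration; GDPR does not prescribe a log format.

```python
import json
from datetime import datetime, timezone


def audit_entry(document_id, event, location, when=None):
    """Build one JSON audit line describing what happened to a document."""
    when = when or datetime.now(timezone.utc)
    record = {
        "document_id": document_id,
        "event": event,        # e.g. "ingested", "classified", "migrated"
        "location": location,  # where the document sits after the event
        "timestamp": when.isoformat(),
    }
    return json.dumps(record, sort_keys=True)
```

Appending one such line per event to a write-once log gives you a trail that can be filtered by document or by event type when an audit request arrives.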

Just a quick post on strategy for GDPR when it comes to the unstructured content that lives within documents. Let me know your thoughts on the topic.

GDPR and Compliance: Are Documents the Enterprise Minefield?

Posted on February 25, 2017 by sboals

What is your privacy strategy for documents and content repositories?

The new General Data Protection Regulation (GDPR) is set to replace the older Data Protection Directive in the EU on May 25, 2018. This new rollout of privacy protections for EU nations has broad implications for any company within the realm of the EU, or any company that processes EU citizen information and data. Here is a summary of the major changes:

GDPR jurisdiction now applies to all organizations that process EU subject personal data, regardless of the organization’s location.
Breaches of GDPR can be fined up to 4% of annual global turnover or 20M Euros (whichever is larger).
Consent when providing personal information must be clear and easy to understand.
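
The “whichever is larger” rule is easy to misread, so here is the arithmetic as a short Python sketch. It only computes the ceiling described above; it says nothing about what a regulator would actually impose, and the function name is made up for this example.

```python
FIXED_CEILING_EUR = 20_000_000
TURNOVER_SHARE_PERCENT = 4


def max_fine_ceiling(annual_global_turnover_eur: int) -> int:
    """Return the larger of 4% of turnover and the fixed 20M Euro ceiling."""
    turnover_based = annual_global_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    return max(turnover_based, FIXED_CEILING_EUR)
```

For a company with 100M Euros in turnover, 4% is only 4M Euros, so the 20M Euro figure applies; at 1B Euros in turnover the percentage takes over and the ceiling becomes 40M Euros.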

There is also a set of core subject rights and obligations that apply, and below is a quick summary:

Breach Notification – any data breach requires notification within 72 hours.
Right to Access – subjects can request an electronic copy of all private data at any time.
Right to be Forgotten – aka Data Erasure, a subject at any time can request to have all private data removed from a controlling organization’s systems.
Data Portability – subjects can request to have their information transferred to another organization at any time. This will go hand in hand with the “right to be forgotten”.
Privacy by Design – now a legal requirement, organizations must show proof of “…appropriate technical and organizational measures…” within any system or process.
Data Protection Officers (DPOs) – many organizations will now require DPOs. This individual will be responsible for interfacing with EU nations and authorities, and will carry the heavy burden of responsibility for all data protection efforts.

So, with that quick outline, imagine the implications of millions of application documents with personal information that are breached. What about the accidental scan of medical records to an insecure document sync folder? Or the directory of millions of scanned documents that have a few documents with private information?

Organizations need a two-pronged approach to clear the document minefield. To get this under control and mitigate risk, two types of technology need to work hand in hand.

[Image: GDPR Document Strategy: Transactional and Analytical]

First, a document and content capture technology that works as an ingestion point for new content and existing document-centric processes. This form of enterprise input management can be placed as a non-invasive automation layer to flag and identify suspect content, and to provide reporting capabilities around private information for compliance. This layer is focused on day-forward transactions.

Second, a solution that crawls existing repositories to classify, extract and identify documents that pose a risk. This technology can work hand in hand with the transactional layer to build machine learning profiles, and establish analytical libraries of document and data profiles so the analytical side can become proactive and preemptive. This can be a critical step in identifying legacy documents that house private information and could be subject to GDPR fines.

So, where does Ephesoft fit? We have two products that span the transactional and analytical requirements to help organizations capture, classify, identify and visualize their documents in a broad sense, and comply with GDPR privacy rules.

[Image: Ephesoft Provides GDPR and Privacy Solutions for Documents]

For the day-to-day, we have Ephesoft Transact, and for deep analytics, we have Ephesoft Insight. If you need further information, you can contact us here: Ephesoft GDPR Solution Information.