Optical Character Recognition

OCR = PDF to Text & Image to Text

Optical Character Recognition, or OCR, is the process of converting an image or scanned PDF into machine-readable text. It is heavily used in document scanning and is a core component of document capture software. There are several types of OCR, including:

  • Full page OCR – converts the entire scanned page to text.
  • Zone OCR – converts only a small portion, or zone, of a document to text.
  • OCR Separation – splits a batch of scanned pages into separate documents based on the text recognized on each page.
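
As a rough illustration of the last two types, the sketch below works on text that an OCR engine has already produced, so it needs no particular engine. Everything in it is an assumption made for illustration: the Word and Zone records, the function names text_in_zone and split_documents, the sample coordinates, and the "Invoice No." marker are all hypothetical. A real zone OCR step would normally crop the image and recognize only that region; filtering word boxes from a full-page result, as done here, is a simpler stand-in.

```python
import re
from typing import List, NamedTuple


class Word(NamedTuple):
    """One recognized word and its bounding box in pixels (hypothetical layout)."""
    text: str
    left: int
    top: int
    width: int
    height: int


class Zone(NamedTuple):
    """A rectangular region of the page, in the same pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def text_in_zone(words: List[Word], zone: Zone) -> str:
    """Return the text of the words whose boxes lie entirely inside the zone."""
    inside = [
        w for w in words
        if w.left >= zone.left
        and w.top >= zone.top
        and w.left + w.width <= zone.right
        and w.top + w.height <= zone.bottom
    ]
    return " ".join(w.text for w in inside)


def split_documents(page_texts: List[str], first_page_pattern: str) -> List[List[int]]:
    """Group page indexes into documents.

    A page whose text matches first_page_pattern starts a new document;
    every other page is appended to the document currently being built.
    """
    marker = re.compile(first_page_pattern, re.IGNORECASE)
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        if marker.search(text) or not documents:
            documents.append([])
        documents[-1].append(index)
    return documents


# Sample data standing in for OCR output (invented for this example).
words = [
    Word("Invoice", 10, 10, 60, 12),
    Word("1001", 80, 10, 40, 12),
    Word("Total", 10, 200, 50, 12),
]
header_zone = Zone(left=0, top=0, right=200, bottom=50)

pages = [
    "INVOICE No. 1001\nAcme Corp",
    "Page 2 of invoice 1001, line items continued",
    "INVOICE No. 1002\nGlobex",
]

print(text_in_zone(words, header_zone))
print(split_documents(pages, r"^invoice no\."))
```

Here the zone covers only the top strip of the page, so only the header words are returned, and the three pages are grouped into two documents because the first and third pages begin with the marker text. In practice the separator might instead be a barcode, a cover sheet, or a keyword anywhere on the page.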

Most OCR engines provide a wide variety of output formats, including plain text, Microsoft Word, Microsoft Excel and Adobe PDF. Searchable PDF is the most common output, as it stores both the page image and the recognized text in a single container. OCR software is usually bundled and sold with document scanners, and in its simplest form can be used as a desktop application.