OCR = PDF to Text & Image to Text
Optical Character Recognition, or OCR, is a process by which an image or PDF is converted to text. It is heavily used in the scanning of documents and is a core component of document capture software. There are several types of OCR, including:
- Full page OCR – converts the entire scanned page to text.
- Zone OCR – converts only a small portion, or zone, of a document to text.
- OCR Separation – splits a batch of scanned pages into separate documents based on text recognized on the pages.
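As a rough illustration of the last two items, here is a minimal Python sketch. It does not call a real OCR engine: it assumes recognition has already run and produced words with pixel bounding boxes plus one text string per page. The `Word` class and the `words_in_zone` and `split_on_separator` functions are hypothetical names made up for this example. Note also that production zone OCR typically crops the image before recognition; filtering recognized words by position, as done here, is a simplified stand-in.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One recognized word and its bounding box in pixels (hypothetical type)."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


# A zone is a rectangle given as (left, top, right, bottom) in pixels.
Zone = Tuple[int, int, int, int]


def words_in_zone(words: List[Word], zone: Zone) -> str:
    """Zone OCR stand-in: keep only words whose box lies fully inside the zone."""
    left, top, right, bottom = zone
    inside = [
        wd for wd in words
        if wd.x >= left and wd.y >= top
        and wd.x + wd.w <= right and wd.y + wd.h <= bottom
    ]
    # Words are joined in the order the engine returned them.
    return " ".join(wd.text for wd in inside)


def split_on_separator(page_texts: List[str], marker: str) -> List[List[int]]:
    """OCR separation: group page indices into documents.

    A page whose text contains `marker` (case-insensitive) starts a new
    document; the very first page always starts one.
    """
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        if not documents or marker.lower() in text.lower():
            documents.append([])
        documents[-1].append(index)
    return documents
```

For example, with a zone covering the top strip of a page, `words_in_zone` would return just the header text, and `split_on_separator(pages, "invoice")` would start a new document at every page whose recognized text mentions "invoice". The marker string and zone coordinates are illustrative choices, not values prescribed by any particular product.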
Most OCR engines provide a wide variety of output formats, including plain text, Microsoft Word, Microsoft Excel, and Adobe PDF. Searchable PDF is the most common output, as it stores both the page image and the recognized text in a single container. OCR software is usually bundled and sold with document scanners, and in its simplest form can be used as a desktop application.