Capture involves converting information from paper documents into an electronic format through scanning, inbound faxing, or email. It is also used to collect electronic files and information into a consistent structure for management. Capture technologies further encompass the creation of metadata (index values) that describe characteristics of a document so that it can be located easily with search technology. For example, a medical chart might include the patient ID, patient name, date of visit, and procedure as index values to make it easy for medical personnel to locate the chart.
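As a minimal sketch of how index values support retrieval, the following Python snippet attaches the medical-chart index values from the example above to captured documents and looks charts up by them. The `CapturedDocument` class, its field names, the `find` helper, and the sample data are all hypothetical, chosen for illustration rather than taken from any particular capture product.

```python
from dataclasses import dataclass, field


@dataclass
class CapturedDocument:
    """A captured file plus the index values (metadata) that describe it."""
    file_path: str
    index: dict[str, str] = field(default_factory=dict)


def find(documents, **criteria):
    """Return the documents whose index values match every given criterion."""
    return [
        doc for doc in documents
        if all(doc.index.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical sample data: two scanned medical charts with their index values.
charts = [
    CapturedDocument("scans/chart_0001.tif", {
        "patient_id": "P-1001",
        "patient_name": "Jane Doe",
        "visit_date": "2023-04-12",
        "procedure": "X-ray",
    }),
    CapturedDocument("scans/chart_0002.tif", {
        "patient_id": "P-1002",
        "patient_name": "John Roe",
        "visit_date": "2023-04-12",
        "procedure": "Blood test",
    }),
]

# Locate a chart by patient ID, or by a combination of index values.
by_patient = find(charts, patient_id="P-1001")
by_visit = find(charts, visit_date="2023-04-12", procedure="Blood test")
```

A production system would typically keep the index values in a database or search engine rather than in memory, but the principle is the same: the document is located through its metadata, not through its image content.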
Various recognition technologies can be used to extract information from scanned documents and digital faxes, including:
- Optical character recognition (OCR)
Converts images of typeset text into alphanumeric characters.
- Handprint character recognition (HCR)
Converts images of handwritten text into alphanumeric characters. Gives better results for short text in fixed locations than for freeform text.
- Intelligent character recognition (ICR)
Extends OCR and HCR to use comparison, logical connections, and checks against reference lists and existing master data to improve recognition. For example, on a form where a column of numbers is added up, the accuracy of the recognition can be checked by adding the recognized numbers and comparing them to the sum written on the original form.
- Optical mark recognition (OMR)
Reads special markings, such as checkmarks or dots, in predefined fields.
- Barcode recognition
Decodes industry-standard encodings of product and other commercial data.
- Forms processing
Forms capture comprises two groups of technologies, although the information content and character of the documents may be identical. The first is the processing of printed forms captured via scanning; recognition technologies are often used here, since well-designed forms enable largely automatic processing. The second is the capture of electronic forms, such as those submitted via web pages, which can also be processed automatically as long as the layout, structure, logic, and contents are known to the capture system.
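The plausibility check described under ICR above, in which a recognized column of numbers is added up and compared with the total written on the form, can be sketched as follows. This is an illustrative example only: the function name `column_sum_matches` and the sample values are assumptions, and the recognized values are presumed to arrive as decimal strings from an earlier recognition step.

```python
from decimal import Decimal


def column_sum_matches(recognized_values, recognized_total):
    """Return True if the recognized column entries add up to the recognized total.

    Both arguments hold decimal strings as produced by a (hypothetical)
    recognition step; Decimal avoids binary floating-point rounding errors.
    """
    computed = sum((Decimal(value) for value in recognized_values), Decimal("0"))
    return computed == Decimal(recognized_total)


# Hypothetical recognition results for a three-line form with a written total.
clean_read = column_sum_matches(["12.50", "7.25", "30.00"], "49.75")

# The same form with the first entry misread ("1" recognized as "7").
misread = column_sum_matches(["72.50", "7.25", "30.00"], "49.75")

print(clean_read, misread)
```

A mismatch only signals that at least one value was misrecognized, or that the arithmetic on the original form was wrong; it does not identify which value is at fault, so such documents would usually be routed to manual review.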