OCR, or Optical Character Recognition, is a technology that reads the characters and words on a document and converts them into an editable and searchable format. A basic example would be scanning a newspaper with a document scanner and converting the image into a Word document. The end result is a few pages of text that can immediately be edited or converted into another format (XML, for example).
Imagine you have a paper-based text document in front of you. Let's say it's 20 pages of printed text and you really want to add this document to your website. The problem is that you don't have any other version of it available. You've got two options: type it all into Word manually, which could take hours, or scan it in and let the OCR software convert it into a Word document for you. For most people the second option is by far the more attractive.
Having established OCR as a time-saving mechanism in data extraction, the next question is when and how to use it. Needless to say, it is neither suitable for nor applicable to every scanned image. At ORS Group some of our customers simply want to scan and archive their documents for retention purposes only. In this situation there is no need to OCR the documents, and if it were required at a later date we could easily apply it then.
However, a significant number of our customers want to work with their documents, and often these documents are in daily use. OCR benefits these customers in a number of ways. Firstly, once a document or image has been OCR'd it can be made searchable. For many people this is an essential part of their digital documentation library. The ability to search for a word, a policy number or even a unique index is vital when used in conjunction with document management software.
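To give a feel for what "searchable" means in practice, here is a minimal sketch in Python of a word index built over OCR output. It assumes the OCR step has already produced plain text for each page. The sample pages, the policy number PN12345 and the function names build_index and search are made up for illustration and are not taken from any particular document management product.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word or number to the set of pages it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Split the OCR text into simple alphanumeric tokens.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(page_number)
    return index


def search(index, term):
    """Return the sorted page numbers that contain the given term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical OCR output, keyed by page number.
pages = {
    1: "Policy number PN12345 was issued to the customer in March.",
    2: "Claims made under policy PN12345 are listed below.",
    3: "This page covers general terms and conditions.",
}

index = build_index(pages)
print(search(index, "PN12345"))  # [1, 2]
print(search(index, "terms"))    # [3]
```

Real document management software does far more than this (ranking, phrase search, tolerance for OCR misreads), but the principle is the same: once the text exists, finding a policy number becomes a lookup rather than a manual read-through.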
The second significant benefit is the ability to convert the scanned images into editable documents. Many of our customers wish to convert their paper archives into digital data and make this data available online. OCR allows them to convert the scanned images into HTML or XML, and after a little quality control the documents are ready to be made available on the internet. The benefits of doing this include freeing up storage or shelf space, making the data freely available so that more people can use it, and even being able to sell the data once it's online. One example of this would be making historical law journals available to purchase online, where previously they had been hidden away and only available to someone physically visiting the library they were stored in.
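As a rough illustration of the conversion step, the sketch below wraps OCR text in simple XML using Python's standard library. It assumes the OCR software has already returned one string per page, with blank lines between paragraphs. The element names (document, page, paragraph), the function name pages_to_xml and the sample text are invented for this example; a real project would follow whatever schema the customer requires.

```python
import xml.etree.ElementTree as ET


def pages_to_xml(title, pages):
    """Wrap OCR text (one string per page) in a simple XML structure."""
    root = ET.Element("document", {"title": title})
    for number, text in enumerate(pages, start=1):
        page = ET.SubElement(root, "page", {"number": str(number)})
        # Treat blank lines as paragraph breaks.
        for block in text.split("\n\n"):
            # Join lines that the scan wrapped and collapse extra spaces.
            block = " ".join(block.split())
            if block:
                ET.SubElement(page, "paragraph").text = block
    # Special characters such as & are escaped automatically.
    return ET.tostring(root, encoding="unicode")


# Hypothetical OCR output for two scanned pages.
ocr_pages = [
    "Smith v. Jones\n\nThe court held that the contract\nwas void.",
    "Costs were awarded to the claimant & paid in full.",
]

xml_text = pages_to_xml("Law Journal, Volume 1", ocr_pages)
print(xml_text)
```

The quality control mentioned above would sit between the OCR step and this one, so that misread characters are corrected before the text is published.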