Multi Language OCR

PaperNIC lets you extract text from images in any of the 40+ languages that we support. The supported engines can automatically detect the language in the image and return the results as a text string.

OCR technology relies heavily on language information and dictionaries to achieve high recognition quality. Real documents, however, can contain multiple languages on one page, or a document stream can contain a large number of different languages.

The technology selects the best-matching language from a predefined group of languages. This group has to be defined by the user or developer and can be edited as needed.
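As an illustration of choosing the best match from a predefined group, here is a minimal Python sketch. It is not PaperNIC's actual method: the function name `pick_language`, the dictionary-coverage scoring, and the tiny word lists are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set


def pick_language(words: Iterable[str], dictionaries: Dict[str, Set[str]]) -> str:
    """Return the candidate language whose dictionary covers the most words.

    `dictionaries` maps a language code to its vocabulary and plays the role
    of the predefined language group (hypothetical structure).
    """
    if not dictionaries:
        # At least one candidate language must be supplied.
        raise ValueError("specify at least one candidate language")

    tokens = [w.lower() for w in words]

    def coverage(lang: str) -> int:
        # Count how many recognized words appear in this language's dictionary.
        vocab = dictionaries[lang]
        return sum(t in vocab for t in tokens)

    # On a tie, max() keeps the first language in the group's insertion order.
    return max(dictionaries, key=coverage)


if __name__ == "__main__":
    group = {
        "eng": {"the", "house", "is", "green"},
        "deu": {"das", "haus", "ist", "grün"},
    }
    print(pick_language(["Das", "Haus", "ist", "grün"], group))
```

A real engine would use far larger dictionaries and additional language information; the sketch only shows why the candidate group has to exist before a best match can be selected.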

If the input is highly mixed and consists of many different languages, manual pre-sorting is often not an option. Instead, multiple OCR runs with different language settings have to be made. Based on its internal recognition statistics, the system then decides which combination delivered the best results.
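The multi-run approach described above can be sketched roughly as follows. This is a hedged example, not PaperNIC's API: the `engine` callable, its return shape of (word, confidence) pairs, and the use of mean word confidence as the "recognition statistic" are assumptions chosen for illustration.

```python
from statistics import mean
from typing import Callable, Iterable, List, NamedTuple, Sequence, Tuple


class OcrRun(NamedTuple):
    languages: Tuple[str, ...]
    text: str
    score: float  # mean word confidence; 0.0 when nothing was recognized


# Hypothetical engine signature: (image, languages) -> [(word, confidence), ...]
OcrEngine = Callable[[bytes, Sequence[str]], List[Tuple[str, float]]]


def best_language_run(
    image: bytes,
    language_sets: Iterable[Sequence[str]],
    engine: OcrEngine,
) -> OcrRun:
    """Run OCR once per language setting and keep the highest-scoring run."""
    runs = []
    for languages in language_sets:
        words = engine(image, languages)
        # Assumed statistic: average confidence over all recognized words.
        score = mean(conf for _, conf in words) if words else 0.0
        text = " ".join(word for word, _ in words)
        runs.append(OcrRun(tuple(languages), text, score))
    if not runs:
        raise ValueError("provide at least one language setting")
    return max(runs, key=lambda run: run.score)
```

Each extra language setting costs one more full OCR pass, so in practice the list of combinations is kept short.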

The recognition language of a document can be automatically detected, but the user has to specify at least one language that might show up in the document.
