
Can we build a global handwriting recognition engine?

Original post:

https://medium.com/@sukantkhurana/can-we-build-a-global-handwriting-recognition-engine-78372b7ad10f

 


Global Character Recognition Engine: A Machine Intelligence approach

 
 

Let us start by asking, “What is OCR and what is the fuss about it?”

With some recent enviable breakthroughs in the field, you might have heard of OCR. Optical Character Recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into its constituent characters. Imagine a system that can decipher a doctor's prescription written in bad handwriting. Despite decades of intense research, developing OCR with capabilities comparable to those of humans remains an open challenge.

When you fill in a CAPTCHA, it works because you can perform a task that has long been considered out of the reach of computers. If you think CAPTCHAs will remain tough for machines to crack for a few more years, you could not be more wrong.

Over the last few years, the number of academic laboratories and companies involved in character recognition research has increased dramatically. OCR is a complex problem because of the variety of languages, fonts, and styles in which text can be written, not to mention the complex rules of each language. Hence, techniques from different disciplines of computer science, such as image processing, pattern classification, and natural language processing, are employed to address these challenges. Before we get to the open challenges in the field and what the Khurana group is up to, let us briefly talk about applications of OCR.

OCR enables a large number of useful applications online and offline, such as:

· Data entry for business documents. Imagine chemists/drug-stores being able to scan doctors’ handwriting without any struggle.

· Automatic number plate recognition. You know how easy this would make traffic-rule enforcement.

· Automatic extraction of key information from insurance documents without human intervention.

· Extracting business card information into a contact list.

· Book scanning

· Making electronic images of printed documents searchable, as in Google Books.

· Converting handwriting in real time to control a computer (pen computing).

· Assistive technology for blind and visually impaired users. Imagine all road signs being read out to a blind user from a distance.

The applications touch practically all walks of life. Think of anywhere you encounter written text, and you will find an application.

We hope this list of applications has made you interested in OCR and ready to dive into how one goes about building it.

Before we get into how people have traditionally worked on OCR, let us first build up some suspense by discussing the problem conceptually. Here are the major phases of an OCR system (a rough code sketch of the whole pipeline follows the list):

· Image acquisition: This is the capture of the image from an external source, such as a scanner or a camera.

· Pre-processing: Once the image has been acquired, different pre-processing steps are usually performed to improve its quality. Common techniques include noise removal, thresholding, and baseline extraction.

· Character segmentation: In this step, the characters in the image are separated so that they can be passed to a recognition engine. This typically involves techniques such as connected component analysis and projection profiles.

· Feature extraction: The segmented characters are then processed to extract easily computable features. Based on these features, the characters are recognized.

· Character classification: In this step, the features of a segmented image are assigned to different categories or classes. Structural classification techniques are based on features extracted from the structure of the image and use decision rules to classify characters; statistical pattern classification methods are based on probabilistic models.

· Post-processing: After classification, the results are not 100% correct, especially for complex languages. Post-processing techniques can be applied to improve the accuracy of OCR systems; they utilize natural language processing and geometric and linguistic context to correct errors in the OCR output.
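To make these phases concrete, here is a minimal Python sketch of a conventional pipeline. It assumes the OpenCV and pytesseract packages and an input file named scanned_page.png (all assumptions made purely for illustration); it shows the stages above wired together, not the approach of any particular product.

```python
# A minimal sketch of the classical OCR pipeline, assuming OpenCV and
# pytesseract are installed (pip install opencv-python pytesseract).
import cv2
import pytesseract

# 1. Image acquisition: load a scanned page from disk (illustrative file name).
image = cv2.imread("scanned_page.png")

# 2. Pre-processing: grayscale conversion, noise removal, and thresholding.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 3-5. Segmentation, feature extraction, and classification are handled
# internally by the Tesseract engine in this sketch.
text = pytesseract.image_to_string(binary)

# 6. Post-processing: here only trivial whitespace clean-up; real systems
# apply dictionaries and language models to correct recognition errors.
print(" ".join(text.split()))
```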

While people have broken OCR conceptually into these phases, various algorithms do not always follow them in precisely this manner. Let us briefly look at the history of OCR research before we move forward.

Earlier Approaches:

The problem of text recognition has been attacked with many different approaches. There are several ways to skin a cat, and since OCR involves big money, people have tried many.

Template matching is one of the simplest and oldest approaches. Many templates of each symbol are maintained; for an input image, the error or difference with each template is computed, and the symbol corresponding to the minimum error is output. The technique works effectively for recognizing standard fonts but performs poorly on handwritten characters.
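As a rough sketch of the idea, assuming NumPy and tiny 32x32 binary glyph images (both illustrative choices, not a production recognizer): the input glyph is compared against every stored template, and the label with the smallest pixel-wise error wins.

```python
import numpy as np

def match_template(glyph, templates):
    """Return the label whose template has the smallest pixel-wise error.

    glyph     -- 2-D array (e.g. a 32x32 binary image of one character)
    templates -- dict mapping label -> 2-D array of the same shape
    """
    best_label, best_error = None, float("inf")
    for label, template in templates.items():
        error = np.sum(np.abs(glyph.astype(float) - template.astype(float)))
        if error < best_error:
            best_label, best_error = label, error
    return best_label

# Illustrative usage with random arrays standing in for real glyph images.
rng = np.random.default_rng(0)
templates = {c: rng.integers(0, 2, (32, 32)) for c in "ABC"}
print(match_template(templates["B"], templates))  # -> 'B'
```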

Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as “pattern matching”, “pattern recognition”, or “image correlation”. This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. The technique works best with typewritten text and does not work well when new fonts are encountered. This is the technique that early physical, photocell-based OCR systems implemented.

Feature extraction is another approach, in which the statistical distribution of points is analyzed and orthogonal properties are extracted. For each symbol, a feature vector is calculated and stored in a database. Recognition is done by computing the distance between the feature vector of the input image and those stored in the database, and outputting the symbol with the minimum deviation.
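A hedged sketch of this idea in NumPy: compute a small feature vector for every known symbol, store the vectors, and classify a new glyph by the nearest stored vector. The particular features used here (ink density per image quadrant) are only an illustrative assumption; the same structure applies to the geometric features discussed below.

```python
import numpy as np

def quadrant_density(glyph):
    """Toy feature vector: fraction of dark pixels in each image quadrant."""
    h, w = glyph.shape
    quadrants = [glyph[:h // 2, :w // 2], glyph[:h // 2, w // 2:],
                 glyph[h // 2:, :w // 2], glyph[h // 2:, w // 2:]]
    return np.array([q.mean() for q in quadrants])

def classify(glyph, database):
    """database: dict mapping symbol -> stored feature vector."""
    features = quadrant_density(glyph)
    return min(database, key=lambda s: np.linalg.norm(features - database[s]))

# Illustrative usage with random arrays standing in for real glyph images.
rng = np.random.default_rng(1)
database = {c: quadrant_density(rng.integers(0, 2, (32, 32))) for c in "XYZ"}
print(classify(rng.integers(0, 2, (32, 32)), database))
```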

In the geometric approach, on the other hand, features depend on physical properties such as the number of joints, relative positions, number of end points, and length-to-width ratio. Classes formed on the basis of these geometric features are quite distinct, with little overlap. The main drawback of this approach, however, is that it depends heavily on the character set (Hindi letters, English letters, numerals, and so on): features extracted for one set are very unlikely to work for another. These factors affect the recognition rate and other performance measures.

Current handwritten character recognition systems deal with the following issues:

· Retrieval of important information

· Removal of unnecessary information

· Application of different recognition algorithms in each case.

· Eliminating difficulties caused by large variations in individual writing style.

Let us now discuss where the field is likely to go. The focus may be placed on optimizing the recognition algorithm itself.

We believe that character segmentation can be improved to handle overlapping and joined characters, and new algorithms can be developed to help segment joined handwritten letters. We need to incorporate the variability in an individual's handwriting over time, and to examine the similarity of some characters to one another and how misidentification can be avoided. We need to account for the large variety of character shapes, more so in some languages than in others. We also cannot ignore the large variety of writing styles produced by different writers.

While it is important to identify the challenges, it is also important to address how they can be tackled.

Here are some of the common approaches people are trying.

Artificial Intelligence approach:

An Artificial Neural Network (ANN) consists of two basic kinds of elements, neurons and connections. Neurons connect with each other through connections to form a network, rather like an (over)simplified model of the human brain. Neural networks make the recognition process largely independent of the character set. Giving a machine the power to see, interpret, and read text is one of the major tasks of Artificial Intelligence. Nowadays people use more advanced neural network architectures than plain ANNs for OCR.
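As a minimal illustration, the sketch below trains a small feed-forward network on the 8x8 digit images bundled with scikit-learn; the dataset and the single-hidden-layer architecture are assumptions made purely for demonstration and are far simpler than what production OCR uses.

```python
# Minimal sketch: a small feed-forward neural network classifying the
# 8x8 digit images bundled with scikit-learn (illustrative data only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# One hidden layer of 64 neurons; real OCR networks are much larger.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```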

Particle Swarm Optimization

PSO is a population-based search method that makes few or no assumptions about the problem and can search very large solution spaces.
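A minimal sketch of the mechanics (positions, velocities, personal and global bests) on a toy objective; the sphere function and all constants are illustrative assumptions, not tied to any OCR system.

```python
import numpy as np

def pso(objective, dim=2, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimise `objective` with a basic particle swarm."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))        # particle positions
    vel = np.zeros((n_particles, dim))                   # particle velocities
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()             # global best so far

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Pull each particle towards its personal best and the global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy usage: minimise the sphere function, whose optimum is at the origin.
print(pso(lambda x: float(np.sum(x ** 2))))
```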

Support Vector Machines

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression. The principle of an SVM is to map the input data onto a higher-dimensional feature space non-linearly related to the input space, and to determine a separating hyperplane with maximum margin between the two classes in that feature space.
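A hedged sketch using scikit-learn's bundled digits dataset: an SVC with an RBF kernel performs the non-linear mapping implicitly via the kernel trick. The dataset and hyper-parameters are assumptions chosen for illustration only.

```python
# Minimal sketch: an SVM with an RBF kernel classifying scikit-learn's
# bundled 8x8 digit images (illustrative data, not real-world handwriting).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

svm = SVC(kernel="rbf", gamma="scale", C=10.0)
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```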

Genetic Algorithm

Genetic algorithms (GAs) are an optimization and search method used in computer science to find reasonably good solutions to hard problems. They are inspired by processes in biological evolution such as natural selection, inheritance, recombination, and mutation. A GA is generally realized as a computer model in which a population of candidate solutions to an optimization problem progresses towards better solutions.
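A minimal, self-contained sketch showing selection, recombination, and mutation on the toy "one-max" problem (maximize the number of ones in a bit string); the problem and all parameters are illustrative assumptions.

```python
import random

def genetic_algorithm(length=20, pop_size=30, generations=100, p_mut=0.02):
    """Toy GA: evolve a bit string towards all ones (the one-max problem)."""
    random.seed(0)
    fitness = sum  # fitness = number of ones in the bit string
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]

        # Recombination and mutation to refill the population.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]                     # one-point crossover
            child = [bit ^ (random.random() < p_mut)      # bit-flip mutation
                     for bit in child]
            children.append(child)
        population = parents + children

    best = max(population, key=fitness)
    return best, fitness(best)

print(genetic_algorithm())
```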

Let us look at what is happening in the cutting-edge world of OCR.

AI beats humans at reading comprehension:

Chinese retail giant Alibaba has developed an artificial intelligence model that has managed to outdo human participants in a reading and comprehension test designed by Stanford University. The model scored 82.44, whereas humans recorded a score of 82.304. Given how many characters the Chinese language has, imagine what comparable recognition systems could do for the language of your interest.

Anyline GmbH

Anyline, the Austrian start-up that provides mobile OCR tech to enable developers to add text recognition to their own apps, has raised €1.5 million in funding. The list of investors includes angel investor Johann ‘Hansi’ Hansmann, busuu co-founder Bernhard Niesner, Lukas Püspök, and the U.S.-based VC-fund iSeed Ventures.

The company offers its own mobile Optical Character Recognition (OCR) technology, which uses a smartphone's camera to accurately scan and recognize any kind of text, code, or number.

Applications for Anyline’s SDK include adding barcode or passport scanning to an app, or scanning electricity meter values, serial numbers, and “other process enhancing information”. The start-up says it plans to bring its OCR tech to smart glasses and will launch an Augmented Reality (AR) solution later this year. Current partnerships include a plug-in for the AR technology of wikitude, a ready-to-download SDK for Epson Moverio Pro smart glasses, and a distribution partnership with Konica Minolta.

ABBYY

TextGrabber for Android now captures text in real time and makes it actionable. Milpitas, United States (December 15, 2017)

ABBYY®, a global provider of intelligent capture solutions to improve business outcomes, announced the release of TextGrabber 2.0 for Android with Real-Time Recognition and a completely redesigned interface. Now TextGrabber for both iOS and Android transforms text within a camera viewer into digital data in real time, to help users capture, share, translate, and use text, links, phone numbers, addresses, promo codes, and other printed information on the go. This new capability works in both online and offline modes.

With TextGrabber 2.0, Android users can lift printed text of any color from any background in a live video stream, directly on the camera preview screen of a mobile device, without the need to take a photo or crop it. Recognition is performed locally on the device; no Internet connection is needed. The technology works with 61 languages, the largest number on the market in its category.

Digitized text instantly becomes actionable: it can be copied, edited, shared, translated into 104 languages, or voiced. Links, phone numbers, email addresses, and street addresses become clickable, connecting the user to a corresponding task: follow, call, email, or find on a map. All digitized texts are saved in the app and are easily accessible for further use. The translation feature currently requires an Internet connection; offline translation is scheduled to come to TextGrabber in future updates.

TextGrabber is essential for travel, business, and school as it quickly and accurately captures printed text in a foreign language and translates it into the user’s language of choice. The app also serves the needs of people with low vision who will be able to hear virtually any text from print, computer or TV screen with minimum effort or delay.

Companies and Money involved in OCR

The presence of numerous participants makes the global market for optical character recognition (OCR) highly fragmented and competitive in nature.

Some of the key players operating in the global market for optical character recognition are Anyline GmbH, ABBYY Software Ltd., Adobe Systems Incorporated, ATAPY Software, CCi Intelligence Co. Ltd., Creaceed S.P.R.L., CVISION Technologies Inc., Exper-OCR Inc., Google Inc., LEAD Technologies Inc., I.R.I.S. S.A. (Canon), IBM Corporation, Microsoft Corporation, Nuance Communications Inc., NTT Data Corporation, Paradatec, Inc., Prime Recognition Corporation, Ripcord Inc., Transym Computer Services Ltd., Black Ice Software LLC, SEAL Systems, Ricoh Group, and Accusoft Corporation.

Summary

There is no recognition system with nearly 100% accuracy for all languages, and anyone who can build one can make a lot of money. Looking at the global market, where the scripts used in most countries share some common distinctive features, we believe it is possible to develop a system that can recognize all the scripts of the world.

References:

1. “Global Optical Character Recognition (OCR) Market: OCR Revolutionizes Document Management Process for Different Businesses”, Press release by Transparency Market Research (TMR) Posted on Nov 21, 2017

2. “A Survey on Optical Character Recognition System” by Noman Islam, Zeeshan Islam, and Nazia Noor, Computer Vision and Pattern Recognition, Cornell University Library, October 3, 2017.

3. “How to identify Asian, African, and Middle Eastern alphabets at a glance” by James Harbeck, The Week, May 20, 2016.

4. “Hand Written Devnagari Character Recognition” by Swamy Saran Atul and Swapneel Prasanth Mishra, ResearchGate article, January 2007.

5. “AI Beats Humans at Reading Comprehension, but It Still Doesn’t Truly Comprehend Language” by Jamie Condliffe, MIT Technology Review, January 15, 2018.


 

About:

Mr. Prashant Manoharrao Kakde lives in Yavatmal, Maharashtra and is a prolific researcher in the fields of artificial intelligence, digital image processing, digital instrumentation, and biomedical engineering. He is currently an assistant professor at H.V.P.M’S C.O.E.T, Amravati. He is working under the guidance of Dr. Khurana on building an OCR system that can recognize all the world's languages with high accuracy.

Dr. Sukant Khurana runs an academic research lab and several tech companies. He is also a known artist, author, and speaker. You can learn more about Sukant at www.brainnart.com or www.dataisnotjustdata.com, and if you wish to work on biomedical research, neuroscience, sustainable development, artificial intelligence, or data science projects for public good, you can contact him at skgroup.iiserk@gmail.com or by reaching out to him on LinkedIn at https://www.linkedin.com/in/sukant-khurana-755a2343/.
