Just looking for code? https://github.com/albertherd/ChequeAnalyser
Microsoft Cognitive Services is a rich set of AI services covering areas such as Computer Vision, Speech Recognition, Decision making and NLP. The great thing about these services is that you don’t have to be an AI expert to use them, as the models come pre-trained and production ready. You just feed in your data and let the service do the work for you.
We’ll be looking at one area of Microsoft Cognitive Services – Computer Vision. More specifically, we’ll be looking at the handwriting API – you provide an image of handwriting and the service returns the actual text. We’ve already worked with the Computer Vision API from Microsoft Cognitive Services – we used it to tag our photo album.
Let’s look at today’s scenario – we’re a fictitious bank that processes cheques. These cheques arrive hand-written from our clients and contain instructions on how to transfer money from one account to the holder’s account.
A cheque typically has the following information:
- Issue Date
- Amount (in digits)
- Amount (in words, for cross reference)
- Payer’s account number
- (Other information, which was omitted for this proof of concept)
This is what our fictitious cheque looks like.
This is how our fictitious cheque looks when we’re looking at the regions we’re interested in, represented in bounding boxes.
Let’s consider these three handwritten cheques.
The attached application does the following analysis:
- Import these cheques as images.
- Send the images over to Microsoft Cognitive Services.
- Extract all the handwriting / text found in the image.
- Consider only the text which we’re interested in (as represented with bounding boxes previously).
- Forward this extracted information to whatever system needed. In our case, we’re just printing them to screen.
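The filtering step above can be sketched in a few lines. This is a minimal illustration in Python, not the code from the repository (which is C#). It assumes each recognised line comes back with its text and a bounding box of four corner points flattened into eight numbers; the region names, pixel coordinates and sample lines are all made up for the example. A line is assigned to a field when the centre of its bounding box falls inside that field’s region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in image pixel coordinates."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def centre(bounding_box):
    """Centre of a box given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    xs = bounding_box[0::2]
    ys = bounding_box[1::2]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def extract_fields(lines, regions):
    """Map each region name to the text of the lines whose centre lies inside it."""
    fields = {region.name: [] for region in regions}
    for line in lines:
        x, y = centre(line["boundingBox"])
        for region in regions:
            if region.contains(x, y):
                fields[region.name].append(line["text"])
                break  # a line belongs to at most one region
    return {name: " ".join(parts) for name, parts in fields.items()}


# Hypothetical regions for a 1000x400 pixel cheque scan (assumed values).
CHEQUE_REGIONS = [
    Region("issue_date", 700, 20, 980, 80),
    Region("amount_in_words", 50, 120, 700, 220),
    Region("amount_in_digits", 720, 120, 980, 200),
    Region("payer_account", 50, 300, 500, 370),
]

# Made-up recognition output, shaped like the assumption described above.
SAMPLE_LINES = [
    {"boundingBox": [10, 10, 300, 10, 300, 50, 10, 50], "text": "Fictitious Bank"},
    {"boundingBox": [720, 30, 900, 30, 900, 70, 720, 70], "text": "12/03/2018"},
    {"boundingBox": [60, 130, 600, 130, 600, 170, 60, 170], "text": "One hundred and"},
    {"boundingBox": [60, 175, 400, 175, 400, 210, 60, 210], "text": "twenty five"},
    {"boundingBox": [740, 140, 900, 140, 900, 180, 740, 180], "text": "125.00"},
    {"boundingBox": [60, 310, 400, 310, 400, 350, 60, 350], "text": "123456789"},
]

if __name__ == "__main__":
    for name, text in extract_fields(SAMPLE_LINES, CHEQUE_REGIONS).items():
        print(f"{name}: {text}")
```

Matching on the centre point rather than requiring the whole box to sit inside the region is a deliberate choice here: handwriting often spills slightly over a printed box, and the centre is more forgiving. The pre-printed bank name is ignored simply because its centre falls outside every region.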
Below is the information extracted from the sample cheques.
Most of the heavy lifting is done by Microsoft Cognitive Services, which makes these AI tools available to the masses. Of course, with a bit more business logic, such as validating the extracted fields, the results can be improved considerably on the way to a production-ready system.
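As one example of such business logic, the amount in words is on the cheque for cross reference, so it can be checked against the amount in digits. The sketch below is a Python illustration under some loud assumptions: it is not part of the repository, it only understands whole amounts written in plain English words, and the function names are made up. Anything it cannot parse is treated as a mismatch, which in a real system would probably mean routing the cheque to a human.

```python
from decimal import Decimal, InvalidOperation

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text):
    """Parse a whole amount such as 'one hundred and twenty five'.

    Raises ValueError on any word it does not recognise.
    """
    total = 0
    current = 0
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised word: {word!r}")
    return total + current


def amounts_agree(digits_text, words_text):
    """Return True when the two extracted amounts match; False if either fails to parse."""
    try:
        digits = Decimal(digits_text.replace(",", "").strip())
        words = words_to_number(words_text)
    except (InvalidOperation, ValueError):
        return False
    return digits == Decimal(words)


if __name__ == "__main__":
    print(amounts_agree("125.00", "One hundred and twenty five"))
    print(amounts_agree("135.00", "One hundred and twenty five"))
```

A check like this catches the most common recognition slips, for instance a handwritten 5 read as an S, because the digits field then fails to parse as a number at all.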
As with the previous example, this one uses the TPL Dataflow library, which is an excellent tool for actor-based multithreaded applications.
If you want to try this yourself, you’ll need to:
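To give a feel for the idea without the .NET specifics, here is a rough Python analogue of a dataflow pipeline: each stage runs in its own thread, receives messages from an inbox queue and posts results to the next stage. This is not the TPL Dataflow API and not how the repository is written; the stage functions are hypothetical stand-ins, and there is no error handling (a stage that throws would stall the pipeline).

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def start_stage(func, inbox, outbox):
    """Apply func to every message from inbox, in its own thread, posting results to outbox."""
    def run():
        while True:
            message = inbox.get()
            if message is _DONE:
                outbox.put(_DONE)  # propagate completion downstream
                return
            outbox.put(func(message))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(items, *stages):
    """Push items through the stages in order and collect the final results."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        start_stage(func, queues[i], queues[i + 1])
        for i, func in enumerate(stages)
    ]
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        message = queues[-1].get()
        if message is _DONE:
            break
        results.append(message)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for the real stages (load, recognise, report).
def load_image(path):
    return {"path": path}


def recognise(cheque):
    return {**cheque, "text": f"text from {cheque['path']}"}


def summarise(cheque):
    return f"{cheque['path']}: {cheque['text']}"


if __name__ == "__main__":
    for line in run_pipeline(["a.png", "b.png"], load_image, recognise, summarise):
        print(line)
```

Because every stage only talks to its neighbours through queues, a slow stage (the network call to the recognition service, say) does not block the stage that loads the next image. With one thread per stage, results come out in the same order the items went in.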
- Download the code – https://github.com/albertherd/ChequeAnalyser
- Get a Microsoft Cognitive Services Subscription – https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/#documentation
- Set up your Microsoft Cognitive Services subscription and get your API endpoint and your API keys
- Adjust the Constants.cs file with the obtained keys
- Run the application
Until the next one!