KNOWLEDGE BASE ARTICLE

Quick Guide to Regular Expressions in Umango Extract

Regular Expressions ('RegEx') are a fast, powerful and accurate way to identify exactly the text you want to extract from an area of a document. For those with some technical know-how, creating Regular Expressions can be simple; for others it may take a little more time.

The internet is full of websites that assist in building and validating Regular Expressions. Some examples that you may wish to utilize include:

Here are a few examples of Regular Expressions to help get you started.

Hints:

Fig 1. Test your results: the Tesseract and ABBYY OCR engines can give you different results. ABBYY set to Accurate tends to provide the best result and doesn't slow the extraction down (compared to Fast). When using Tesseract, leaving the setting on Fast tends to provide the best balance of results and speed.

Fig 2. Remember that Format and Validation provides the rules for how data should be structured once it has been captured.

Fig 3. Smart Seek provides the rules we use to locate the data we want to capture.

Use 'Test' in Umango Extract to check your settings before saving your job.
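The split between Smart Seek (locate the data) and Format and Validation (check its structure) can be illustrated outside Umango. The sketch below uses Python's `re` module purely as a stand-in: Umango's own regex engine and its REGEX(...) wrapper may behave differently, and the function names and the sample OCR text are hypothetical.

```python
import re

# Hypothetical stand-in for a Smart Seek rule: find the amount that follows
# the word BALANCE on the same line. Python's re module does not accept the
# variable-width lookbehind (?<=BALANCE.*), so a capture group is used instead.
SEEK = re.compile(r"BALANCE.*?(\d{1,5}\.\d{2})")

# Hypothetical stand-in for a Format and Validation rule: the captured value
# must be 1 to 5 digits, a decimal point and 2 digits, with nothing else.
VALIDATE = re.compile(r"\d{1,5}\.\d{2}")


def seek_balance(ocr_text):
    """Return the first amount found after BALANCE, or None if there is none."""
    match = SEEK.search(ocr_text)
    return match.group(1) if match else None


def is_valid_amount(value):
    """Check that the whole value has the expected structure."""
    return VALIDATE.fullmatch(value) is not None


# Invented sample of OCR output, for illustration only.
sample = "INVOICE 004512\nBALANCE DUE: 1234.50\nTHANK YOU"
found = seek_balance(sample)
print(found, is_valid_amount(found))  # prints: 1234.50 True
```

Keeping the two steps separate mirrors the hints above: the seek rule can be loose about what surrounds the value, while the validation rule stays strict about the value itself.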

Fig 1.

Fig 2.

Fig 3.

Example Regular Expressions using Umango Extract.

| Objective | Regular Expression for Format and Validation | Regular Expression for Smart Seek |
| --- | --- | --- |
| A 6 to 7 digit number | `REGEX(\d{6,7})` | |
| A number after the word BALANCE | `REGEX(\d{1,5}.\d{2})` | `REGEX((?<=BALANCE.*)\d{1,5}.\d{2})` |
| A date in the format NN/NN/NNNN | `REGEX(\d{1,2}/\d{1,2}/\d{2,4})` | `REGEX(\d{1,2}/\d{1,2}/\d{2,4})` |
| A number in the format NN NNN NNN NNN | `REGEX(\d{2}\s\d{3}\s\d{3}\s*\d{3})` | |
| A string of data after the word Name: | | `REGEX((?<=Name:\s).*\n)` |
| A dollar amount | `REGEX(\$[0-9]{1,5}.[0-9]{2})` | `REGEX(\$[0-9]{1,5}.[0-9]{2})` |
| An account number | `REGEX([0-9]{3}\s?[0-9]{3}\s?[0-9]{3})` | `REGEX([0-9]{3}\s?[0-9]{3}\s?[0-9]{3})` |
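To get a feel for how the patterns in the table behave, they can be tried against sample text before being entered into Umango Extract. The following is a minimal sketch in Python under several assumptions: the sample strings and the helper name `first_match` are invented, the REGEX(...) wrapper is dropped because it is Umango syntax, and Python's `re` engine may not match Umango's engine in every detail. The BALANCE Smart Seek pattern is left out because Python rejects its variable-width lookbehind.

```python
import re

# Patterns copied from the table, without the Umango REGEX(...) wrapper.
PATTERNS = {
    "number": r"\d{6,7}",
    "date": r"\d{1,2}/\d{1,2}/\d{2,4}",
    "grouped": r"\d{2}\s\d{3}\s\d{3}\s*\d{3}",
    "name": r"(?<=Name:\s).*\n",
    "dollar": r"\$[0-9]{1,5}.[0-9]{2}",
    "account": r"[0-9]{3}\s?[0-9]{3}\s?[0-9]{3}",
}

# Invented sample text, one snippet per pattern.
SAMPLES = {
    "number": "Ref 1234567 issued",
    "date": "Due 3/11/2024",
    "grouped": "No. 12 345 678 901",
    "name": "Name: Jane Citizen\nAddress: 1 High St",
    "dollar": "Total $1234.50",
    "account": "Acct 123 456 789",
}


def first_match(label, text):
    """Return the first substring matched by the named pattern, or None."""
    match = re.search(PATTERNS[label], text)
    return match.group(0) if match else None


for label, text in SAMPLES.items():
    # repr() makes trailing whitespace, such as the newline kept by the
    # Name: pattern, visible in the output.
    print(label, repr(first_match(label, text)))

# The unescaped "." in the dollar pattern matches any character, so a
# misread separator is still accepted. Escaping it as "\." is stricter.
lenient = first_match("dollar", "$1234x50")
strict = re.search(r"\$[0-9]{1,5}\.[0-9]{2}", "$1234x50")
print(repr(lenient), strict)
```

Two behaviours are worth noticing. The Name: pattern returns the trailing line break as part of the match, and the amount patterns accept any single character where the decimal point is expected. Whether that leniency is helpful for OCR output or too permissive depends on the documents being processed.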

Link to this article https://umango.com/KB?article=85
