by jking Wed Sep 13, 2017 7:42 pm
Names are a bit trickier to match with a pattern because they do not always follow a standard format.
A first name may be a single word or several words. For instance, a first name could be Jim, Jon, Tom, Nancy, Ann, Beth, etc.
It could also be Mary Elizabeth, Billy Bob, John Michael, BJ, KC, etc.
Similarly, last names can be one continuous string, hyphenated, or separated by a space, or they can carry post-nominal suffixes (III, IV, JR, SR, MD, PhD, etc.).
Using Jim King as the name example:
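The simplest pattern to extract the first name is (.*)\s.* and the result is Jim. Note that the greedy .* actually keeps everything up to the last space, not the 1st; (.*?)\s.* is the form that stops at the 1st space and ignores everything else.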
The simplest pattern to extract the last name is .*\s, which matches everything through the last space; removing that match leaves King.
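Assuming the robot applies these patterns as a regex find-and-replace (the post doesn't say which regex engine it uses), here is a minimal sketch of the same two patterns with Python's `re.sub`, plus the lazy variant. For a two-word name the greedy and lazy forms happen to agree:

```python
import re

name = "Jim King"

# Greedy (.*) grabs as much as it can, then backs off to the LAST whitespace;
# the replacement \1 keeps only what came before that whitespace.
first_greedy = re.sub(r"(.*)\s.*", r"\1", name)

# Lazy (.*?) stops at the FIRST whitespace, i.e. "everything up to the 1st space".
first_lazy = re.sub(r"(.*?)\s.*", r"\1", name)

# Replacing everything through the last whitespace with nothing leaves the final word.
last = re.sub(r".*\s", "", name)

print(first_greedy, first_lazy, last)  # Jim Jim King
```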
However, if the name is more complex, for example ROBERT B PARRAMORE-SMITH IV, then:
(.*)\s.* returns ROBERT B PARRAMORE-SMITH
.*\s returns IV
Neither is the result you want.
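Running the same patterns on the longer name with Python's `re` module (again assuming find-and-replace semantics) shows where each one breaks: the greedy form keeps everything before the last space, the lazy form gives only the first word, and the last-word pattern lands on the generational suffix:

```python
import re

name = "ROBERT B PARRAMORE-SMITH IV"

# Greedy: backs off only to the LAST space, so the initial and surname stay attached.
first_greedy = re.sub(r"(.*)\s.*", r"\1", name)

# Lazy: stops at the FIRST space and returns just the first word.
first_lazy = re.sub(r"(.*?)\s.*", r"\1", name)

# Last-word pattern: returns the suffix, not the surname.
last = re.sub(r".*\s", "", name)

print(first_greedy)  # ROBERT B PARRAMORE-SMITH
print(first_lazy)    # ROBERT
print(last)          # IV
```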
A more complex pattern would return the correct last name:
^([A-Za-z][A-Za-z]*\s+[A-Za-z]*)|([A-Za-z][A-Za-z]*)$
Replace Expression:1 ROBERT B PARRAMORE-SMITH
Replace Expression:2 PARRAMORE-SMITH IV
Replace Expression:3 PARRAMORE-SMITH
NB: Replace Expression: 3 is the key to making this pattern work correctly!
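As a different approach from the pattern above (which depends on the tool's Replace Expression numbering), here is a minimal Python sketch that treats any run of non-space characters as one word, so hyphenated surnames stay whole, and allows one optional trailing suffix. The `SUFFIXES` list, `NAME_RE`, and the `split_name` helper are hypothetical; extend the list for your own data:

```python
import re

# Hypothetical suffix list built from the examples in the post; extend as needed.
# The optional \.? also accepts forms like "Jr." and "Sr.".
SUFFIXES = r"(?:JR|SR|II|III|IV|MD|PHD)\.?"

# Group 1: first word. Group 2: last word before an optional suffix.
# \S+ treats a hyphenated surname such as PARRAMORE-SMITH as one word.
# The lazy (?:\s+\S+)*? skips middle names/initials only as far as needed.
NAME_RE = re.compile(
    r"^(\S+)(?:\s+\S+)*?\s+(\S+)(?:\s+" + SUFFIXES + r")?$",
    re.IGNORECASE,
)


def split_name(full_name):
    """Return (first_word, last_name), or None for a single-word name."""
    match = NAME_RE.match(full_name.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


for name in ["Jim King", "ROBERT B PARRAMORE-SMITH IV", "Mary Elizabeth Smith III"]:
    print(split_name(name))
```

A pattern alone cannot tell a two-word first name from a first and middle name, so Mary Elizabeth Smith III comes back with first name Mary. That case likely needs something beyond a regex, such as a list of known names.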
A Google search for regular expressions should turn up several resources that include patterns for first and last names.
Good luck with this. Since names are inherently unstructured, this is more difficult to do with your robot.
EDIT:
There is also the issue of pre-nominal titles: Dr, Fr, Sr, Mr, Mrs, Ms, etc., etc., etc. These add another level of complexity.
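One way to handle titles is to strip a single leading title before applying any of the name patterns. A minimal Python sketch, assuming the titles listed above (`TITLE_RE` and `strip_title` are hypothetical names). Anchoring at the start matters because Sr can be a leading title (Sister) or a trailing suffix (Senior):

```python
import re

# Hypothetical title list taken from the examples in the post; extend as needed.
# ^ limits the match to the start of the name, so a trailing "Sr" (Senior) is kept.
# \s+ after the title stops names like "Drew" from losing their first letters.
TITLE_RE = re.compile(r"^(?:DR|FR|SR|MR|MRS|MS)\.?\s+", re.IGNORECASE)


def strip_title(full_name):
    """Remove one leading title such as 'Dr' or 'Mrs.' if present."""
    return TITLE_RE.sub("", full_name.strip(), count=1)


for name in ["Dr. Jim King", "Mrs Ann Smith", "Drew Carey", "Tom Jones Sr"]:
    print(strip_title(name))
```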