On Jan 28, 8:27 am, Lie Ryan <lie.1...@gmail.com> wrote:
> On 01/28/10 11:28, Brian D wrote:
>
> > I've tackled this kind of problem before by looping through a patterns
> > dictionary, but there must be a smarter approach.
>
> > Two addresses. Note that the first has incorrectly transposed the
> > direction and street name. The second has an extra space in it before
> > the street type. Clearly done by someone who didn't know how to
> > concatenate properly -- or didn't care.
>
> > 1000 RAMPART S ST
>
> > 100 JOHN CHURCHILL CHASE ST
>
> > I want to parse the elements into an array of values that can be
> > inserted into new database fields.
>
> > Anyone who loves solving these kinds of puzzles care to relieve my
> > frazzled brain?
>
> > The pattern I'm using doesn't keep the "CHASE" with the "JOHN
> > CHURCHILL":
>
> How does the following perform?
>
> pat = re.compile(r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+(?P<streetdir>N|S|W|E|)\s+(?P<streettype>ST|RD|AVE?|)$')
>
> or more legibly:
>
> pat = re.compile(
>     r'''
>       (?P<streetnum>   \d+                  )  #M series of digits
>       \s+
>       (?P<streetname>  [A-Z\s]+             )  #M one-or-more word
>       \s+
>       (?P<streetdir>   S?E|SW?|N?W|NE?|     )  #O direction or nothing
>       \s+
>       (?P<streettype>  ST|RD|AVE?           )  #M street type
>       $                                        #M END
>     ''', re.VERBOSE)

Is that all? That little empty alternative after the "|" OR
metacharacter? Wow.

As a test, to create a failure, I removed that last "|" metacharacter
from the "N|S|W|E|" string (i.e., "N|S|W|E"). The match then fails on
addresses that do not have that malformed direction after the street
name (e.g., '45 JOHN CHURCHILL CHASE ST').

Very clever. I don't think I've ever seen documentation showing that
little trick.

Thanks for enlightening me!

--
http://mail.python.org/mailman/listinfo/python-list
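
The experiment described above can be reproduced with a short script. This is only a sketch: it reuses the one-line pattern from the quoted post, and the names WITH_EMPTY, WITHOUT_EMPTY and parse are made up for illustration. The second test address is an assumption too; it is written with two spaces before "ST", as the original post describes.

```python
import re

# One-line pattern from the quoted post. The trailing "|" in the streetdir
# group adds an empty alternative, so the group may match nothing at all.
WITH_EMPTY = re.compile(
    r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+'
    r'(?P<streetdir>N|S|W|E|)\s+(?P<streettype>ST|RD|AVE?|)$'
)

# Same pattern with the trailing "|" removed from streetdir:
# a direction letter is now mandatory.
WITHOUT_EMPTY = re.compile(
    r'(?P<streetnum>\d+)\s+(?P<streetname>[A-Z\s]+)\s+'
    r'(?P<streetdir>N|S|W|E)\s+(?P<streettype>ST|RD|AVE?|)$'
)

# Assumed test data: the second address has two spaces before "ST".
ADDRESSES = ["1000 RAMPART S ST", "100 JOHN CHURCHILL CHASE  ST"]


def parse(pattern, address):
    """Return the named groups as a dict, or None if there is no match."""
    m = pattern.match(address)
    return m.groupdict() if m else None


for address in ADDRESSES:
    print(repr(address))
    print("  with empty alternative:   ", parse(WITH_EMPTY, address))
    print("  without empty alternative:", parse(WITHOUT_EMPTY, address))
```

With the empty alternative, the second address parses with streetdir set to an empty string; without it, the match returns None. Both `\s+` around streetdir still have to consume at least one whitespace character each, so this particular pattern depends on the doubled space when the direction is missing. The same address with a single space before "ST" does not match either variant.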