John Nagle wrote:

   The parser at PyParsing:

     http://pyparsing.wikispaces.com/file/view/streetAddressParser.py

..Bad cases...
487 E. Middlefield Rd. -> streetnumber = 487, streetname = E. MIDDLEFIELD
487 East Middlefield Road -> streetnumber = 487, streetname = EAST MIDDLEFIELD
226 West Wayne Street -> streetnumber = 226, streetname = WEST WAYNE
New Orchard Road -> streetnumber = , streetname = NEW
1 New Orchard Road -> streetnumber = 1, streetname = NEW
390 Park Avenue -> streetnumber = , streetname = 390


  Here's a system that gets all the above cases right: the USC Deterministic
Address Parser.

https://webgis.usc.edu/Services/AddressNormalization/Interactive/DeterministicNormalization.aspx

This will parse a street address line alone, without a city, state, or ZIP code,
so it's not using a big database.  There's a technical paper

http://gislab.usc.edu/i/publications/gislabtr11.pdf

but it doesn't go into much detail.  Still, we now know a solution
exists.  I've asked USC if they'll make the code available.
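
To show why the cases above shouldn't need a big database, here is a
rough sketch of a table-driven split in Python.  This is not the USC
algorithm (that code isn't available) and not the PyParsing example;
the function name parse_street_line and the two tiny lookup tables are
assumptions made up for illustration.  A real parser would need the full
directional and suffix lists from USPS Publication 28, plus handling for
unit numbers, fractions, and so on.

```python
# Hypothetical, deliberately tiny lookup tables (assumptions for this sketch).
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                "NORTH", "SOUTH", "EAST", "WEST"}
SUFFIXES = {"RD", "ROAD", "ST", "STREET", "AVE", "AVENUE",
            "BLVD", "BOULEVARD", "DR", "DRIVE", "LN", "LANE"}


def parse_street_line(line):
    """Split a street line into number, predirectional, name, and suffix."""
    tokens = line.upper().split()
    result = {"number": "", "predir": "", "name": "", "suffix": ""}

    # A leading token that starts with a digit is taken as the house number.
    if tokens and tokens[0][0].isdigit():
        result["number"] = tokens.pop(0)

    # The last token is a street type only if it is in the suffix table
    # and at least one other token remains to serve as the street name.
    if len(tokens) > 1 and tokens[-1].rstrip(".") in SUFFIXES:
        result["suffix"] = tokens.pop().rstrip(".")

    # Same rule for a leading directional: never consume the only name token.
    if len(tokens) > 1 and tokens[0].rstrip(".") in DIRECTIONALS:
        result["predir"] = tokens.pop(0).rstrip(".")

    # Whatever is left is the street name, which may be several words.
    result["name"] = " ".join(tokens)
    return result


if __name__ == "__main__":
    for text in ("487 E. Middlefield Rd.", "226 West Wayne Street",
                 "New Orchard Road", "1 New Orchard Road", "390 Park Avenue"):
        print(text, "->", parse_street_line(text))
```

The point of the "something must remain for the name" checks is that
peeling the number, suffix, and directional off the ends leaves a
multi-word name like NEW ORCHARD intact, and a one-word name like PARK
is never mistaken for a directional or suffix.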

                                        John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
