John Nagle wrote:
The parser at PyParsing:
http://pyparsing.wikispaces.com/file/view/streetAddressParser.py
..Bad cases...
487 E. Middlefield Rd. -> streetnumber = 487, streetname = E. MIDDLEFIELD
487 East Middlefield Road -> streetnumber = 487, streetname = EAST MIDDLEFIELD
226 West Wayne Street -> streetnumber = 226, streetname = WEST WAYNE
New Orchard Road -> streetnumber = , streetname = NEW
1 New Orchard Road -> streetnumber = 1, streetname = NEW
390 Park Avenue -> streetnumber = , streetname = 390
Here's a system that gets all the above cases right: the USC Deterministic
Address Parser.
https://webgis.usc.edu/Services/AddressNormalization/Interactive/DeterministicNormalization.aspx
This will parse a street address line alone, without a city, state, or ZIP code,
so it can't be relying on a big database. There's a technical paper at
http://gislab.usc.edu/i/publications/gislabtr11.pdf
but it doesn't go into much detail. Still, now we know a solution
exists. I've asked USC if they'll make the code available.
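
In the meantime, here is a minimal sketch of how a purely rule-based parser
could handle the six cases above without any street database. This is not
USC's code and not the PyParsing example; the function name parse_street_line,
the SUFFIXES and DIRECTIONS tables, and the output field names are all made up
for illustration, and splitting the direction out of the street name is only
my assumption about what "right" looks like. The idea is to peel tokens off
both ends: a leading number, then a trailing street type, then a leading
direction, and to treat whatever is left as the street name.

```python
# Tiny illustrative tables (hypothetical); a real parser needs many more entries.
SUFFIXES = {
    "RD": "RD", "ROAD": "RD",
    "ST": "ST", "STREET": "ST",
    "AVE": "AVE", "AVENUE": "AVE",
}
DIRECTIONS = {
    "N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
    "E": "E", "EAST": "E", "W": "W", "WEST": "W",
}


def parse_street_line(line):
    """Split one street address line into number, direction, name and suffix."""
    # Normalize: drop trailing periods/commas and uppercase each token.
    tokens = [t.strip(".,").upper() for t in line.split()]
    tokens = [t for t in tokens if t]
    result = {"streetnumber": "", "predir": "", "streetname": "", "suffix": ""}

    # A leading all-digit token is taken as the house number.
    if tokens and tokens[0].isdigit():
        result["streetnumber"] = tokens.pop(0)

    # A trailing token found in the suffix table is the street type,
    # but only if at least one token remains to serve as the name.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        result["suffix"] = SUFFIXES[tokens.pop()]

    # A leading direction word is a predirection, again only if a name remains.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        result["predir"] = DIRECTIONS[tokens.pop(0)]

    # Everything left over is the street name, so "NEW ORCHARD" stays whole.
    result["streetname"] = " ".join(tokens)
    return result
```

Working from both ends is what keeps "New Orchard Road" and "390 Park Avenue"
from going wrong: the name is never matched directly, so multi-word names and
names that look like other fields survive. It will still fail on plenty of
real addresses (unit numbers, fractional house numbers, streets such as
"Avenue A"), so treat it as a starting point only.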
John Nagle