On Jun 22, 4:43 am, Eric <[EMAIL PROTECTED]> wrote:
> On Jun 21, 9:47 am, cjl <[EMAIL PROTECTED]> wrote:
>
> > P:
> >
> > I am working on a project that requires geocoding, and have written a
> > very simple geocoder that uses the Google service.
> >
> > I would like to be able to extract the name of the street from the
> > addresses in my data, however they vary significantly. Here are some
> > examples:
> >
> > 25 Main St
> > 2500 14th St
> > 12 Bennet Pkwy
> > Pearl St
> > Bennet Rd and Main st
> > 19th St
> >
> > As you can see, sometimes I have the house number, and sometimes I do
> > not. Sometimes the street name is a number. Sometimes I simply have
> > the names of intersecting streets.
> >
> > I would like to be able to parse the above into the following:
> >
> > Main St
> > 14th St
> > Bennet Pkwy
> > Pearl St
> > Bennet Rd
> > Main St
> > 19th St
> >
> > How might I approach this complex parsing problem?
> >
> > -CJL
>
> You might be able to use consistencies in your data to make this
> simpler. If the examples you have there are representative, it looks
> like what you should do is look for a word like 'St' or 'Rd' and then
> return that word and the previous word.
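For concreteness, here is a minimal sketch of that suggestion as I read it: find each word that looks like a street suffix and return it together with the word before it. The suffix set and the function name street_names are my own placeholders, not anything from the thread, and the set is deliberately incomplete.

```python
# Hypothetical, deliberately incomplete suffix list; real data needs far more.
STREET_SUFFIXES = {"st", "rd", "pkwy", "ave", "blvd", "dr", "ln"}


def street_names(address):
    """Return every 'previous word + suffix' pair found in the address.

    An address naming an intersection can yield more than one street,
    so the result is a list (possibly empty).
    """
    words = address.split()
    found = []
    for i, word in enumerate(words):
        # A suffix needs a preceding word to form a street name.
        if i > 0 and word.rstrip(".").lower() in STREET_SUFFIXES:
            found.append(words[i - 1] + " " + word)
    return found


if __name__ == "__main__":
    samples = ["25 Main St", "2500 14th St", "12 Bennet Pkwy",
               "Pearl St", "Bennet Rd and Main st", "19th St"]
    for line in samples:
        print(line, "->", street_names(line))
```

This copes with the six sample lines only because each street there is exactly one word plus a listed suffix. It returns an empty list for any address with no recognised suffix, and it keeps only the last word of a multi-word street name.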

The OP's data already contains

    [corner|cnr [of]] Foo Rd and|& Bar St

and real world data will contain things like

    1234 John F Kennedy Memorial Drive
    456 Broadway

As Paul wrote, "Parsing street addresses is a very complex parsing
problem", even when you restrict yourself to one mostly-English-speaking
country. Software written under such restrictions rapidly breaks down
elsewhere (Rue de la Paix, Wilhelmstrasse, Avenida 9 de Julio, etc) and
blows up altogether when street names aren't used in postal addresses
(e.g. Japan).