On Jun 21, 6:03 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jun 22, 4:43 am, Eric <[EMAIL PROTECTED]> wrote:
>
> > On Jun 21, 9:47 am, cjl <[EMAIL PROTECTED]> wrote:
>
> > > P:
>
> > > I am working on a project that requires geocoding, and have written a
> > > very simple geocoder that uses the Google service.
>
> > > I would like to be able to extract the name of the street from the
> > > addresses in my data, however they vary significantly. Here are some
> > > examples:
>
> > > 25 Main St
> > > 2500 14th St
> > > 12 Bennet Pkwy
> > > Pearl St
> > > Bennet Rd and Main st
> > > 19th St
>
> > > As you can see, sometimes I have the house number, and sometimes I do
> > > not. Sometimes the street name is a number. Sometimes I simply have
> > > the names of intersecting streets.
>
> > > I would like to be able to parse the above into the following:
>
> > > Main St
> > > 14th St
> > > Bennet Pkwy
> > > Pearl St
> > > Bennet Rd
> > > Main St
> > > 19th St
>
> > > How might I approach this complex parsing problem?
>
> > > -CJL
>
> > You might be able to use consistencies in your data to make this
> > simpler. If the examples you have there are representative, it looks
> > like what you should do is look for a word like 'St' or 'Rd' and then
> > return that word and the previous word.
>
> The OP's data already contains
> [corner|cnr [of]] Foo Rd and|& Bar St
> and real world data will contain things like
> 1234 John F Kennedy Memorial Drive
> 456 Broadway
>
> As Paul wrote, "Parsing street addresses is a very complex parsing
> problem", even when you restrict yourself to one
> mostly-English-speaking country. Software written under such
> restrictions rapidly breaks down elsewhere (Rue de la Paix,
> Wilhelmstrasse, Avenida 9 de Julio, etc) and blows up altogether when
> street names aren't used in postal addresses (e.g. Japan).

No doubt address parsing is, in general, a very difficult problem.
However, he may not need to solve the general problem. If the formats
in his dataset are limited, his problem is much simpler.
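
To make that concrete, here is a rough sketch of the suffix heuristic
quoted above (find a word like 'St' or 'Rd' and return it with the
previous word). The SUFFIXES list and the street_names() helper are my
own assumptions for illustration, not anything from the OP's code, and
it only makes sense if the data really is as regular as the six sample
addresses:

```python
import re

# Hypothetical suffix list (an assumption); extend it to match the
# abbreviations that actually occur in the data.
SUFFIXES = ("St", "Rd", "Pkwy", "Ave", "Blvd", "Dr", "Ln")

# One word followed by a known suffix, e.g. "Main St" or "14th St".
STREET_RE = re.compile(
    r"\b(\w+)\s+(" + "|".join(SUFFIXES) + r")\b",
    re.IGNORECASE,
)


def street_names(address):
    """Return every '<word> <suffix>' pair found in address, in order."""
    names = []
    for word, suffix in STREET_RE.findall(address):
        # Normalise the suffix case so "Main st" comes back as "Main St".
        names.append(f"{word} {suffix.capitalize()}")
    return names


if __name__ == "__main__":
    samples = [
        "25 Main St",
        "2500 14th St",
        "12 Bennet Pkwy",
        "Pearl St",
        "Bennet Rd and Main st",
        "19th St",
    ]
    for address in samples:
        print(address, "->", street_names(address))
```

On the OP's six samples this returns the street names he asked for,
including both streets of the intersection. It returns an empty list
for "456 Broadway" and for "1234 John F Kennedy Memorial Drive", so
John's point stands for data like that; the sketch is only useful when
the formats are known to be this limited.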