On Jun 21, 6:03 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jun 22, 4:43 am, Eric <[EMAIL PROTECTED]> wrote:
>
> > On Jun 21, 9:47 am, cjl <[EMAIL PROTECTED]> wrote:
>
> > > P:
>
> > > I am working on a project that requires geocoding, and have written a
> > > very simple geocoder that uses the Google service.
>
> > > I would like to be able to extract the name of the street from the
> > > addresses in my data; however, they vary significantly. Here are some
> > > examples:
>
> > > 25 Main St
> > > 2500 14th St
> > > 12 Bennet Pkwy
> > > Pearl St
> > > Bennet Rd and Main st
> > > 19th St
>
> > > As you can see, sometimes I have the house number, and sometimes I do
> > > not. Sometimes the street name is a number. Sometimes I simply have
> > > the names of intersecting streets.
>
> > > I would like to be able to parse the above into the following:
>
> > > Main St
> > > 14th St
> > > Bennet Pkwy
> > > Pearl St
> > > Bennet Rd
> > > Main St
> > > 19th St
>
> > > How might I approach this complex parsing problem?
>
> > > -CJL
>
> > You might be able to use consistencies in your data to make this
> > simpler.  If the examples you have there are representative, it looks
> > like what you should do is look for a word like 'St' or 'Rd' and then
> > return that word and the previous word.
>
> The OP's data already contains
>     [corner|cnr [of]] Foo Rd and|& Bar St
> and real world data will contain things like
>     1234 John F Kennedy Memorial Drive
>     456 Broadway
>
> As Paul wrote, "Parsing street addresses is a very complex parsing
> problem", even when you restrict yourself to one mostly-English-
> speaking country. Software written under such restrictions rapidly
> breaks down elsewhere (Rue de la Paix, Wilhelmstrasse, Avenida 9 de
> Julio, etc) and blows up altogether when street names aren't used in
> postal addresses (e.g. Japan).

No doubt address parsing is, in general, a very difficult problem.
However, he may not need to solve the general problem.  If his
dataset only contains a limited set of formats, then his problem is
much simpler.
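
For what it's worth, the "suffix word plus the previous word" idea I
suggested could be sketched roughly like this.  It is only a sketch:
the SUFFIXES set and the street_names() function are names I made up
for illustration, and the suffix list covers nothing beyond the
examples in the original post.

```python
# Hypothetical suffix list taken from the posted examples only;
# real data would need many more entries.
SUFFIXES = {"st", "rd", "pkwy"}


def street_names(address):
    """Return every 'previous word + suffix' pair found in address."""
    words = address.split()
    found = []
    for i, word in enumerate(words):
        # Skip position 0: a suffix there has no preceding name word.
        if i > 0 and word.rstrip(".").lower() in SUFFIXES:
            # capitalize() normalizes "st" to "St".
            found.append(words[i - 1] + " " + word.capitalize())
    return found


for line in ["25 Main St", "2500 14th St", "Bennet Rd and Main st"]:
    print(street_names(line))
```

As John's examples show, this returns nothing for "456 Broadway" and
would keep only the last name word of a multi-word street name, so it
is only useful if the data really is as regular as the sample.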

-- 
http://mail.python.org/mailman/listinfo/python-list
