On Jun 21, 8:47 am, cjl <[EMAIL PROTECTED]> wrote:
> P:
>
> I am working on a project that requires geocoding, and have written a
> very simple geocoder that uses the Google service.
>
> I would like to be able to extract the name of the street from the
> addresses in my data, however they vary significantly. Here are some
> examples:
>
> 25 Main St
> 2500 14th St
> 12 Bennet Pkwy
> Pearl St
> Bennet Rd and Main st
> 19th St
>
> As you can see, sometimes I have the house number, and sometimes I do
> not. Sometimes the street name is a number. Sometimes I simply have
> the names of intersecting streets.
>
> I would like to be able to parse the above into the following:
>
> Main St
> 14th St
> Bennet Pkwy
> Pearl St
> Bennet Rd
> Main St
> 19th St
>
> How might I approach this complex parsing problem?
>
> -CJL

Parsing street addresses is a very complex parsing problem. Please
look at this example
(http://pyparsing.wikispaces.com/space/showimage/streetAddressParser.py)
from the pyparsing wiki, which includes support for these test cases:

    100 South Street
    123 Main
    221B Baker Street
    10 Downing St
    1600 Pennsylvania Ave
    33 1/2 W 42nd St.
    454 N 38 1/2
    21A Deer Run Drive
    256K Memory Lane
    12-1/2 Lincoln
    23N W Loop South
    23 N W Loop South

I took your list and added them to the test cases, which broke a few
lines in the grammar. The current online version now includes support
for your new formats as well.

Here is some sample output from the pyparsing example:

100 South Street
['100', 'South', 'Street']
- name: South
- number: 100
- street: ['100', 'South', 'Street']
  - name: South
  - number: 100
  - type: Street
- type: Street
Street is South

221B Baker Street
['221B', 'Baker', 'Street']
- name: Baker
- number: 221B
- street: ['221B', 'Baker', 'Street']
  - name: Baker
  - number: 221B
  - type: Street
- type: Street
Street is Baker Street

10 Downing St
['10', 'Downing', 'St']
- name: Downing
- number: 10
- street: ['10', 'Downing', 'St']
  - name: Downing
  - number: 10
  - type: St
- type: St
Street is Downing St

1600 Pennsylvania Ave
['1600', 'Pennsylvania', 'Ave']
- name: Pennsylvania
- number: 1600
- street: ['1600', 'Pennsylvania', 'Ave']
  - name: Pennsylvania
  - number: 1600
  - type: Ave
- type: Ave
Street is Pennsylvania Ave

33 1/2 W 42nd St.
['33 1/2', 'W 42 nd', 'St']
- name: W 42 nd
- number: 33 1/2
- street: ['33 1/2', 'W 42 nd', 'St']
  - name: W 42 nd
  - number: 33 1/2
  - type: St
- type: St
Street is W 42 nd St

454 N 38 1/2
['454', 'N 38 1/2']
- name: N 38 1/2
- number: 454
- street: ['454', 'N 38 1/2']
  - name: N 38 1/2
  - number: 454
Street is N 38 1/2

25 Main St
['25', 'Main', 'St']
- name: Main
- number: 25
- street: ['25', 'Main', 'St']
  - name: Main
  - number: 25
  - type: St
- type: St
Street is Main St

2500 14th St
['2500', '14 th', 'St']
- name: 14 th
- number: 2500
- street: ['2500', '14 th', 'St']
  - name: 14 th
  - number: 2500
  - type: St
- type: St
Street is 14 th St

12 Bennet Pkwy
['12', 'Bennet', 'Pkwy']
- name: Bennet
- number: 12
- street: ['12', 'Bennet', 'Pkwy']
  - name: Bennet
  - number: 12
  - type: Pkwy
- type: Pkwy
Street is Bennet Pkwy

Pearl St
['Pearl', 'St']
- name: Pearl
- street: ['Pearl', 'St']
  - name: Pearl
  - type: St
- type: St
Street is Pearl St

Bennet Rd and Main St
['Bennet', 'Rd', 'and', 'Main', 'St']
- crossStreet: ['Bennet', 'Rd']
  - name: Bennet
  - type: Rd
- name: Main
- street: ['Main', 'St']
  - name: Main
  - type: St
- type: St
Street is Main St

19th St
['19 th', 'St']
- name: 19 th
- street: ['19 th', 'St']
  - name: 19 th
  - type: St
- type: St
Street is 19 th St

-- Paul
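
P.S. For data as regular as the six lines in the question, something much cruder than a full grammar might be enough to get started. The following is only a minimal sketch using the standard `re` module; it is not pyparsing and not the wiki example. The names `HOUSE_NUMBER`, `INTERSECTION` and `street_names` are made up for illustration, and the rules are assumptions read off the sample addresses in this thread: a leading house number (optionally with one letter or a fraction) is dropped, and "and" or "&" separates intersecting streets.

```python
import re

# Assumed shape of a house number: digits, an optional single-letter suffix
# (221B), and an optional fraction (33 1/2 or 12-1/2), followed by whitespace
# and at least one more token. Ordinals such as "19th" do not match, because
# only one letter is allowed before the required whitespace.
HOUSE_NUMBER = re.compile(r"^\d+[A-Za-z]?(?:[ -]\d+/\d+)?\s+(?=\S)")

# Assumed intersection separators: the word "and" or an ampersand.
INTERSECTION = re.compile(r"\s+(?:and|&)\s+", re.IGNORECASE)


def street_names(address):
    """Return the street name(s) found in a free-form address string."""
    streets = []
    for part in INTERSECTION.split(address.strip()):
        # Drop a leading house number, if this part has one.
        name = HOUSE_NUMBER.sub("", part).strip()
        if name:
            streets.append(name)
    return streets


if __name__ == "__main__":
    samples = [
        "25 Main St",
        "2500 14th St",
        "12 Bennet Pkwy",
        "Pearl St",
        "Bennet Rd and Main st",
        "19th St",
    ]
    for address in samples:
        print(address, "->", street_names(address))
```

It does not normalize capitalization or abbreviations, and it will misread anything outside those assumptions (for example, a street whose entire name is a bare number, as in "19 St"), which is where a real grammar earns its keep.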