[snip]Thanks for all the suggestions. There were some really useful pointers.
A few random points:
4. You need to be careful doing an endswith search. It was actually my first approach to the house name issue. The problem is you end up matching "12 Acacia Avenue, ..." with "2 Acacia Avenue, ...".
Is that really a problem? That looks like a likely typo to me. I guess it depends on your data set. In my case, the addresses were scattered all over the place, with relatively few in a given city, so the likelyhood of two addresses on the same street in the same town was very low. We /wanted/ to check for this kind of 'duplication'.
Note that endswith will not deal with 'Avenue' vs. 'Ave.', but I supose a normalization phase could take care of this for you. The Monge algorithm I pointed you to takes care of this pretty nicely.
-- -------------------------------------------------------------------- Aaron Bingham Application Developer Cenix BioScience GmbH --------------------------------------------------------------------
-- http://mail.python.org/mailman/listinfo/python-list