John Machin wrote:
> A quick silly question: what is the problem that you are trying to
> solve?
A fair question :-)

The problem may seem a bit strange, but here it is.

I can query a database in a legacy system and extract records that match a particular pattern. Specifically, I can query for records that contain a given search term as a sub-string of one particular column, and that column contains an address. The database can only be accessed through this interface (don't ask why; it's one of the reasons it's a *legacy* system). I also have a list that contains the vast majority (possibly all) of the addresses stored in the database.

I want to issue a series of queries such that, when I combine all the data returned, I have retrieved every record in the database. However, I want to minimise the total number of queries, and I also want to keep small the number of records returned by more than one query.

My current approach is to split each address I have into tokens and take the last token of the address (excluding the postal code). The set of these "last tokens" forms my set of queries. In a UK address the last token is typically a county or a town.

This works, but I was wondering whether I could do something more efficient. The problem is that while the search term "London" matches all the addresses in London, it also returns every address containing "London Road", and a lot of towns have a London Road. Perhaps I would be better off searching for "Road", "Street", "Avenue" ....

It occurred to me that this may be isomorphic to a known problem; however, given that I want to keep two things small, the problem isn't very well defined. The current approach works; I was just musing whether there was a faster one, so don't think about it too hard.

- Andrew

-- 
http://mail.python.org/mailman/listinfo/python-list
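For illustration, here is a minimal sketch of the "last token" approach described in the post, plus a way to estimate offline, against the local address list, how many addresses a set of queries would miss or return more than once. It assumes a hypothetical address format in which fields are separated by commas and the postcode is always the final field. It also assumes the database's sub-string match behaves like Python's case-sensitive `in` operator on strings. The helper names (`last_token`, `plan_queries`, `coverage_report`) and the sample addresses are made up for the example.

```python
def last_token(address):
    """Return the last comma-separated field of an address, skipping the postcode.

    Assumption (hypothetical format): fields are comma-separated and the
    postcode is the final field. Returns None if nothing precedes the postcode.
    """
    fields = [f.strip() for f in address.split(",") if f.strip()]
    if len(fields) < 2:
        return None
    # fields[-1] is assumed to be the postcode, so take the field before it.
    return fields[-2]


def plan_queries(addresses):
    """The current approach: one query per distinct 'last token'."""
    return sorted({t for t in (last_token(a) for a in addresses) if t is not None})


def coverage_report(addresses, queries):
    """Simulate sub-string queries against the local address list.

    Assumption: the database matching behaves like Python's case-sensitive
    `in` test. Returns (addresses matched by no query,
    addresses matched by more than one query).
    """
    hits = [sum(1 for q in queries if q in a) for a in addresses]
    missed = [a for a, n in zip(addresses, hits) if n == 0]
    duplicated = [a for a, n in zip(addresses, hits) if n > 1]
    return missed, duplicated


if __name__ == "__main__":
    # Hypothetical sample data, not real records.
    addresses = [
        "1 High Street, Camden, London, AB1 2CD",
        "22 London Road, Reading, Berkshire, AB2 3CD",
        "5 Mill Lane, Leeds, West Yorkshire, AB3 4CD",
    ]
    queries = plan_queries(addresses)
    missed, duplicated = coverage_report(addresses, queries)
    print(queries)
    print(missed)
    print(duplicated)
```

With this sample data the queries are `Berkshire`, `London` and `West Yorkshire`. Nothing is missed, but the "London Road" address is matched by both `London` and `Berkshire`, which is the overlap the post describes. Candidate query sets, such as street-type words, could be compared the same way before running them against the real system.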