On Wednesday, 28 March 2012 19:39:54 UTC+1, larry....@gmail.com wrote: > I have the following use case: > > I have a set of data that is contains 3 fields, K1, K2 and a > timestamp. There are duplicates in the data set, and they all have to > processed. > > Then I have another set of data with 4 fields: K3, K4, K5, and a > timestamp. There are also duplicates in that data set, and they also > all have to be processed. > > I need to find all the items in the second data set where K1==K3 and > K2==K4 and the 2 timestamps are within 20 seconds of each other. > > I have this working, but the way I did it seems very inefficient - I > simply put the data in 2 arrays (as tuples) and then walked through > the entire second data set once for each item in the first data set, > looking for matches. > > Is there a better, more efficient way I could have done this?
It might not be more *efficient* but others might find it more readable, and it'd be easier to change later. Try an in-memory SQL DB (such as sqlite3) and query as (untested) select t2.* from t1 join t2 on k1=k3 and k2=k4 where abs(t1.timestamp - t2.timestamp) < 20 Failing that, two (default)dicts with a tuple as the pair, then use that as your base. Jon. -- http://mail.python.org/mailman/listinfo/python-list