larry.mart...@gmail.com wrote: > I have the following use case: > > I have a set of data that is contains 3 fields, K1, K2 and a > timestamp. There are duplicates in the data set, and they all have to > processed. > > Then I have another set of data with 4 fields: K3, K4, K5, and a > timestamp. There are also duplicates in that data set, and they also > all have to be processed. > > I need to find all the items in the second data set where K1==K3 and > K2==K4 and the 2 timestamps are within 20 seconds of each other. > > I have this working, but the way I did it seems very inefficient - I > simply put the data in 2 arrays (as tuples) and then walked through > the entire second data set once for each item in the first data set, > looking for matches. > > Is there a better, more efficient way I could have done this?
Build a lookup table that maps (K3, K4) to the matching records in the second table. Then you can walk through the first table, look up the matching records in the second and filter by the timestamp constraint: from collections import defaultdict, namedtuple # simulate a database One = namedtuple("One", "K1 K2 T") Two = namedtuple("Two", "K3 K4 K5 T") one = [ ["alpha", "beta", 10], ["gamma", "delta", 20], ["gamma", "delta", 25], ["gamma", "delta", 40], ["kappa", "lambda", 40], ] one = (One(*row) for row in one) two = [ ["alpha", "beta", "epsilon", 10], ["gamma", "delta", "zeta", 20], ["gamma", "delta", "eta", 60], ] two = (Two(*row) for row in two) # build the lookup table lookup = defaultdict(list) for rtwo in two: lookup[rtwo.K3, rtwo.K4].append(rtwo) # show the matches for rone in one: for rtwo in lookup.get((rone.K1, rone.K2), ()): if abs(rone.T-rtwo.T) <= 20: print rone, "-->", rtwo (Personally I'd go for the SQL approach proposed by Jon; I find the argument why you can't do it unconvincing) -- http://mail.python.org/mailman/listinfo/python-list