On Mar 28, 1:52 pm, Jon Clements <jon...@googlemail.com> wrote: > On Wednesday, 28 March 2012 19:39:54 UTC+1, larry....@gmail.com wrote: > > I have the following use case: > > > I have a set of data that is contains 3 fields, K1, K2 and a > > timestamp. There are duplicates in the data set, and they all have to > > processed. > > > Then I have another set of data with 4 fields: K3, K4, K5, and a > > timestamp. There are also duplicates in that data set, and they also > > all have to be processed. > > > I need to find all the items in the second data set where K1==K3 and > > K2==K4 and the 2 timestamps are within 20 seconds of each other. > > > I have this working, but the way I did it seems very inefficient - I > > simply put the data in 2 arrays (as tuples) and then walked through > > the entire second data set once for each item in the first data set, > > looking for matches. > > > Is there a better, more efficient way I could have done this? > > It might not be more *efficient* but others might find it more readable, and > it'd be easier to change later. Try an in-memory SQL DB (such as sqlite3) and > query as (untested) > > select t2.* from t1 join t2 on k1=k3 and k2=k4 where abs(t1.timestamp - > t2.timestamp) < 20
This is part of django app, and the data came from mysql. Through a mechanism I can't change at this time (it would take a lot of code changes and this new functionality is needed ASAP) I get all the data at once and have to winnow it down. > Failing that, two (default)dicts with a tuple as the pair, then use that as > your base. Since there are duplicates, I can't use a dict. And if I have any extraneous data in the keys (i.e. something to make them unique) then I still have to walk through the entire dict to find the matches. -- http://mail.python.org/mailman/listinfo/python-list