On Mon, Jan 13, 2014 at 2:35 PM, Larry Martell <larry.mart...@gmail.com> wrote:
> Thanks for the reply. I'm going to take a stab at removing the group
> by and doing it all in python. It doesn't look too hard, but I don't
> know how it will perform.
Well, if you can't switch to PostgreSQL or such, then doing it in
Python is your only option. There are such things as GiST and GIN
indexes that might be able to do some of this magic, but I don't think
MySQL has anything even remotely like what you're looking for. So
ultimately, you're going to have to do your filtering on the database,
and then all the aggregation in Python. And it's going to be somewhat
complicated code, too. Best I can think of is this, as partial
pseudo-code:

last_x = -999
x_map = []; y_map = {}
merge_me = []
for x, y, e in (SELECT x,y,e FROM t WHERE whatever ORDER BY x):
    if x < last_x + 1:
        x_map[-1].append((y, e))
    else:
        x_map.append([(y, e)])
    last_x = x
    if y in y_map:
        merge_me.append((y_map[y], x_map[-1]))
    y_map[y] = x_map[-1]

# At this point, you have x_map which is a list of lists, each one
# being one group, and y_map which maps a y value to its x_map list.

last_y = -999
for y in sorted(y_map.keys()):
    if y < last_y + 1:
        merge_me.append((y_map[y], last_x_map))
    last_y = y
    last_x_map = y_map[y]

for merge1, merge2 in merge_me:
    merge1.extend(merge2)
    merge2[:] = []  # Empty out the list

for lst in x_map:
    if not lst:
        continue  # been emptied out, ignore it
    do aggregate stats, get sum(lst) and whatever else

I think this should be linear complexity overall, but there may be a
few aspects of it that are quadratic. It's a tad messy though, and
completely untested. But that's an algorithmic start.

The idea is that lists get collected based on x proximity, and then
lists get merged based on y proximity. That is, if you have
(1.0, 10.1), (1.5, 2.3), (3.0, 11.0), (3.2, 15.2), they'll all be
treated as a single aggregation unit. If that's not what you want, I'm
not sure how to handle it.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
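For the record, here is one way that sketch could be turned into runnable
Python. The SQL query is replaced by an in-memory list of (x, y, e) rows,
and the proximity threshold of 1 and the sum-of-e aggregation at the end
are assumptions carried over from the pseudo-code for illustration, not
part of Larry's actual schema. As the original notes, it's untested and
messy; in particular, chained merges across three or more groups may
still leave items split across lists.

```python
def group_by_proximity(rows):
    """Collect (x, y, e) rows into groups by x proximity, then merge
    groups whose y values are within 1 of each other."""
    rows = sorted(rows)        # stands in for ORDER BY x
    last_x = float('-inf')
    x_map = []                 # list of groups; each group is a list of (y, e)
    y_map = {}                 # y value -> the group it last landed in
    merge_me = []              # (absorber, absorbed) pairs to merge later

    for x, y, e in rows:
        if x < last_x + 1:
            x_map[-1].append((y, e))   # close in x: extend current group
        else:
            x_map.append([(y, e)])     # gap in x: start a new group
        last_x = x
        if y in y_map and y_map[y] is not x_map[-1]:
            merge_me.append((y_map[y], x_map[-1]))
        y_map[y] = x_map[-1]

    # Queue merges for groups whose y values are adjacent.
    last_y = float('-inf')
    last_group = None
    for y in sorted(y_map):
        if y < last_y + 1 and y_map[y] is not last_group:
            merge_me.append((y_map[y], last_group))
        last_y = y
        last_group = y_map[y]

    for merge1, merge2 in merge_me:
        if merge1 is merge2:
            continue                   # same list queued twice; skip
        merge1.extend(merge2)
        merge2[:] = []                 # empty out the absorbed list in place

    return [g for g in x_map if g]     # drop emptied-out groups


# Usage: the four points from the example above land in one unit.
rows = [(1.0, 10.1, 5), (1.5, 2.3, 7), (3.0, 11.0, 2), (3.2, 15.2, 1)]
for group in group_by_proximity(rows):
    print(sum(e for y, e in group))    # one group; e values sum to 15
```

The only changes from the sketch are `is not` guards so a group is
never queued to merge with itself, and `float('-inf')` sentinels in
place of -999 so negative coordinates don't break the first comparison.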