On Mon, Jan 13, 2014 at 1:09 AM, Chris Angelico <ros...@gmail.com> wrote: > On Mon, Jan 13, 2014 at 2:35 PM, Larry Martell <larry.mart...@gmail.com> > wrote: >> Thanks for the reply. I'm going to take a stab at removing the group >> by and doing it all in python. It doesn't look too hard, but I don't >> know how it will perform. > > Well, if you can't switch to PostgreSQL or such, then doing it in > Python is your only option. There are such things as GiST and GIN > indexes that might be able to do some of this magic, but I don't think > MySQL has anything even remotely like what you're looking for. > > So ultimately, you're going to have to do your filtering on the > database, and then all the aggregation in Python. And it's going to be > somewhat complicated code, too. Best I can think of is this, as > partial pseudo-code: > > last_x = -999 > x_map = []; y_map = {} > merge_me = [] > for x,y,e in (SELECT x,y,e FROM t WHERE whatever ORDER BY x): > if x<last_x+1: > x_map[-1].append((y,e)) > else: > x_map.append([(y,e)]) > last_x=x > if y in y_map: > merge_me.append((y_map[y], x_map[-1])) > y_map[y]=x_map[-1] > > # At this point, you have x_map which is a list of lists, each one > # being one group, and y_map which maps a y value to its x_map list. > > last_y = -999 > for y in sorted(y_map.keys()): > if y<last_y+1: > merge_me.append((y_map[y], last_x_map)) > last_y=y > last_x_map=y_map[y] > > for merge1,merge2 in merge_me: > merge1.extend(merge2) > merge2[:]=[] # Empty out the list > > for lst in x_map: > if not lst: continue # been emptied out, ignore it > do aggregate stats, get sum(lst) and whatever else > > I think this should be linear complexity overall, but there may be a > few aspects of it that are quadratic. It's a tad messy though, and > completely untested. But that's an algorithmic start. The idea is that > lists get collected based on x proximity, and then lists get merged > based on y proximity. That is, if you have (1.0, 10.1), (1.5, 2.3), > (3.0, 11.0), (3.2, 15.2), they'll all be treated as a single > aggregation unit. If that's not what you want, I'm not sure how to > handle it.
Thanks. Unfortunately this has been made a low priority task and I've been put on to something else (I hate when they do that). I'll revive this thread when I'm allowed to get back on this. -- https://mail.python.org/mailman/listinfo/python-list