On 11 October 2010 15:03, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rash...@gmail.com> writes:
>> On 10 October 2010 22:16, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> BTW, as far as the implementation issues go, telling tuplesort that it
>>> can use gigabytes of memory no matter what seems quite unacceptable.
>>> Put this thing into a hash aggregation and you'll blow out your memory
>>> in no time. I don't think it's even a good idea to use work_mem there.
>
>> Argh! Yes, that sounds like a much more serious problem.
>
>> Interestingly, I couldn't seem to produce this effect. Every effort I
>> make to write a query to test this with median ends up being executed
>> using a GroupAggregate, while the equivalent query with avg uses a
>> HashAggregate. I don't understand why they are being treated
>> differently.
>
> If you're using recent sources, there's some code in count_agg_clauses()
> that assumes that an aggregate with transtype INTERNAL will use
> ALLOCSET_DEFAULT_INITSIZE (i.e. 8K) of workspace. So that'll discourage
> the planner from selecting HashAggregate except for a pretty small
> number of groups. The problem is that there's still a whole lot of
> daylight between 8K and 2G, so plenty of room to go wrong.
>
> The other approach that we could take here is to replace the
> ALLOCSET_DEFAULT_INITSIZE hack (which is certainly no more than a hack)
> with some way for an aggregate to declare how much space it'll eat,
> or more simply to mark it as "never use in HashAgg". This was discussed
> earlier, but it would require a significant amount of dogwork and no one
> was really excited about doing it. Maybe it's time to bite that bullet,
> though. Reflecting on it, I think it'd be best to allow an agg to
> provide an estimation function that'd be told the input data type and
> expected number of rows --- even on a per-aggregate basis, a constant
> estimate just isn't good enough.
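A quick test of the sort described above should make the 8K-per-group
assumption visible. This is only a sketch: it assumes the median()
aggregate from the patch under discussion is installed, and the
test_data table is made up for illustration.

    -- Made-up test data: 100,000 rows spread over ~1,000 groups.
    CREATE TABLE test_data AS
        SELECT i % 1000 AS g, random() AS v
        FROM generate_series(1, 100000) AS i;
    ANALYZE test_data;

    -- avg() has a small fixed-size transition state, so hashing
    -- ~1,000 groups should fit comfortably in the default work_mem.
    EXPLAIN SELECT g, avg(v) FROM test_data GROUP BY g;
    -- expected: HashAggregate

    -- median()'s transition type is INTERNAL, so the planner charges
    -- ALLOCSET_DEFAULT_INITSIZE (8K) per group; 1,000 groups * 8K is
    -- well past the default work_mem, pushing it to sort-based
    -- aggregation.
    EXPLAIN SELECT g, median(v) FROM test_data GROUP BY g;
    -- expected: GroupAggregate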
How good will that estimate of the number of rows be, though? If they're
coming from an SRF it could be a huge under-estimate, and you'd still
risk eating all the memory if you allowed a hash aggregate. A sketch of
that failure mode is below.

Regards,
Dean

>                        regards, tom lane
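The sketch: a set-returning function declared without a ROWS clause is
planned at the default estimate of 1000 rows, regardless of how many
rows it actually returns, so any per-aggregate space estimate derived
from that number can be arbitrarily low. Here many_rows() is made up
for illustration, and median() is again the aggregate from the patch.

    -- A made-up SRF with no ROWS clause: the planner falls back to
    -- its default estimate of 1000 rows for it.
    CREATE FUNCTION many_rows() RETURNS SETOF int
        LANGUAGE sql AS 'SELECT generate_series(1, 10000000)';

    -- Any space estimate based on the 1000-row guess is off by four
    -- orders of magnitude here: the function really emits 10 million
    -- rows, so a hash aggregate admitted on the strength of that
    -- estimate could still blow out memory.
    EXPLAIN
    SELECT x % 100000 AS grp, median(x)
    FROM many_rows() AS x
    GROUP BY x % 100000;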