On Apr9, 2014, at 23:17 , Florian Pflug <f...@phlo.org> wrote: > On Apr9, 2014, at 21:35 , Tom Lane <t...@sss.pgh.pa.us> wrote: >> A quick test says that avg(int4) >> is about five percent slower than sum(int4), so that's the kind of hit >> we'd be taking on non-windowed aggregations if we do it like this. > > That's rather surprising though. AFAICS, there's isn't much difference > between the two transfer functions int4_sum and int4_avg_accum at all. > > The differences come down to (+ denoting things which ought to make > int4_avg_accum slower compared to int4_sum, - denoting the opposite) > > 1. +) int4_avg_accum calls AggCheckCallContext > 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum) > 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS > to verify that the state is a 2-element array without NULL entries > 4. -) int4_sum checks if the input is NULL
I've done a bit of profiling on this (using Instruments.app on OSX). One thing I missed is that inv4_avg_accum also calls pg_detoast_datum - that calls comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly, since we know that the datum cannot possibly be toasted I think (or if it was, nodeAgg.c should do the de-toasting). The profile also attributes a rather large percent of the total runtime of int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting rid of that would help quite a few transition functions, invertible or not. That certainly seems doable - we'd need a way to mark functions as internal support functions, and prevent user-initiated calls of such functions. Transition functions marked that way could then safely scribble over their state arguments without having to consult AggCheckCallContext() first. I've also compared the disassemblies of int4_sum and int4_avg_accum, and apart from these differences, they are pretty similar. Also, the *only* reason that SUM(int2|int4) cannot use int8 as it's transition type is that it needs to return NULL, not 0, if zero rows were aggregates. It might seems that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And for non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs.. That problem would go away though if we had support for an initalfunction, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state. However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principal reason what it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers