On Apr11, 2014, at 01:30 , Tom Lane <t...@sss.pgh.pa.us> wrote: > Florian Pflug <f...@phlo.org> writes: >> As for evidence - have you looked at the patch I posted? I'd be very >> interested to know if it removes the performance differences you saw. > > (1) You can't really prove the absence of a performance issue by showing > that one specific aggregate doesn't have an issue.
I'm claiming that SUM(int4) is about as simple as it gets, so if the effect can be mitigated there, it can be mitigated everywhere. The more complex a forward-only transition function is, the less will and added if or two hurt. > (2) These results > (mine as well as yours) seem mighty platform-dependent, so the fact you > don't see an issue doesn't mean someone else won't. Yeah, though *any* change - mine, the one your propose, and any other - has the potential to hurt some platform due to weird interactions (say, cache trashing). > (3) A new > FunctionCallInfoData field just for this? Surely not. There's got to be > a distributed cost to that (although I notice you didn't bother > initializing the field most places, which is cheating). I think the field doesn't actually increase the size of the structure at all - at least not if the bool before it has size 1 and the short following it is 2-byte aligned. Or at least that was why I made it a char, and added it at the place I did. But I might be missing something... Also, InitFunctionCallInfoData *does* initialize the flags to zero. Though maybe not everybody uses that - I didn't check, this was just a quick hack. > Now point 3 could be addressed by doing the signaling in some other way > with the existing context field. But it's still the case that you're > trying to band-aid a bad design. There's no good reason to make the sfunc > do extra work to be invertible in contexts where we know, with certainty, > that that work is useless. This is the principal point where we disagree, I think. From my POV, the problem isn't invertibility here at all. Heck, SUM(int4) wouldn't need *any* extra state at all to be invertible, if it weren't for those pesky issues surrounding NULL handling. In fact, an earlier version of the invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The only reason this doesn't work nowadays is that Dean didn't like the forward-nonstrict-but-inverse-strict special case that made this work. The way I see it, the main problem is the drop in performance that comes from using a pass-by-ref state type. Which IMHO happens because this *already* is a heavily band-aided design - all the state validation and "if (AggCheckCallContext(,NULL))" stuff really works around the fact that transition functions *aren't* supposed to be user-called, yet they are, and must deal. Which is why I feel that having two separate sets of transition functions and state types solves the wrong problem. If we find a way to prevent transition functions from being called directly, we'll shave a few cycles of quite a few existing aggregates, invertible or not. If we find a way around the initfunc-for-internal-statetype problem you discovered, the implementation would get much simpler, since we could then make nearly all of them strict. And this would again shave off a few cycles - for lots of NULL inputs, the effect could even be large. Compared to that, the benefit of having a completely separate set of transition functions and state types for the invertible case is much less tangible, IMHO. > Especially not when we know that even a few instructions of extra work > can be performance-significant. But where do we draw that line? nodeWindowAgg.c quite certainly wastes about as many cycles as int4_avg_accum does on various checks that are unnecessary unless in the non-sliding window case. Do we duplicate those functions too? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers