On Jan29, 2014, at 09:59 , Dean Rasheed <dean.a.rash...@gmail.com> wrote: > On 28 January 2014 20:16, Florian Pflug <f...@phlo.org> wrote: >> On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rash...@gmail.com> wrote: >>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the >>> executor. To me, that seems overly restrictive --- if transfn is >>> strict, then you know for sure that no NULL values are added to the >>> aggregate state, so surely it doesn't matter whether or not >>> inv_transfn is strict. It will never be asked to remove NULL values. >>> >>> I think there are definite use-cases where a user might want to use a >>> pre-existing function as the inverse transition function, so it seems >>> harsh to force them to wrap it in a strict function in this case. >> >> I'm not sure that the likelihood of someone wanting to combine a strict >> forward with a non-strict inverse function is very hight, but neither >> > > Me neither, but the checks to forbid it aren't adding anything, and I > think it's best to make it as flexible as possible.
Ok. >> Another idea would be to have an explicit nulls=ignore|process option >> for CREATE AGGREGATE. If nulls=process and either of the transition >> functions are strict, we could either error out, or simply do what >> normal functions calls do and pretend they return NULL for NULL inputs. >> Not sure how the rule that forward transition functions may not return >> NULL if there's an inverse transition function would fit in if we do >> the latter, though. >> >> The question is - is it worth it the effort to add that flag? > > One use-case I had in mind upthread was suppose you wanted to write a > custom version of array_agg that only collected non-NULL values. With > such a flag, that would be trivial, but with the current patch you'd > have to (count-intuitively) wrap the inverse transition function in a > strict function. I'd be more convinced by that if doing so was actually possible for non-superusers. But it isn't, because the aggregate uses "internal" as it's state type and it's thus entirely up to the user to not crash the database by mixing transfer and final functions with incompatible state data. Plus, instead of creating a custom aggregate, one can just use a FILTER clause to get rid of the NULL values. > Another use-case I can imagine is suppose you wanted a custom version > of sum() that returned NULL if any of the input values were NULL. If > you knew that most of your data was non-NULL, you might still wish to > benefit from an inverse transition function. Right now the patch won't > allow that because of the error in advance_windowaggregate(), but > possibly that could be relaxed by forcing a restart in that case. That's not really true - that patch only forbids that if you insist on representing the state "i have seen a NULL input" with a NULL state value. But if you instead just count the number of NULLS in your transition functions, all you need to do is to have your final function return NULL if that count is not zero. > If I've understood correctly that should give similar to current > performance if NULLs are present, and enhanced performance as the > window moved over non-NULL data. Exactly - and this makes defining a NULL-sensitive SUM() this way rather silly - a simple counter has very nearly zero overhead, and avoids all rescans. > In that second case, it would also be nice if you could simply re-use > the existing sum forward and inverse transition functions, with a > different null-handling flag. Even if we had a nulls=process|ignore flag, SUM's transition functions would still need to take that use-case into account explicitly to make this work - at the very least, the forward transition function would need to return NULL if the input is NULL instead of just skipping it as it does now. But that would leads to completely unnecessary rescans, so what we actually ought to do then is to make it track whether there have been NULL inputs and make the finalfunc return NULL in this case. Which would border on ironic, since the whole raison d'etre for this is to *avoid* spreading NULL-counting logic around... Plus, to make this actually useable, we'd have to document this, and tell people how to define such a SUM aggregate. But I don't want to go there - we really mustn't encourage people to mix-and-match built-in aggregates with state type "internal", since whether they work or crash depends on factors outside the control of the user. And finally, you can get that behaviour quite easily by doing CASE WHEN bool_and(input IS NOT NULL) whatever_agg(input) ELSE NULL END > So I think an ignore-nulls flag would have real benefits, as well as > being a cleaner design than relying on a strict inverse transition > function. The more I think about this, the less convinced I am. In fact, I'm currently leaning towards just forbidding non-strict forward transition function with strict inverses, and adding non-NULL counters to the aggregates that then require them. It's really only the SUM() aggregates that are affected by this, I think. For an argument in favour of that, look at how we handle *non*-NULL initial states. For these, even aggregates with nulls=ignore would require some tracking, because otherwise they'd simply return the initial state if the input contained no non-NULL value. Which quite probably isn't what you'd want - you'd usually want to return NULL in this case (as all the built-in aggregates do). Currently, all built-in aggregates with non-NULL initial states compute an expected value of some kind (average, std. deviation, regressions), and thus need to count the number of inputs anyway. But that's just coincidence - if it wasn't for state type "internal", having SUM start from 0 instead of NULL would be entirely reasonable. So if we have a nulls=ignore flag, we ought to have an nullif=only_nulls flag too to cover all the ground. But at that point, just making this the transition function's responsibility seems more sensible. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers