On 28 January 2014 20:16, Florian Pflug <f...@phlo.org> wrote: > On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rash...@gmail.com> wrote: >> strict transfn vs non-strict inv_transfn >> ---------------------------------------- >> >> This case is explicitly forbidden, both in CREATE AGGREGATE and in the >> executor. To me, that seems overly restrictive --- if transfn is >> strict, then you know for sure that no NULL values are added to the >> aggregate state, so surely it doesn't matter whether or not >> inv_transfn is strict. It will never be asked to remove NULL values. >> >> I think there are definite use-cases where a user might want to use a >> pre-existing function as the inverse transition function, so it seems >> harsh to force them to wrap it in a strict function in this case. >> >> AFAICS, the code in advance/retreat_windowaggregate should just work >> if those checks are removed. > > True. It didn't use to in earlier version of the patch because the advance > and retreat functions looked at the strict settings directly, but now that > there's an ignore_nulls flag in the executor node, the only requirement > should be that ignore_nulls is set if either of the transition functions > is strict. > > I'm not sure that the likelihood of someone wanting to combine a strict > forward with a non-strict inverse function is very hight, but neither >
Me neither, but the checks to forbid it aren't adding anything, and I think it's best to make it as flexible as possible. >> non-strict transfn vs strict inv_transfn >> ---------------------------------------- >> >> At first this seems as though it must be an error --- the forwards >> transition function allows NULL values into the aggregate state, and >> the inverse transition function is strict, so it cannot remove them. >> >> But actually what the patch is doing in this case is treating the >> forwards transition function as partially strict --- it won't be >> passed NULL values (at least in a window context), but it may be >> passed a NULL state in order to build the initial state when the first >> non-NULL value arrives. > > Exactly. > >> It looks like this behaviour is intended to support aggregates that >> ignore NULL values, but cannot actually be made to have a strict >> transition function. I think typically this is because the aggregate's >> initial state is NULL and it's state type differs from the type of the >> values being aggregated, and so can't be automatically created from >> the first non-NULL value. > > Yes. I added this because the alternative would haven been to count > non-NULL values in most of the existing SUM() aggregates. > >> That all seems quite ugly though, because now you have a transition >> function that is not strict, which is passed NULL values when used in >> a normal aggregate context, and not passed NULL values when used in a >> window context (whether sliding or not), except for the NULL state for >> the first non-NULL value. > > Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does > and skip NULL inputs if the aggregate has a strict inverse transition > function. That fact that we never actually *call* the inverse doesn't > matter. Will fix that once we decided on the various strictness issues. > Yuk! >> I'm not sure if there is a better way to do it though. If we disallow >> this case, these aggregates would have to use non-strict functions for >> both the forward and inverse transition functions, which means they >> would have to implement their own non-NULL value counting. >> Alternatively, allowing strict transition functions for these >> aggregates would require that we provide some other way to initialise >> the state from the first non-NULL input, such as a new initfn. > > I'm not sure an initfn would really help. It would allow us to return > to the initial requirement that the strict settings match - but you > already deem the check that the forward transition function can only > be strict if the inverse is also overly harsh, so would that really be > an improvement? It's also cost us an additional pg_proc entry per aggregate. > > Another idea would be to have an explicit nulls=ignore|process option > for CREATE AGGREGATE. If nulls=process and either of the transition > functions are strict, we could either error out, or simply do what > normal functions calls do and pretend they return NULL for NULL inputs. > Not sure how the rule that forward transition functions may not return > NULL if there's an inverse transition function would fit in if we do > the latter, though. > > The question is - is it worth it the effort to add that flag? > Yeah, I thought about a flag too. I think that could work quite nicely. Basically where I'm coming from is trying to make this as flexible as possible, without pre-judging the kinds of aggregates that users may want to add. One use-case I had in mind upthread was suppose you wanted to write a custom version of array_agg that only collected non-NULL values. With such a flag, that would be trivial, but with the current patch you'd have to (count-intuitively) wrap the inverse transition function in a strict function. Another use-case I can imagine is suppose you wanted a custom version of sum() that returned NULL if any of the input values were NULL. If you knew that most of your data was non-NULL, you might still wish to benefit from an inverse transition function. Right now the patch won't allow that because of the error in advance_windowaggregate(), but possibly that could be relaxed by forcing a restart in that case. If I've understood correctly that should give similar to current performance if NULLs are present, and enhanced performance as the window moved over non-NULL data. In that second case, it would also be nice if you could simply re-use the existing sum forward and inverse transition functions, with a different null-handling flag. Also, in cases where the forwards transition function cannot be made strict (e.g., it's state type is internal) and there is no inverse transition function, there might be a small performance gain to be had from not calling the transition function with NULL values, rather than have it ignore them. So I think an ignore-nulls flag would have real benefits, as well as being a cleaner design than relying on a strict inverse transition function. What do others think? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers