On 1 April 2014 20:58, Florian Pflug <f...@phlo.org> wrote:
> On Apr1, 2014, at 10:08 , Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> On 31 March 2014 01:58, Florian Pflug <f...@phlo.org> wrote:
>>> Attached are updated patches that include the EXPLAIN changes mentioned
>>> above and updated docs.
>>>
>>
>> These patches need re-basing --- they no longer apply to HEAD.
>
> Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to
> re-assign OIDs in the dependent paches again (those which actually
> add the various inverse transition functions).
>
Looking at the new explain output, that is not exactly what I was suggesting upthread. In particular, in cases where the WindowAgg node is executed more than once, I think the reported transition counts should be the per-execution averages for the node, so that the number of transition function calls reported is directly comparable with the node's "rows" value. In other words, the output should be divided by nloops, which would also be more consistent with the way other similar values are reported in explain output (cf. show_instrumentation_count).

I started looking at the "arith" patch and got as far as the changes to count(*) and count(val), which seem reasonable. But then I started testing performance, and I found cases where the improvement is not nearly what I expected.

The example cited at the start of this thread is indeed orders of magnitude faster than HEAD:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  FROM generate_series(1,20000) g(n);

This executes in 20ms on my box, versus 30 seconds on HEAD, which reflects the change from an O(n^2) to an O(n) algorithm. However, if both ends of the frame move, the benefits are far less impressive:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING)
  FROM generate_series(1,20000) g(n);    -- 33ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING)
  FROM generate_series(1,20000) g(n);    -- 173ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING)
  FROM generate_series(1,20000) g(n);    -- 1467ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10000 FOLLOWING)
  FROM generate_series(1,20000) g(n);    -- 7709ms

This is still exhibiting the behaviour of an O(n^2) algorithm. The problem lies in window_gettupleslot(), which steps one row at a time from the last position to the newly requested position. So if both ends of the frame are moving, it has to step forwards and backwards through the entire window for each row processed, hence the O(n^2) behaviour.

Looking at window_gettupleslot(), there is an obvious potential micro-optimisation when there are multiple rows to be skipped: instead of calling tuplestore_gettupleslot() for every intermediate row, call tuplestore_advance() repeatedly and then call tuplestore_gettupleslot() just once to fetch the final required tuple, which avoids a lot of unnecessary tuple copying (a rough sketch of what I mean is below, after my sign-off). That of course won't eliminate the O(n^2) behaviour, but it should reduce the constant factor, and so is probably worth doing.

However, to fix the fundamental O(n^2) problem, I think there need to be separate tuplestore read pointers for the head and tail of the frame. There may also be a case for another read pointer for the current row, and possibly one for general-purpose user-triggered fetches. One approach might be to have up to a small number N (say 3 or 4) of read pointers, with window_gettupleslot() automatically choosing the one nearest to the requested row. Possibly there are better approaches; I think a little more investigation is needed.

I'm not sure how much additional work is required to sort this out, but to me it looks more realistic to target 9.5 than 9.4, so at this point I tend to think that the patch ought to be marked as returned with feedback.

Thoughts?

Regards,
Dean
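
P.S. To make the window_gettupleslot() micro-optimisation above a bit more concrete, here is a rough, untested sketch of what the forward-seek path might look like (the backward case would be symmetric, passing forward = false). It assumes we are seeking forwards, i.e. winobj->seekpos < pos on entry; the field names (winobj->seekpos, winstate->buffer) and error message follow nodeWindowAgg.c. The point is simply that intermediate rows are skipped with tuplestore_advance() instead of being copied out one at a time:

    /*
     * Rough sketch, not tested: skip over all but the last of the
     * intermediate rows without materializing them, then copy out only
     * the tuple that was actually requested.
     */
    while (winobj->seekpos < pos - 1)
    {
        if (!tuplestore_advance(winstate->buffer, true))
            elog(ERROR, "unexpected end of tuplestore");
        winobj->seekpos++;
    }

    if (!tuplestore_gettupleslot(winstate->buffer, true, true, slot))
        elog(ERROR, "unexpected end of tuplestore");
    winobj->seekpos = pos;

That only cuts down the per-row copying while stepping, of course; the separate head/tail read pointer question is what would actually remove the O(n^2) term.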