On 3 March 2014 23:00, Florian Pflug <f...@phlo.org> wrote: >> * In show_windowagg_info(), this calculation looks suspicious to me: >> >> double tperrow = winaggstate->aggfwdtrans / >> (inst->nloops * inst->ntuples); >> >> If the node is executed multiple times, aggfwdtrans will be reset in >> each loop, so the transitions per row figure will be under-estimated. >> ISTM that if you want to report on this, you'd need aggfwdtrans to be >> reset once per query, but I'm not sure exactly how to do that. >> >> ... >> >> Actually, I think it's misleading to only count forward transition >> function calls, because a call to the inverse transition function >> still represents a state transition, and is likely to be around the >> same cost. For a window of size 2, there would not be much advantage >> to using inverse transition functions, because it would be around 2 >> transitions per row either way. > > True. In fact, I pondered whether to avoid using the inverse transition > function for windows of 2 rows. In the end, I didn't because I felt that > it makes custom aggregates harder to test. > > On the question of whether to count inverse transition function calls - > the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the > number of state transitions, but rather to show whether the aggregation > has O(n) or O(n^2) behaviour. The idea being that a value close to "1" > means "inverse transition function works as expected", and larger values > mean "not working so well". > > Regarding multiple evaluations - I think I based the behaviour on how > ntuples works, which also only reports the value of the last evaluation > I think. But maybe I'm confused about this. >
No, it doesn't look like that's correct for multiple loops. Consider this example: explain (verbose, analyse) select * from (values (10), (20), (30), (40)) v(x), lateral (select sum(i) over (rows between 4 preceding and current row) from generate_series(1, x) i) t; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..170.06 rows=4000 width=12) (actual time=0.027..0.414 rows=100 loops=1) Output: "*VALUES*".column1, (sum(i.i) OVER (?)) -> Values Scan on "*VALUES*" (cost=0.00..0.05 rows=4 width=4) (actual time=0.002..0.006 rows=4 loops=1) Output: "*VALUES*".column1 -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual time=0.019..0.094 rows=25 loops=4) Output: sum(i.i) OVER (?) Transitions Per Row: 0.2 -> Function Scan on pg_catalog.generate_series i (cost=0.00..10.00 rows=1000 width=4) (actual time=0.010..0.015 rows=25 loops=4) Output: i.i Function Call: generate_series(1, "*VALUES*".column1) It turns out that show_windowagg_info() is only called once at the end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get 1, you'd have to use this formula: double tperrow = winaggstate->aggfwdtrans / inst->ntuples; I'm still not convinced that's the most useful thing to report though. Personally, I'd prefer to just see the separate counts, e.g.: -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual time=0.019..0.094 rows=25 loops=4) Output: sum(i.i) OVER (?) Forward transitions: 25 Inverse transitions: 25 IMO that gives a clearer picture of what's going on. Thoughts? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers