On Sat, Jan 18, 2014 at 10:42 AM, David Rowley <dgrowle...@gmail.com> wrote:
> >> You could do better than that - the numeric problem amounts to tracking
> >> the maximum scale AFAICS, so it'd be sufficient to restart if we remove
> >> a value whose scale equals the current maximum. But if we do
> >> SUM(numeric) at all, I think we should do so without requiring restarts
> >> - I still think the overhead of tracking the maximum scale within the
> >> aggregated values isn't that bad. If we zero the dscale counters
> >> lazily, the number of entries we have to zero is bound by the maximum
> >> dscale we encounter. Since actually summing N digits is probably more
> >> expensive than zeroing them, and since we pay the zeroing overhead only
> >> once per aggregation but save potentially many summations, I'm pretty
> >> sure we come out ahead by quite some margin.
> >
> > We'll work that out; I don't think it will take very long to code up
> > your idea either. I just thought that my idea was good enough and very
> > cheap too - won't all numerics that are actually stored in a column have
> > the same scale anyway? Hasn't it only been a problem because we've been
> > testing with random numeric literals the whole time?
> >
> > The test turned out to be:
> >
> > if (state->expectedScale == -1)
> >     state->expectedScale = X.dscale;
> > else if (state->expectedScale != X.dscale)
> >     state->expectedScale = -2;
> >
> > in do_numeric_accum. Then, when we do the inverse, I just check whether
> > expectedScale < 0 and return NULL if so.
> >
> > I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> > tracker array, but I'm just not quite sure what extra real use cases it
> > covers that the one above does not. Perhaps my way won't perform inverse
> > transitions if the query did sum(numericCol / 10) or so.
> >
> > I'll be committing this to the github repo very soon, so we can hack
> > away at the scale tracking code once it's back in.

Ok, we're back up to 86 aggregate function / type combinations with inverse
transition functions.
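For anyone who wants to play with the expectedScale convention outside the patch, here's a rough standalone model. The struct and function signatures below are simplified stand-ins, not the actual NumericAggState / do_numeric_accum code from numeric.c; the sum is modeled with a double purely for illustration:

```c
#include <assert.h>

/* Simplified model of the aggregate state. Following the patch's
 * convention: expectedScale == -1 means no values seen yet,
 * expectedScale == -2 means values with differing dscales were seen. */
typedef struct
{
	int		expectedScale;
	double	sum;			/* stand-in for the real numeric sum */
} ModelAggState;

/* Forward transition: accumulate a value and track its dscale. */
static void
model_accum(ModelAggState *state, double val, int dscale)
{
	if (state->expectedScale == -1)
		state->expectedScale = dscale;
	else if (state->expectedScale != dscale)
		state->expectedScale = -2;
	state->sum += val;
}

/* Inverse transition: refuse (return 0, standing in for returning NULL)
 * when the inputs did not all share one scale, since removing a value
 * could then change the result's display scale.  A refusal forces the
 * executor to restart the aggregation from scratch. */
static int
model_inverse(ModelAggState *state, double val)
{
	if (state->expectedScale < 0)
		return 0;
	state->sum -= val;
	return 1;
}
```

This captures why sum(num / 1) falls back to restarts: the division changes the dscale mid-stream, the state drops to -2, and every inverse call bails out.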
I've committed my latest work to github, and here's a fresh patch which I
will need to do more tests on.

The test (below) that used to fail a few versions ago is back in there, and
it's now passing:

SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

In this case it won't use inverse transitions, because the forward
transition function detects that the scale is not fixed.

I threw some numerics into a table and I'm pretty happy with the results:

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,20000);

select sum(num) over(order by num rows between current row and unbounded following) from num; -- 124 ms
select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 254 ms
select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 108156.917 ms

The divide-by-1 case is slow because of those weird 20 trailing zeros
(instead of 16) you get when dividing a numeric by 1; the scale change
causes the inverse transition function to return NULL. I've not yet tested
an unpatched version to see how that divide-by-1 query performs, but I'll
get to that soon.

I'm thinking that the idea of detecting the numeric range with floating
point types, and performing an inverse transition provided the range has
not gone beyond some defined danger zone, is not material for this patch...
I think it would not be a great deal of work to code, but the testing
involved would probably make this patch impossible for 9.4.

The latest version of the patch is attached.

Regards

David Rowley
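For reference, the lazy dscale-counter zeroing suggested upthread could look roughly like this. This is just a standalone sketch under assumed names (ScaleTracker, MAX_DSCALE, etc. are mine, not from the patch); the point is that counters are zeroed only up to the largest dscale seen, and the maximum only needs recomputing when the last value at the current maximum is removed - so sum(num / 1) would keep using inverse transitions instead of restarting:

```c
#include <assert.h>

#define MAX_DSCALE 64	/* assumption for the sketch; real dscales can be larger */

/* dscaleCount[d] holds how many live values have display scale d;
 * maxDscale is the largest d with a nonzero count.  Entries above
 * initializedTo have never been touched, so the zeroing cost is
 * bounded by the maximum dscale actually encountered. */
typedef struct
{
	int	dscaleCount[MAX_DSCALE + 1];
	int	initializedTo;		/* counters [0..initializedTo] are valid */
	int	maxDscale;			/* -1 when no values are live */
} ScaleTracker;

static void
tracker_init(ScaleTracker *t)
{
	t->initializedTo = -1;
	t->maxDscale = -1;
}

/* Forward transition side: lazily zero counters up to the new dscale,
 * then bump the count for this value's scale. */
static void
tracker_add(ScaleTracker *t, int dscale)
{
	while (t->initializedTo < dscale)
		t->dscaleCount[++t->initializedTo] = 0;
	t->dscaleCount[dscale]++;
	if (dscale > t->maxDscale)
		t->maxDscale = dscale;
}

/* Inverse transition side: drop one value; scan downward for the new
 * maximum only when the last value at the current maximum goes away. */
static void
tracker_remove(ScaleTracker *t, int dscale)
{
	t->dscaleCount[dscale]--;
	while (t->maxDscale >= 0 && t->dscaleCount[t->maxDscale] == 0)
		t->maxDscale--;
}
```

Unlike the single expectedScale field, this handles mixed-scale input without ever forcing a restart: removing a max-scale value just walks the counters down to the next occupied scale.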
inverse_transition_functions_v2.7.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers