On 12/15/2013 03:57 AM, Tom Lane wrote:
Josh Berkus <j...@agliodbs.com> writes:
I think even the FLOAT case deserves some consideration. What's the
worst-case drift?
Complete loss of all significant digits.
The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s). Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:
Table correct sum of negative-transition
this + next value result
1e20 1e20 1e20 + 1 = 1e20
1 1 1e20 - 1e20 + 0 = 0
0
In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.
Au contraire. People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation. They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.
And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?
I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer. I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.
There's another technique we could use which doesn't need a negative
transition function, assuming the order you feed the values to the
aggreate function doesn't matter: keep subtotals. For example, if the
window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and
then 1 + 2 + 7 = 10. Next, 1 leaves the window, and 5 enters it. Now you
calculate 2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around,
you saved one addition compared to calculating 2 + 3 + 4 + 5 from scratch.
The negative transition function is a lot simpler and faster for
count(*) and integer operations, so we probably should implement that
anyway. But the subtotals technique could be very useful for other data
types.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers