Re: [HACKERS] Combining Aggregates

David Rowley Tue, 29 Mar 2016 20:15:38 -0700

On 26 March 2016 at 15:07, David Rowley <david.row...@2ndquadrant.com> wrote:


Many thanks Robert for committing the serialize states portion.

> 0005:
> Haribabu's patch; no change from last time.

Just in case you jump ahead. I just wanted to re-highlight something
Haribabu mentioned a while ago about combining floating point states
[1]. This discussion did not happen on this thread, so to make sure it
does not get lost...

As of today aggregate calculations for floating point types can vary
depending on the order in which values are aggregated.

For example:

create table f8 (num float8);
insert into f8 select x/100.0 from generate_series(1,10000) x(x);
select stddev(num order by num) from f8;
      stddev
------------------
 28.8689567990717
(1 row)

select stddev(num order by num desc) from f8;
      stddev
------------------
 28.8689567990716
(1 row)

select stddev(num order by random()) from f8;
      stddev
------------------
 28.8689567990715
(1 row)

And of course the execution plan can determine the order in which rows
are aggregated, even if the underlying data does not change.

Parallelising these aggregates increases the chances of seeing these
variations as the number of rows aggregated in each worker is going to
vary on each run, so the numerical anomalies will also vary between
each run.

I wrote in [1]:
> We do also warn about this in the manual: "Inexact means that some
> values cannot be converted exactly to the internal format and are
> stored as approximations, so that storing and retrieving a value might
> show slight discrepancies. Managing these errors and how they
> propagate through calculations is the subject of an entire branch of
> mathematics and computer science and will not be discussed here,
> except for the following points:" [1]
> [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html

Does this text in the documents stretch as far as the variable results
from parallel aggregate for floating point types? or should we be more
careful and not parallelise these, similar to how we didn't end up
with inverse aggregate transition functions for floating point types?

I'm personally undecided, and would like to hear what others think.

[1] 
http://www.postgresql.org/message-id/cakjs1f_hplfhkd2ylfrsrmumbzwqkgvjcwx21b_xg1a-0pz...@mail.gmail.com
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

Reply via email to