Re: Parallel Aggregates for string_agg and array_agg

David Rowley Mon, 26 Mar 2018 17:15:07 -0700

On 27 March 2018 at 11:28, Tom Lane <t...@sss.pgh.pa.us> wrote:
> David Rowley <david.row...@2ndquadrant.com> writes:
>> This very much reminds me of something that exists in the 8.4 release notes:
>>> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted 
>>> output (Tom)
>
> That's a completely false analogy: we got a significant performance
> benefit for a significant fraction of users by supporting hashed
> aggregation.  My argument here is that only a very tiny fraction of
> string_agg/array_agg users will not care about aggregation order, and thus
> I don't believe that this patch can help very many people.  Against that,
> it's likely to hurt other people, by breaking their queries and forcing
> them to insert expensive explicit sorts to fix it.  Even discounting the
> backwards-compatibility question, we don't normally adopt performance
> features for which it's unclear that the net gain over all users is
> positive.

I don't believe you can go and claim this is a false analogy based on
your estimates on the number of cases in which each applies. The
analogy which I was pointing at was that we've been here before...
we've had versions of the planner which generated plans which would
have a deterministic sort order and we've later discovered that we can
improve performance by allowing the planner to have more flexibility
to do things in different ways which may no longer provide implicitly
sorted results. We've previously recognised that some users may have
become accustomed to the previous behaviour and we've mentioned a
workaround in the release notes so that those users are not out in the
cold. This seems exactly the same to me, and certainly not "false".

I have to say, it really would be a shame to have this concern block
us from future optimisations in aggregation.

I also think that anyone who expects string_agg to always aggregate in
the same order has not done a good job of looking at the docs. I see
you already quoted the "This ordering is unspecified by default, but
can be controlled by writing an ORDER BY clause within the aggregate
call, as shown in Section 4.2.7." part. This text appears in all
version of PostgreSQL that we currently support. We do sometimes
change things where the docs say something like "this works, but don't
rely on it, we might change in the future", in this case, we've never
even claimed that it worked in the first place!

Anyway, the options are not zero for anyone who is strongly affected
with no other workaround. They just need to disable parallel query,
which to me seems fairly similar to the 8.4 release note's "the
previous behavior can be restored by disabling enable_hashagg"

If we go by your argument then we should perhaps remove parallel
aggregation for all the floating point types too, since the order in
which such values are aggregated also can affect the result. I
mentioned this in [1], but nobody seemed too concerned at the time.

I see some other discussion on this whole topic in [2]. Looks like
parallel array_agg would please the PostGIS people.

[1]
https://www.postgresql.org/message-id/CAKJS1f8QRDLvewk336SzUzxiXH1wBHG8rdKsqWEHbAraMXA2_Q%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/CAFjFpRe9W5xvYai-QOs6RshrJf7gWFsiZEZtxnu8vD4qLQZ3LQ%40mail.gmail.com#cafjfpre9w5xvyai-qos6rshrjf7gwfsizeztxnu8vd4qlqz...@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregates for string_agg and array_agg

Reply via email to