On 27 March 2018 at 11:28, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> This very much reminds me of something that exists in the 8.4 release notes: >>> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted >>> output (Tom) > > That's a completely false analogy: we got a significant performance > benefit for a significant fraction of users by supporting hashed > aggregation. My argument here is that only a very tiny fraction of > string_agg/array_agg users will not care about aggregation order, and thus > I don't believe that this patch can help very many people. Against that, > it's likely to hurt other people, by breaking their queries and forcing > them to insert expensive explicit sorts to fix it. Even discounting the > backwards-compatibility question, we don't normally adopt performance > features for which it's unclear that the net gain over all users is > positive.
I don't believe you can go and claim this is a false analogy based on your estimates on the number of cases in which each applies. The analogy which I was pointing at was that we've been here before... we've had versions of the planner which generated plans which would have a deterministic sort order and we've later discovered that we can improve performance by allowing the planner to have more flexibility to do things in different ways which may no longer provide implicitly sorted results. We've previously recognised that some users may have become accustomed to the previous behaviour and we've mentioned a workaround in the release notes so that those users are not out in the cold. This seems exactly the same to me, and certainly not "false". I have to say, it really would be a shame to have this concern block us from future optimisations in aggregation. I also think that anyone who expects string_agg to always aggregate in the same order has not done a good job of looking at the docs. I see you already quoted the "This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7." part. This text appears in all version of PostgreSQL that we currently support. We do sometimes change things where the docs say something like "this works, but don't rely on it, we might change in the future", in this case, we've never even claimed that it worked in the first place! Anyway, the options are not zero for anyone who is strongly affected with no other workaround. They just need to disable parallel query, which to me seems fairly similar to the 8.4 release note's "the previous behavior can be restored by disabling enable_hashagg" If we go by your argument then we should perhaps remove parallel aggregation for all the floating point types too, since the order in which such values are aggregated also can affect the result. I mentioned this in [1], but nobody seemed too concerned at the time. I see some other discussion on this whole topic in [2]. Looks like parallel array_agg would please the PostGIS people. [1] https://www.postgresql.org/message-id/CAKJS1f8QRDLvewk336SzUzxiXH1wBHG8rdKsqWEHbAraMXA2_Q%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CAFjFpRe9W5xvYai-QOs6RshrJf7gWFsiZEZtxnu8vD4qLQZ3LQ%40mail.gmail.com#cafjfpre9w5xvyai-qos6rshrjf7gwfsizeztxnu8vd4qlqz...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services