On Wed, Oct 25, 2023 at 05:10:17PM -0700, David G. Johnston wrote: > The paragraph leading into the last added example needs to be tweaked: > > If DISTINCT is specified within an aggregate, the data is sorted in ascending > order while extracting unique values. You can add an ORDER BY clause, limited > to expressions matching the regular arguments of the aggregate, to sort the > output in descending order. > > (show existing - DISTINCT only - example here) > > <programlisting> > WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) ) > SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals; > string_agg > ----------- > 4;3;2;1 > </programlisting> > > (existing note)
I see what you mean. I added an example that doesn't match the existing paragraph. I have rewritten the paragraph and used a relevant example; patch attached. > Question: Do you know whether we for certain always sort ascending here to > compute the unique values or whether if, say, there is an index on the column > in descending order (or ascending and traversed backwards) that the data > within > the aggregate could, with an order by, be returned in descending order? If it > is ascending, is that part of the SQL Standard (since it doesn't even allow an > order by to give the user the ability the control the output ordering) or does > the SQL Standard expect that even a random order would be fine since there are > algorithms that can be used that do not involve sorting the input? I don't think order is ever guaranteed in the standard without an ORDER BY. > It seems redundant to first say "regular arguments" then negate it in order to > say "DISTINCT list". Using the positive form with "DISTINCT list" should get > the point across sufficiently and succinctly. It also avoids me feeling like > there should be an example of what happens when you do "sort on an expression > that is not included in the DISTINCT list". Agreed, I rewrote that. > Interestingly: > > WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') ) > SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals; > > ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in > argument list > LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM... > > But both expressions in the argument list (el and semicolon) do appear in the > ORDER BY... I think ORDER BY has to match DISTINCT columns, while you are using ';'. I used a simpler example with array_agg() in my patch to avoid the issue. -- Bruce Momjian <br...@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7c3e940afe..3b49e63987 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ... aggregation. </para> + <para> + While all aggregates below accept an optional + <literal>ORDER BY</literal> clause (as outlined in <xref + linkend="syntax-aggregates"/>), the clause has only been added to + aggregates whose output is affected by ordering. + </para> + <table id="functions-aggregate-table"> <title>General-Purpose Aggregate Functions</title> <tgroup cols="2"> @@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ... <indexterm> <primary>array_agg</primary> </indexterm> - <function>array_agg</function> ( <type>anynonarray</type> ) + <function>array_agg</function> ( <type>anynonarray</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>anyarray</returnvalue> </para> <para> @@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ... <row> <entry role="func_table_entry"><para role="func_signature"> - <function>array_agg</function> ( <type>anyarray</type> ) + <function>array_agg</function> ( <type>anyarray</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>anyarray</returnvalue> </para> <para> @@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ... <indexterm> <primary>json_agg</primary> </indexterm> - <function>json_agg</function> ( <type>anyelement</type> ) + <function>json_agg</function> ( <type>anyelement</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>json</returnvalue> </para> <para role="func_signature"> <indexterm> <primary>jsonb_agg</primary> </indexterm> - <function>jsonb_agg</function> ( <type>anyelement</type> ) + <function>jsonb_agg</function> ( <type>anyelement</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>jsonb</returnvalue> </para> <para> @@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ... </indexterm> <function>json_object_agg</function> ( <parameter>key</parameter> <type>"any"</type>, <parameter>value</parameter> - <type>"any"</type> ) + <type>"any"</type> + <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>json</returnvalue> </para> <para role="func_signature"> @@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ... </indexterm> <function>jsonb_object_agg</function> ( <parameter>key</parameter> <type>"any"</type>, <parameter>value</parameter> - <type>"any"</type> ) + <type>"any"</type> + <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>jsonb</returnvalue> </para> <para> @@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ... </para> <para role="func_signature"> <function>string_agg</function> ( <parameter>value</parameter> - <type>bytea</type>, <parameter>delimiter</parameter> <type>bytea</type> ) + <type>bytea</type>, <parameter>delimiter</parameter> <type>bytea</type> + <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>bytea</returnvalue> </para> <para> @@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ... <indexterm> <primary>xmlagg</primary> </indexterm> - <function>xmlagg</function> ( <type>xml</type> ) + <function>xmlagg</function> ( <type>xml</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> ) <returnvalue>xml</returnvalue> </para> <para> diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 3ba844057f..c5627001c7 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1647,7 +1647,26 @@ sqrt(2) are always just expressions and cannot be output-column names or numbers. For example: <programlisting> -SELECT array_agg(a ORDER BY b DESC) FROM table; +WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) ) +SELECT array_agg(v ORDER BY v DESC) FROM vals; + array_agg +------------- + {4,3,3,2,1} +</programlisting> + Since <type>jsonb</type> only keeps the last matching key, ordering + of its keys can be significant: +<programlisting> +WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') ) +SELECT jsonb_object_agg(k, v) FROM vals; + jsonb_object_agg +---------------------------- + {"key0": "1", "key1": "2"} + +WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') ) +SELECT jsonb_object_agg(k, v ORDER BY v) FROM vals; + jsonb_object_agg +---------------------------- + {"key0": "1", "key1": "3"} </programlisting> </para> @@ -1668,11 +1687,17 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect </para> <para> - If <literal>DISTINCT</literal> is specified in addition to an - <replaceable>order_by_clause</replaceable>, then all the <literal>ORDER BY</literal> - expressions must match regular arguments of the aggregate; that is, - you cannot sort on an expression that is not included in the - <literal>DISTINCT</literal> list. + If <literal>DISTINCT</literal> is specified with an + <replaceable>order_by_clause</replaceable>, <literal>ORDER + BY</literal> expressions can only reference columns in the + <literal>DISTINCT</literal> list. For example: +<programlisting> +WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') ) +SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals; + array_agg +------------- + {Z,T,R,D,A} +</programlisting> </para> <note>