Hi Michael, On Fri, Nov 8, 2024 at 8:52 AM Michael Paquier <mich...@paquier.xyz> wrote: > > On Thu, Nov 07, 2024 at 10:06:04PM +0800, Junwang Zhao wrote: > > I'm afraid this can not be achieved in my current implementation, a simple > > case is: > > > > SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[]); > > {Abc,bar,bbc,CCC,foo,NULL} > > SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C"); > > {Abc,CCC,bar,bbc,foo,NULL} > > > > SELECT array_sort(a COLLATE "C") FROM (VALUES > > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C"), > > ('{foo,bar,null,CCC,Abc,bbc}'::text[])) v(a); > > {Abc,CCC,bar,bbc,foo,NULL} > > {Abc,CCC,bar,bbc,foo,NULL} > > > > Maybe add some documents to specify this? > > So, if I use that: > CREATE COLLATION case_sensitive (provider = icu, locale = ''); > =# SELECT array_sort('{Abc,CCC,bar,bbc,foo,NULL}'::text[] > COLLATE "case_sensitive"); > array_sort > ---------------------------- > {Abc,bar,bbc,CCC,foo,NULL} > (1 row) > =# SELECT array_sort('{Abc,CCC,bar,bbc,foo,NULL}'::text[] > COLLATE "C"); > array_sort > ---------------------------- > {Abc,CCC,bar,bbc,foo,NULL} > (1 row) > > What takes priority is the collation defined with the array_sort, > which is fine: > =# SELECT array_sort(a collate "case_sensitive") FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C" )) v(a); > array_sort > ---------------------------- > {Abc,bar,bbc,CCC,foo,NULL} > {Abc,bar,bbc,CCC,foo,NULL} > (2 rows) > =# SELECT array_sort(a collate "C") FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive" > )) v(a); > array_sort > ---------------------------- > {Abc,CCC,bar,bbc,foo,NULL} > {Abc,CCC,bar,bbc,foo,NULL} > (2 rows) > > The case where the collation is defined in the set of values is a bit > more troubling to me, as it depends on what the values want to be > applied, still that's OK because the collation applied is the one > coming from the set of values: > =# SELECT array_sort(a) FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive" > )) v(a); > array_sort > ---------------------------- > {Abc,bar,bbc,CCC,foo,NULL} > {Abc,bar,bbc,CCC,foo,NULL} > (2 rows) > =# SELECT array_sort(a) FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C" )) v(a); > array_sort > ---------------------------- > {Abc,CCC,bar,bbc,foo,NULL} > {Abc,CCC,bar,bbc,foo,NULL} > (2 rows) > > I am wondering if there are more fancy cases where the saved cache > could force a state that would lead to puzzling results, say with > different collations that should be applied. I'd recommend to > research that more, to reflect that in the docs and to add tests that > show what we should expect in these cases within 0001 because this > new function is mimicking in the context of a function execution > multiple query clauses where restrictions are applied when analyzing > the query, close to the parser. > > For example, UNION and UNION ALL require a common collation when > processing a set of expressions related to them, which would be OK. > Perhaps I lack some imagination to be able to break things. > -- > Michael
While trying to come up with more test cases, it comes to me if the PG_GET_COLLATION() has already done the work to give array_sort the right collation oid? I did not pass the typentry->typcollation but PG_GET_COLLATION() to tuplesort_begin_datum. I tried: CREATE COLLATION case_sensitive (provider = icu, locale = ''); create table t1(a int, b text[] COLLATE "C"); create table t2(a int, b text[] COLLATE "case_sensitive"); insert into t1 values (1, '{foo,bar,null,CCC,Abc,bbc}'::text[]); insert into t2 values (2, '{foo,bar,null,CCC,Abc,bbc}'::text[]); select array_sort(b) from t1; select array_sort(b) from t2; Set breakpoint to see the collation oid, they all give the expected result. For the following cases: SELECT array_sort(a) FROM (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive" )) v(a); WITH t AS (select '{foo,bar,null,CCC,Abc,bbc}'::text[] a UNION ALL select '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive" a) SELECT array_sort(a) from t; The collation seems to have been decided in select_common_collation of the transform phase. For: WITH t AS (select '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C" a UNION ALL select '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive"a) SELECT array_sort(a) from t; ERROR: collation mismatch between explicit collations "C" and "case_sensitive" LINE 2: '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sens... merge_collation_state gives out an ERROR since its explicit collation. But for implicit collation, select_common_collation sets InvalidOid to context.collation, so the following works: select b from t1 UNION ALL select b from t2; But since the context has the InvalidOid as collation, PG_GET_COLLATION() in arrary_sort got InvalidOid, the following errors: WITH t3 AS (select b from t1 UNION ALL select b from t2) select array_sort(b) from t3; ERROR: could not determine which collation to use for string comparison HINT: Use the COLLATE clause to set the collation explicitly. The error message comes from tuplesort_begin_datum's call stack, we can do explicit COLLATE to make it work: WITH t3 AS (select b from t1 UNION ALL select b from t2) select array_sort(b collate "C") from t3; Based on the above analysis, I think it's ok to pass PG_GET_COLLATION() to tuplesort_begin_datum. PFA v14 with Robert's comment addressed. -- Regards Junwang Zhao
v14-0001-general-purpose-array_sort.patch
Description: Binary data
v14-0002-support-sort-order-and-nullsfirst-flag.patch
Description: Binary data