Re: EBCDIC sorting as a use case for ICU rules

Peter Eisentraut Thu, 06 Jul 2023 02:32:51 -0700

On 21.06.23 15:28, Daniel Verite wrote:

A collation like the following this seems to work (the rule simply enumerates
US-ASCII letters in the EBCDIC alphabet order, with adequate quoting)


CREATE COLLATION ebcdic (provider='icu', locale='und',
rules=$$&'
'<'.'<'<'<'('<'+'<\|<'&'<'!'<'$'<'*'<')'<';'<'-'<'/'<','<'%'<'_'<'>'<'?'<'`'<':'<'#'<'@'<\'<'='<'"'<a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<'~'<s<t<u<v<w<x<y<z<'['<'^'<']'<'{'<A<B<C<D<E<F<G<H<I<'}'<J<K<L<M<N<O<P<Q<R<'\'<S<T<U<V<W<X<Y<Z<0<1<2<3<4<5<6<7<8<9$$);

This can be useful for people who migrate from mainframes to Postgres
and need their migration tests to produce the same sorted results as the
original system.
Since rules can be defined at the database level with the icu_rules option,
they don't even need to tweak their queries to add COLLATE clauses,
which surely is appreciable in that kind of project.

US-ASCII when sorted in EBCDIC order comes out like this:

.<(+|&!$*);-/,%_>?`:#@'="abcdefghijklmnopqr~stuvwxyz[^]{ABCDEFGHI}JKLMNOPQR\ST
UVWXYZ0123456789

Maybe this example could be added to the documentation except for
the problem that the rule is very long and dollar-quoting cannot be split
into several lines. Literals enclosed by single quotes can be split that
way, but would require escaping the single quotes in the rule, which
would lead to scary-looking over-quoted contents.


You can use whitespace in the rules.  For example,

CREATE COLLATION ebcdic (provider='icu', locale='und',
rules=$$
& ' ' < '.' < '<' < '(' < '+' < \|
< '&' < '!' < '$' < '*' < ')' < ';'
< '-' < '/' < ',' < '%' < '_' < '>' < '?'
< '`' < ':' < '#' < '@' < \' < '=' < '"'
< a < b < c < d < e < f < g < h < i
< j < k < l < m < n < o < p < q < r
< '~' < s < t < u < v < w < x < y < z
< '[' < '^' < ']'
< '{' < A < B < C < D < E < F < G < H < I
< '}' < J < K < L < M < N < O < P < Q < R
< '\'  < S < T < U < V < W < X < Y < Z
< 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9
$$);

(This particular layout is meant to match the rows in
https://en.wikipedia.org/wiki/EBCDIC#Code_page_layout.)

Re: EBCDIC sorting as a use case for ICU rules

Reply via email to