Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

James Keener Tue, 12 Dec 2017 07:11:58 -0800

>
> The default C locale on Linux (I don't know Windows) will sort "digits",
> then alphabetic with the lower then upper case of each letter in order
> like: "aAbB...zZ"
>


That's no true at all! The C locales are 0-9A-Za-z

#include <locale.h>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
>
> static int myCompare (const void * a, const void * b)
> {
>   return strcmp (*(const char **) a, *(const char **) b);
> }
>
> void with_other_locale (char *new_locale,
>     const char **arr,
>     const int n
>     )
> {
>   char *old_locale, *saved_locale;
>
>   /* Get the name of the current locale.  */
>   old_locale = setlocale (LC_ALL, NULL);
>
>   /* Copy the name so it won’t be clobbered by setlocale. */
>   saved_locale = strdup (old_locale);
>
>   if (strlen(new_locale) == 0)
>   {
>     new_locale = saved_locale;
>   }
>
>   /* Now change the locale and do some stuff with it. */
>   setlocale (LC_ALL, new_locale);
>   qsort (arr, n, sizeof (const char *), myCompare);
>
>   printf("\nSorted array in locale %s is\n", new_locale);
>   for (int i = 0; i < n; i++)
>     printf("%d: %s \n", i, arr[i]);
>
>   /* Restore the original locale. */
>   setlocale (LC_ALL, saved_locale);
>   free (saved_locale);
> }
>
> int main ()
> {
>   const char *arr[] = {"Jim", "job", "Anne", "aardvark", "Isaac",
> "island", "12 Days of Christmas", "12 drummers"};
>   int n = sizeof(arr)/sizeof(arr[0]);
>   int i;
>
>   printf("Given array is\n");
>   for (i = 0; i < n; i++)
>     printf("%d: %s \n", i, arr[i]);
>
>   with_other_locale("", arr, n);
>   with_other_locale("C", arr, n);
>   with_other_locale("en_US.UTF-8", arr, n);
>   with_other_locale("UTF-8", arr, n);
>   return 0;
> }
>

Gives

Given array is
0: Jim
1: job
2: Anne
3: aardvark
4: Isaac
5: island
6: 12 Days of Christmas
7: 12 drummers

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale en_US.UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

I actually don't think there is a case-insensitive locale (bydefault?) on a
unix machine.

Jim

On Tue, Dec 12, 2017 at 8:18 AM, John McKown <john.archie.mck...@gmail.com>
wrote:

> On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki <
> tsunakawa.ta...@jp.fujitsu.com> wrote:
>
>> Hi Laurenz, Tom, Peter,
>>
>> Thanks for your suggestions.  The practical solution seems to be to
>> override comparison operators of char, varchar and text data types with
>> UDFs that behave as Tom mentioned.
>>
>> From: Peter Geoghegan [mailto:p...@bowt.ie]
>> > That said, the idea of an "EBCDIC collation" seems limiting. Why
>> > should a system like DB2 for the mainframe (that happens to use EBCDIC
>> > as its encoding) not have a more natural, human-orientated collation
>> > even while using EBCDIC? ISTM that the point of using the "C" locale
>> > (with EBDIC or with UTF-8 or with any other encoding) is to get a
>> > performance benefit where the actual collation's behavior doesn't
>> > matter much to users. Are you sure it's really important to be
>> > *exactly* compatible with EBCDIC order? As long as you're paying for a
>> > custom collation, why not just use a collation that is helpful to
>> > humans?
>>
>> You are right.  I'd like to ask the customer whether and why they need
>> EBCDIC ordering.
>>
>
> This is a guess on my part, based on many years on an EBCDIC system. But
> I'll bet that they are doing a conversion off of the EBCDIC system (maybe
> Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
> want to be able to compare the output from the existing system to the
> output on the new system. EBCDIC orders "lower case", "upper case", then
> "digits". The default C locale on Linux (I don't know Windows) will sort
> "digits", then alphabetic with the lower then upper case of each letter in
> order like: "aAbB...zZ". Comparing identical data which is not presented in
> exactly the same order would be very difficult. 
>
>
>
>>
>> Regards
>> Takayuki Tsunakawa
>>
>>
>
>
> --
> I have a theory that it's impossible to prove anything, but I can't prove
> it.
>
> Maranatha! <><
> John McKown
>

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

Reply via email to