Tim Waugh wrote:
> On Tue, 2007-01-23 at 17:17 +0100, Andreas Schwab wrote:
>> glibc definitely uses strcoll as well.  Most likely python has its own
>> implementation which gets it wrong.
>
> No, really, this is going through glibc's __collseq_table_lookup
> function.  The Python example is just an easy-to-run distilled test
> case.

But it doesn't matter what undocumented internal function glibc is using.
The portable, standard way to perform character comparison using the
current locale is strcoll().  If I can't get the same results using
strcoll(), glibc is clearly doing something different internally.

(And there is no portable standard way to obtain the current collating
sequence.  The best you can do is sort sets of characters like I did.)

Try running the attached program.  Run it like

	rangecmp -v start test end

e.g.,

	rangecmp -v A h Z

Here are the results I get:

$ LC_ALL=C ./rangecmp -v A h Z
default locale = C
strcoll (h, A) -> 1
strcoll (h, Z) -> 1

$ ./rangecmp -v A h Z
default locale = en_US.UTF-8
strcoll (h, A) -> 7
strcoll (h, Z) -> -18

$ LC_ALL=en_US ./rangecmp -v A h Z
default locale = en_US
strcoll (h, A) -> 7
strcoll (h, Z) -> -18

strcoll indicates that, in the "en_US" locale, `h' sorts between `A' and
`Z'.  In the "C" locale, it does not.  This is consistent with the
collating sequences I posted earlier.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
Live Strong.  No day but today.
Chet Ramey, ITS, CWRU    [EMAIL PROTECTED]    http://cnswww.cns.cwru.edu/~chet/

#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

static void
usage(void)
{
  fprintf(stderr, "rangecmp: usage: rangecmp [-v] start test end\n");
}

int
main(int c, char **v)
{
  int i, verbose, r1, r2;
  char *dlocale;

  verbose = 0;
  while ((i = getopt(c, v, "v")) != -1)
    {
      switch (i)
        {
        case 'v':
          verbose = 1;
          break;
        case '?':
        default:
          usage();
          exit(2);
        }
    }
  c -= optind;
  v += optind;

  /* Exactly three operands are required: start, test, end. */
  if (c != 3)
    {
      usage();
      exit(2);
    }

  /* Adopt the locale named by the environment (LC_ALL, LC_COLLATE, LANG). */
  dlocale = setlocale(LC_ALL, "");
  if (verbose)
    printf("default locale = %s\n", dlocale ? dlocale : "''");

  /* Compare the test string against each end of the range. */
  r1 = strcoll (v[1], v[0]);
  printf("strcoll (%s, %s) -> %d\n", v[1], v[0], r1);
  r2 = strcoll (v[1], v[2]);
  printf("strcoll (%s, %s) -> %d\n", v[1], v[2], r2);

  exit(0);
}

_______________________________________________
Bug-bash mailing list
Bug-bash@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-bash