On Mar 17, 2006, at 4:32 PM, Thomas Lumley wrote:

> The following caused a hard-to-diagnose problem for a user of the
> survey package. Presumably this is a strange Unicode thing,
It is independent of the encoding:

[EMAIL PROTECTED]:~$ LC_COLLATE=en_US R --vanilla -q<tr
> "1//"<"10/"
[1] TRUE
> "1//2"<"10/2"
[1] FALSE
> Sys.getlocale("LC_COLLATE")
[1] "en_US"

(en_US is ISO-8859-1 on that machine)

And systems don't seem to agree on anything but the C locale:

Mac OS X:
caladan:urbanek$ LC_COLLATE=en_US R --vanilla -q<tr
> "1//"<"10/"
[1] TRUE
> "1//2"<"10/2"
[1] TRUE
> Sys.getlocale("LC_COLLATE")
[1] "en_US"

IRIX:
fry:urbanek$ LC_COLLATE=en_US R --vanilla -q<tr
> "1//"<"10/"
[1] FALSE
> "1//2"<"10/2"
[1] FALSE
> Sys.getlocale("LC_COLLATE")
[1] "en_US"

But at least most systems give the same answer when the same character is appended to both strings, except for GNU/Linux. Looking at the locale definitions, GNU/Linux uses the "iso14651_t1" template for many languages. Maybe the problem is that "/" is defined in the "SPECIAL" section of the ISO 14651 template, which possibly causes "/" to be ignored completely in the "LATIN" part. That would explain the behavior: ("1"<"10")==TRUE, but ("12"<"102")==FALSE.

I couldn't find anything on what the "official" en_** collation order should be, so I have no idea whether this is a bug in the GNU/Linux locales or not...

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
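To make the "/ is ignored" hypothesis concrete, here is a minimal R sketch of a simplified two-level comparison. It assumes, as the hypothesis suggests, that "/" is dropped entirely at the first level and that ties are broken afterwards; the tie-break is approximated by plain code-point order. The helpers `cmp_codepoints()` and `less_ignoring()` are hypothetical names for illustration only. They are not part of R or glibc, and real glibc collation has more levels than this.

```r
# Hypothetical helper: compare two strings by Unicode code point,
# independent of the current LC_COLLATE setting. Returns -1, 0 or 1.
cmp_codepoints <- function(a, b) {
  x <- utf8ToInt(a)
  y <- utf8ToInt(b)
  n <- min(length(x), length(y))
  for (i in seq_len(n)) {
    if (x[i] != y[i]) return(sign(x[i] - y[i]))
  }
  sign(length(x) - length(y))
}

# Hypothetical model of the suspected GNU/Linux behavior:
# level 1 drops the ignorable character ("/") and compares what is left;
# only on a tie are the full strings compared (a simplified later level).
less_ignoring <- function(a, b, ignore = "/") {
  a1 <- gsub(ignore, "", a, fixed = TRUE)
  b1 <- gsub(ignore, "", b, fixed = TRUE)
  level1 <- cmp_codepoints(a1, b1)
  if (level1 != 0) return(level1 < 0)
  cmp_codepoints(a, b) < 0
}

less_ignoring("1//", "10/")    # TRUE:  level 1 compares "1" with "10"
less_ignoring("1//2", "10/2")  # FALSE: level 1 compares "12" with "102"
```

This reproduces the GNU/Linux results above. In plain code-point order, which is what the C locale uses, "/" (0x2F) sorts before "0" (0x30), so both comparisons would be TRUE.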