On Jun 23 16:06, Corinna Vinschen wrote:
> On Jun 23 15:45, Thomas Wolff wrote:
> > Corinna Vinschen wrote:
> > > On Jun 22 16:48, Thomas Wolff wrote:
> > > > Since the latest locale-related changes, the default codepage after
> > > > starting cygwin _without_ explicit setting (of a locale variable)
> > > > seems to have changed from CP1252 ("Windows ANSI") to ISO 8859-1
> > > > ("Latin 1").
> > > > Was this change on purpose?
> > >
> > > There was no such change at all.  The default codepage is still the
> > > default ANSI codepage on your system.  The internal conversion from
> > > Windows functions to the POSIX multibyte environment and vice versa
> > > uses UTF-8, though, so that all existing filenames have a valid
> > > representation even when using characters not available in your
> > > current codepage.
> > If I do the following:
> > * Open cmd console window.
> > * Go into cygwin 1.7 directory.
> > * Call cygwin.bat.
> > * In cygwin, "cat" a file with all 8 bit characters from U+20 to U+FF.
> > Then there are no printable characters in the range U+80...U+9F
> > (the difference between ISO 8859-1 and Windows "Western" CP1252).
> >
> No.  The difference between UTF-8 and CP1252.  0x80-0x9f are not
> valid codepoints in UTF-8 and the Cygwin console is using UTF-8 by
> default as well.

Hang on, I'm talking nonsense.  The console does not use UTF-8 by
default; rather, it just uses ASCII.

I tested this myself and now I understand what you mean.  The console
seems to use ISO-8859-1, but actually it doesn't.  What happens is this:
The console I/O functions are using UTF-16 under the hood, so each
incoming character is converted to Unicode.  The ASCII->Unicode
conversion treats all incoming bytes literally.  Since the Unicode
values from 0x80 to 0xff are derived from the ISO-8859-1 table, you
actually see ISO-8859-1 by default on the console.

So here's the question:  Why is that a problem?  It's just the default
output.  I *can't* use CP1252 as default, because it's only a valid
default on western language versions of Windows.  Rather, I would have
to use the default ANSI codepage, whatever that is on the machine.
ISO-8859-1, OTOH, is the least intrusive default since it allows a
representation on all machines, independent of their default ANSI
codepage.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
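
The byte-to-Unicode behaviour described above can be illustrated with a
short Python sketch.  It is not Cygwin code and makes no claim about how
the console is implemented; it only assumes that Python's "latin-1" and
"cp1252" codecs follow the respective tables.  The helper name
decode_high_bytes is made up for this example.

```python
import unicodedata

def decode_high_bytes(codec):
    """Decode the bytes 0x80..0x9F with the given codec (hypothetical helper)."""
    raw = bytes(range(0x80, 0xA0))
    # errors="replace" turns undefined bytes into U+FFFD instead of raising
    return raw.decode(codec, errors="replace")

latin1 = decode_high_bytes("latin-1")
cp1252 = decode_high_bytes("cp1252")

# "Treating bytes literally": in ISO-8859-1 each byte becomes the code
# point with the same numeric value, so 0x80..0x9F land on U+0080..U+009F.
literal = all(ord(ch) == 0x80 + i for i, ch in enumerate(latin1))
print("latin-1 maps bytes literally:", literal)

# Those code points are C1 control characters, hence nothing printable.
all_controls = all(unicodedata.category(ch) == "Cc" for ch in latin1)
print("latin-1 0x80..0x9F are all controls:", all_controls)

# CP1252 assigns printable characters to most of the same bytes.
for byte, ch in zip(range(0x80, 0xA0), cp1252):
    print(f"0x{byte:02X} -> U+{ord(ch):04X} ({unicodedata.category(ch)})")
```

Running it shows the difference Thomas observed: the same 32 bytes are
control characters under the literal (ISO-8859-1) mapping, while CP1252
maps most of them to printable characters such as the Euro sign at 0x80.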