2009/9/23 Corinna Vinschen:
> Right now, if you switch the charset via the setlocale function, you
> also switch the charset used for console output.

Andy wrote:
> That's quite a unique advantage of the Cygwin console actually,
> because it means you always get correct output even if you switch
> charset on the fly.

It might be considered an advantage, but the fact that it is unique
also means it is absolutely not portable. In the normal Linux/Unix
environment, an application that deliberately uses setlocale for a
switch must be aware that it does NOT switch the terminal encoding but
does this only for the purpose of specific invocations of wide
character functions. The same applies to a cygwin application running
in mintty, xterm, or urxvt. So in order to take advantage of this
"advantage", the application would have to check whether TERM=cygwin is
set in the environment - a use case of very limited value, I assume.

Also, from Corinna's last statement after the discussion I had raised
about "codepage after rlogin" or so, my assumption was that the setting
of ${LC_ALL:-${LC_CTYPE:-$LANG}} (in shell syntax) before the first
invocation of a cygwin application would determine the console encoding
for the whole "cygwin session" (whatever that is, considering that one
might invoke CMD.EXE, change LC_ALL, invoke another bash etc.).

Considering that a stable solution should be found, and the portability
issue, I am not so much in favour of switching terminal encoding
on-the-fly, especially not as a side effect of a function that was not
intended this way. In this sense, I also don't think this would be
producing "correct output".

> A normal terminal, on the other hand, doesn't actually know what
> charset the app running inside it is using. Hence, for correct output,
> the user has to make sure the terminal and application charsets match,
> or use something like 'luit' to translate between them.

If I had not split off this quote, my elaboration could have been
shorter...

Corinna wrote:
> This is done on the
> grounds that the console isn't capable to switch the console set by
> itself, as it is for terminal emulators like mintty.
> The problem with
> this approach is even documented in setup2.sgml, just commented out.
> If you use a tool like ssh to connect to a remote machine, then ssh
> uses potentially another locale and charset than the remote shell.

I don't understand this completely; I only hope that the "local" and
"remote" charsets remain consistent once this problem has been fixed,
at least if you use a cygwin tool for the remote connection. (If you
happen to use a Windows telnet, you will arrive remotely with the
native Windows console codepage instead, which is acceptable in the
current "hybrid" mode of operation as I described it in another mail.)

> ssh is always running in the "C" locale

Andy wrote:
> Are you sure? Shouldn't it be calling 'setlocale(LC_ALL, "")', thereby
> configuring the console output according to the locale variables?

The need to add setlocale to a number of tools that don't need it in
Linux/Unix because they are simply byte-transparent was discussed
before and deemed undesirable, if I remember correctly. I'm not sure
why this should be needed again now; maybe it's related to the Windows
file name system not being byte-transparent? If a solution can be found
that avoids this, much trouble would be prevented, I guess.

Coming to the initial question whether the Windows console codepage (as
affected by chcp) should be used for cygwin, I certainly vote NO; this
would be a step back behind 1.5, using the obnoxious "OEM" codepages by
default in many cases. My vote might change to "YES" if by adding
suitable startup conventions (like putting chcp in cygwin.bat and
always spawning off a new Windows console to prevent changing the
current CMD.EXE codepage...
:( ), it would be assured that the default codepage would always be one
of:

* CP1252 (like in 1.5)
* ISO 8859-1 (like in 1.7)
* UTF-8 (as discussed in another mail thread)

I guess people migrating from 1.5 could be convinced of a transition to
UTF-8 but not of a transition to archaic CP437 or CP850, and teaching
them to use "chcp" or "setfont" rather than the locale mechanism would
be both cumbersome and incompatible with a Linux/Unix environment.

Kind regards,
Thomas