Ronald Fischer wrote:
> Maybe someone could enlighten me about the following:
> ...
> That means, the German letter ü has encoding 0xFC. If I do the same on CMD
> shell (the 'od' used here comes from the Gnu Utilities for Windows), I see:
> ...
> That is, ü is encoded as 0x81. Why is this different?
> I am aware that, for historic reason, different encodings exist (the old
> DOS encoding, Windows ANSI encoding etc.).

So you answered your question yourself :)

> I wouldn't have expected those
> differences, however, when comparing bash.exe vs. cmd.exe.

The encoding is applied by the terminal, not the application. For bash, the letter ü is just a sequence of one or two bytes. It is the terminal that decides which bytes your keyboard sends to the application when you enter ü, and what to display when your program outputs those bytes. (That is the traditional picture; in the age of locales things may sometimes get more complicated :( )

Having said this, I also need to adjust the following response:

Matthias Andree wrote:
> Because the code pages differ. 0xFC is ISO-8859-1 ("Latin 1") or -15 ("Latin 9")
> or CP1252/Windows-1252 (Latin 1 Extended; the latter allocates 0x80...0x9f
> differently than ISO-8859-1) and CMD uses CP437 or CP850.

This is not really correct; like bash, CMD does not use a codepage itself. If you start CMD from Windows, it is implicitly embedded in a Windows console, which uses CP437 (American), CP850 (Western European) or whatever other default your system configuration specifies.

However, you could also run CMD from a cygwin bash. In this case, maximising the confusion, there are two different situations:

* Run mintty, start CMD from bash there: CMD will see the same codepage as bash, since it is the one configured for mintty. So echo ü would produce 0xFC even in CMD (assuming mintty runs one of the codepages which map ü to 0xFC).

* Run cygwin console, observe this: The cygwin console is a hybrid, because its encoding is emulated by the cygwin dll inside a Windows console. Unlike with all other terminals, the effective "codepage" therefore varies with the application: a cygwin application will use the encoding configured for the cygwin session, while any non-cygwin application will use the native Windows console codepage.
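As a quick cross-check of the byte values mentioned in this thread, here is a minimal sketch in Python. Python is not part of the original discussion, and the helper name encoded_bytes is made up purely for illustration. It simply asks Python's built-in codecs how ü is encoded under each of the codepages named above, plus UTF-8 as an example of the two-byte case.

```python
# Codepages named in this thread, plus UTF-8 as a two-byte example.
CODEPAGES = ["latin-1", "iso-8859-15", "cp1252", "cp437", "cp850", "utf-8"]


def encoded_bytes(char, codepages=CODEPAGES):
    """Return {codepage: hex string of the bytes encoding `char`}.

    Hypothetical helper for illustration only.
    """
    return {cp: char.encode(cp).hex() for cp in codepages}


# "\u00fc" is ü, written as an escape so the script itself does not
# depend on the encoding of the file or terminal it is typed into.
for codepage, hex_bytes in encoded_bytes("\u00fc").items():
    print(f"{codepage:>12}: {hex_bytes}")
```

The Latin-1 family and CP1252 should report fc, while CP437 and CP850 report 81: the same character, different bytes, depending only on which table is applied. In the cygwin console case, a cygwin program and a native Windows program would each end up on a different row of that table, even though they share one window.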
So you may echo ü from bash, then start CMD from there, echo ü again, and get different codes for the same key!

Kind regards,
Thomas