2010/1/24 Corinna Vinschen:
>> Something's going seriously wrong with this, and I'd suspect it's to
>> do with turning backslashes into yen symbols.
>
> Right.  It occurred to me tonight that this will not work from a
> filesystem point-of-view.  The people who decided to overload backslash
> and tilde in the ASCII range with different symbols in SJIS still need
> some serious knock on their heads.  No wonder the Microsoft guys kept
> the binary values of characters intact, especially due to the backslash
> problem.

I looked into this a bit more, out of morbid curiosity. Actually it's Microsoft themselves (or IBM?) who have to take a large part of the blame here, for deciding to use the backslash as the DOS directory separator. ISO-646, which is an internationalized version of ASCII, defines the backslash codepoint as 'localizable', and many national variants of it do define it as something else. (See http://en.wikipedia.org/wiki/ISO/IEC_646)

To work around this issue in the case of SJIS, MS decided to stick with the backslash for CP932, and instead implemented a nasty hack to achieve some sort of SJIS compatibility: Japanese Windows fonts, including Unicode fonts, have a Yen symbol at the backslash position.

> In theory, we could be able to keep SJIS support in.  The
> Cygwin-internal function converting multibyte strings to Unicode
> filenames would have to use CP932.  Only on the application level the
> conversion would use SJIS.

I've pondered that, and I don't think that's worthwhile. It's still going to cause trouble, e.g. with the backslash's use as an escape character and the tilde's use in shell expansions. Also, there are some more differences between standard SJIS and CP932 (although none as serious as the backslash and tilde issues), so more work would be needed to get that right.

Finally, CP932 is the only "SJIS" that people are realistically going to care about, since that's what's in widespread use due to Windows. If someone really needs standard SJIS for converting documents or something, they can use iconv.

Therefore I've changed my mind on whether to keep SJIS and CP932 separate: I think we should stick with the <locale>.SJIS charset as it is in 1.7.1, except that nl_langinfo(CODESET) for it should return "CP932" instead of "SJIS", to make sure iconv uses the right charset, thereby addressing the OP's issue.
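
To make the byte collision concrete, here is a minimal Python sketch. It is only an illustration, not anything from Cygwin: the helper name `ascii_collisions` is made up, and it relies on my understanding that CPython's `shift_jis` codec encodes the yen sign to the single byte 0x5C, while its `cp932` codec decodes 0x5C as a plain backslash. Other SJIS implementations (iconv included) may behave differently.

```python
# Hypothetical helper, for illustration only: find non-ASCII characters
# that a codec encodes to a single byte inside the ASCII range.
YEN = "\u00a5"       # YEN SIGN
OVERLINE = "\u203e"  # OVERLINE, the ISO-646-JP replacement for tilde


def ascii_collisions(codec):
    """Map each colliding ASCII character to the symbol sharing its byte."""
    found = {}
    for ch in (YEN, OVERLINE):
        try:
            raw = ch.encode(codec)
        except UnicodeEncodeError:
            # The codec has no encoding for this character at all.
            continue
        if len(raw) == 1 and raw[0] < 0x80:
            found[raw.decode("ascii")] = ch
    return found


if __name__ == "__main__":
    for ascii_char, symbol in ascii_collisions("shift_jis").items():
        print(f"byte 0x{ord(ascii_char):02X} ({ascii_char!r}) also stands for {symbol!r}")
    # CP932 keeps the ASCII meaning of the separator byte when decoding.
    print(repr(b"C:\x5cdir".decode("cp932")))
```

A path built from text that went through the standard SJIS mapping would therefore use the same byte for a yen sign and for the directory separator, which is the filesystem problem described above.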

Andy