On 21 March 2011 11:17, Corinna Vinschen wrote:
> On Mar 21 07:53, Andy Koppe wrote:
>> On 20 March 2011 19:13, Charles Wilson wrote:
>> > So basically if you specify -iso (or --conv iso) without any of the
>> > "input encoding specification" options like -437 etc, then dos2unix will
>> > attempt to autodetect the *console* encoding. If it succeeds,
>> > then it will "convert" character codes from that encoding to their
>> > equivalent in ISO-8859-1 ("Latin 1") [unconvertible codes are replaced
>> > with an ascii dot]
>> >
>> > Note that this autodetect, if it works, assumes that the console's CP is
>> > the input file's CP. Fair enough -- and it's an overridable default
>> > anyway. However, I wonder if, in cygwin-1.7, we actually can/should use
>> > the "console codepage" in ANY way. Here's the code:
>> >
>> > querycp.c:
>> > #elif defined (WIN32) || defined(__CYGWIN__)
>> >
>> > /* Erwin Waterlander */
>> >
>> > #include <windows.h>
>> > unsigned short query_con_codepage(void) {
>> >    return((unsigned short)GetConsoleOutputCP());
>> > }
>> > #else
>> >
>> > Or if instead, on cygwin, we should use some other mechanism (locale
>> > settings?) to determine the correct default "input" codepage.
>>
>> I think defaulting to the console codepage makes sense for the DOS
>> side of the conversion. Having said that, Windows files that aren't
>> "Unicode", i.e. UTF-16, are usually encoded in the so-called ANSI
>> codepage, e.g. CP1252, so it would make more sense to default to that.
>
> I agree with Andy here. I don't think there are really a lot of files
> left today which are encoded using the old DOS codepages.
>
>> However, the real problem with this feature is that the Unix side of
>> the conversion is fixed to ISO-8859-1, which makes it near-useless
>> when Cygwin defaults to UTF-8. And it's no use for non-Western
>> European languages in any case.
>
> Right again. And not only Cygwin, almost all modern UNIX systems are
> using UTF-8 now.
> The -iso option just doesn't make sense.
>
>> A worthwhile conversion feature would use
>> MultiByteToWideChar()/WideCharToMultiByte() defaulting to the system's
>> ANSI codepage on the DOS side, and mbstowcs()/wcstombs() defaulting to
>
> Well, I'm not sure about that. The complexity of codepage settings on a
> Windows system makes the whole affair guesswork which will always tend
> to do the wrong thing anyway. There are the following codepages available:
>
> - The current input console codepage, GetConsoleCP().
>
> - The current output console codepage, GetConsoleOutputCP().
>
> - The current OEM codepage, GetOEMCP().
>
> - The current ANSI codepage, GetACP().
>
> - The default OEM codepage of the default system locale,
>   GetLocaleInfo (LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage of the default system locale,
>   GetLocaleInfo (LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> - The default OEM codepage of the current user or process,
>   GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage of the current user or process,
>   GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> - The default OEM codepage used for system invariant operations,
>   GetLocaleInfo (LOCALE_INVARIANT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage used for system invariant operations,
>   GetLocaleInfo (LOCALE_INVARIANT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> Which is the right one?

GetACP(), which "retrieves the current Windows ANSI code page
identifier for the operating system". That's what programs using the
non-Unicode APIs get. It's also the default in Notepad and other
editors. Other code pages would need to be specified explicitly by the
user.

> In theory the option is not useful and should just go away. If you
> have to keep it for backward compatibility, stick to the current
> behaviour and outlaw its use, perhaps by printing a nagging warning
> to stderr.

... and pointing them at iconv (which, to be fair, the -iso
description already does).

Andy
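
For illustration, here is a minimal sketch of the iconv route mentioned
above, assuming the input is in an ANSI codepage such as CP1252 and the
Unix side wants UTF-8. The helper name to_utf8() is hypothetical and not
part of dos2unix; the source encoding is passed in explicitly rather than
guessed from any of the Windows codepage settings. It uses only
iconv_open(), iconv() and iconv_close() from <iconv.h>.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of dos2unix): convert the NUL-terminated
 * string `in`, encoded in the charset named `from`, to UTF-8 in `out`.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the charset is unknown, the input is invalid, or `out` is
 * too small.
 */
long to_utf8(const char *from, const char *in, char *out, size_t outsize)
{
    assert(from != NULL && in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    /* Target encoding first, source encoding second. */
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;                /* this iconv does not know `from` */

    char *inp = (char *)in;       /* iconv() wants char **, input is only read */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1; /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                /* E2BIG, EILSEQ or EINVAL */

    *outp = '\0';
    return (long)(outp - out);
}
```

Called as to_utf8("CP1252", "caf\xE9", buf, sizeof buf), this should
leave the five bytes "caf\xC3\xA9" in buf, provided the local iconv
recognises the name CP1252. With glibc, iconv is part of libc; on systems
that use GNU libiconv, linking with -liconv may be needed.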