cygwin + GetConsoleOutputCP

Charles Wilson Sun, 20 Mar 2011 12:14:23 -0700

Question about porting the upstream "dos2unix" utilities.  These
implementations provide capabilities to convert text files from a
certain limited set of INPUT encodings (most are DOS codepages):


=====================================================
CONVERSION MODES
       Conversion modes ascii, 7bit, and iso are
       similar to those of dos2unix/unix2dos under
       SunOS/Solaris.

       ascii
           In mode "ascii" only line breaks are
           converted. This is the default conversion
           mode.

           Although the name of this mode is ASCII,
           which is a 7 bit standard, the actual mode
           is 8 bit. Use always this mode when
           converting Unicode UTF-8 files.

       7bit
           In this mode all 8 bit non-ASCII characters
           (with values from 128 to 255) are converted
           to a 7 bit space.

       iso Characters are converted between a DOS
           character set (code page) and ISO character
           set ISO-8859-1 (Latin-1) on Unix. DOS
           characters without ISO-8859-1 equivalent,
           for which conversion is not possible, are
           converted to a dot. The same counts for
           ISO-8859-1 characters without DOS
           counterpart.

           When only option "-iso" is used dos2unix
           will try to determine the active code page.
           When this is not possible dos2unix will use
           default code page CP437, which is mainly
           used in the USA.  To force a specific code
           page use options "-437" (US), "-850"
           (Western European), "-860" (Portuguese),
           "-863" (French Canadian), or "-865"
           (Nordic).  Windows code page CP1252
           (Western European) is also supported with
           option "-1252". For other code pages use
           dos2unix in combination with iconv(1).
           Iconv can convert between a long list of
           character encodings.
=====================================================

So basically if you specify -iso (or --conv iso) without any of the
"input encoding specification" options like -437 etc., then dos2unix will
attempt to autodetect the *console* encoding.  If it succeeds,
then it will "convert" character codes from that encoding to their
equivalents in ISO-8859-1 ("Latin 1"); unconvertible codes are replaced
with an ASCII dot.
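
To make that behavior concrete, here is a rough sketch of the per-byte
mapping as I understand it from the man page.  This is NOT the upstream
code: the names convert_byte and convert_buffer are made up for
illustration, and the table is deliberately incomplete (only CP437
bytes 0x80..0x87 are filled in), so every other high byte falls through
to the dot.  A real table would cover all 128 high bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative, deliberately incomplete CP437 -> ISO-8859-1 table.
   Only CP437 bytes 0x80..0x87 are listed; a real table has 128 entries. */
static const unsigned char cp437_to_latin1_partial[8] = {
    0xC7, /* 0x80: C with cedilla    */
    0xFC, /* 0x81: u with diaeresis  */
    0xE9, /* 0x82: e with acute      */
    0xE2, /* 0x83: a with circumflex */
    0xE4, /* 0x84: a with diaeresis  */
    0xE0, /* 0x85: a with grave      */
    0xE5, /* 0x86: a with ring       */
    0xE7  /* 0x87: c with cedilla    */
};

/* Hypothetical helper: map one input byte to its Latin-1 equivalent.
   7-bit bytes pass through; bytes not covered by the table become '.'. */
unsigned char convert_byte(unsigned char c)
{
    if (c < 0x80)
        return c;
    if (c <= 0x87)
        return cp437_to_latin1_partial[c - 0x80];
    return '.';   /* no entry in this partial table */
}

/* Hypothetical helper: convert a whole buffer in place. */
void convert_buffer(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = convert_byte(buf[i]);
}
```

With this sketch, CP437 byte 0x82 (e-acute) becomes Latin-1 0xE9, while
a box-drawing byte such as 0xB3 has no Latin-1 counterpart and comes
out as a dot.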

Note that this autodetect, if it works, assumes that the console's CP is
the input file's CP.  Fair enough -- and it's an overridable default
anyway.  However, I wonder if, in cygwin-1.7, we actually can/should use
the "console codepage" in ANY way.  Here's the code:

querycp.c:
#elif defined (WIN32) || defined(__CYGWIN__)

/* Erwin Waterlander */

#include <windows.h>
unsigned short query_con_codepage(void) {
   return((unsigned short)GetConsoleOutputCP());
}
#else

Or should we instead, on cygwin, use some other mechanism (locale
settings?) to determine the correct default "input" codepage?
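
For the sake of discussion, a locale-based query might look roughly
like the sketch below.  It relies only on the standard setlocale() and
nl_langinfo(CODESET) calls; the function names are hypothetical, and it
ASSUMES that the codeset comes back as a string of the form "CPnnn"
(e.g. "CP850") when a DOS/Windows codepage is in effect.  Anything else
("UTF-8", "ISO-8859-1", ...) yields 0, meaning "no usable codepage",
and the caller would fall back to its default.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn a codeset name such as "CP850" into the
   number 850.  Returns 0 if the name is not of the form "CP<digits>". */
unsigned short codeset_to_codepage(const char *codeset)
{
    char *end;
    unsigned long n;

    assert(codeset != NULL);
    if (strncmp(codeset, "CP", 2) != 0)
        return 0;
    if (codeset[2] < '0' || codeset[2] > '9')
        return 0;                      /* reject "CP", "CP-1", "CP 1" */
    n = strtoul(codeset + 2, &end, 10);
    if (*end != '\0' || n == 0 || n > 65535)
        return 0;                      /* trailing junk or out of range */
    return (unsigned short)n;
}

/* Hypothetical replacement for query_con_codepage(): ask the locale
   instead of the console.  Returns 0 if no codepage can be derived. */
unsigned short query_locale_codepage(void)
{
    /* Adopt the user's LC_ALL / LC_CTYPE / LANG settings. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;
    return codeset_to_codepage(nl_langinfo(CODESET));
}
```

Whether a result of 0 should then mean "use CP437" as it does today, or
"skip the conversion", is part of the same open question.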

Comments?

--
Chuck




