Hello,
I would like to compile files created on Windows and encoded in
"Unicode" (UTF-8 or UTF-16). Microsoft tools put a small header (the
byte order mark) at the beginning of files to indicate that they are
UTF-16, UTF-8, etc. I believe this header is standard Unicode, by the
way, not an extension!
When I use gcc to compile the simplest hello world file (included at the
end of this message), created in Notepad and saved as Unicode, I get the
following messages:
saved in UTF-8:
nicolas:~> gcc -finput-charset=UTF-8 hi-utf8.c
hi-utf8.c:1: error: stray '\239' in program
hi-utf8.c:1: error: stray '\187' in program
hi-utf8.c:1: error: stray '\191' in program
hi-utf8.c:1: error: syntax error at '#' token
hi-utf8.c:1: error: parse error before '<' token
saved in UTF-16:
nicolas:~> gcc -finput-charset=UTF-16 hi-utf16.c
hi-utf16.c:1:19: failure to convert UTF-16 to UTF-8
without specifying the input charset:
nicolas:~> gcc hi-utf16.c
a ton of errors
From these results, I am led to believe, perhaps wrongly, that there
are two problems:
1. CPP doesn't recognize the Unicode header.
2. CPP reads the source file as UTF-16, but fails to read the header file
   as UTF-8 or ASCII.
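For reference, the three stray bytes reported in the UTF-8 case (239, 187, 191 in decimal, EF BB BF in hex) are the UTF-8 encoding of U+FEFF, the byte order mark, which is consistent with problem 1. A minimal sketch in C of that arithmetic follows; utf8_encode is a hypothetical helper written for illustration and is not part of GCC or CPP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written to out (1 to 4),
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xFEFF, buf) writes three bytes, 239, 187 and 191, which are the values gcc complains about on line 1 of hi-utf8.c.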
Presumably, CPP converts source files into UTF-8 for GCC.
I think that CPP should try to determine the encoding of each file
separately instead of using a single encoding for every file. It should
look for a Unicode header when it opens a file (the original C source or
any include) and, if it doesn't find one, use the default
(-finput-charset, LC_CTYPE, UTF-8) until it's done processing that file.
Note that vim reads files saved with Unicode headers without problems.
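The per-file check proposed above could look roughly like the following sketch. This is not how CPP is implemented; detect_bom and the enum names are hypothetical, and the caller is assumed to have already read the first few bytes of the file into a buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the encoding announced by the header. */
enum bom_encoding {
    ENC_DEFAULT,   /* no header: fall back to -finput-charset, LC_CTYPE, UTF-8 */
    ENC_UTF8,
    ENC_UTF16_LE,
    ENC_UTF16_BE,
    ENC_UTF32_LE,
    ENC_UTF32_BE
};

/* Inspect the first len bytes of a file. Stores the header length in
   *bom_len (0 if there is none) so the caller can skip it. */
enum bom_encoding detect_bom(const unsigned char *buf, size_t len,
                             size_t *bom_len)
{
    /* The UTF-32 entries come first because the UTF-32 LE header
       (FF FE 00 00) begins with the UTF-16 LE header (FF FE). */
    static const struct {
        enum bom_encoding enc;
        size_t size;
        unsigned char bytes[4];
    } table[] = {
        { ENC_UTF32_LE, 4, { 0xFF, 0xFE, 0x00, 0x00 } },
        { ENC_UTF32_BE, 4, { 0x00, 0x00, 0xFE, 0xFF } },
        { ENC_UTF8,     3, { 0xEF, 0xBB, 0xBF } },
        { ENC_UTF16_LE, 2, { 0xFF, 0xFE } },
        { ENC_UTF16_BE, 2, { 0xFE, 0xFF } }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (len >= table[i].size &&
            memcmp(buf, table[i].bytes, table[i].size) == 0) {
            *bom_len = table[i].size;
            return table[i].enc;
        }
    }
    *bom_len = 0;
    return ENC_DEFAULT;
}
```

With something like this, each source file and each include would be checked on its own, so a UTF-16 hi-utf16.c could still pull in a plain ASCII stdio.h. One known ambiguity: a UTF-16 LE file whose first character after the header is U+0000 is indistinguishable from UTF-32 LE, and this sketch treats it as UTF-32 LE.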
I am using cpp 3.4.3, which may be old, but I can't find what's new in
the 4.x branch.
Please inform me if I am missing something, or what can be done about
this. Thank you.
Nicolas
source:
#include <stdio.h>
int main(void)
{
    printf("hi.\n");
    return 0;
}