Hello,

I would like to compile files that were created on Windows and saved as "Unicode" (UTF-8 or UTF-16). Microsoft puts a small header at the beginning of such files to indicate whether they are UTF-16, UTF-8, etc. This header is the byte order mark (BOM, the character U+FEFF), and as far as I know it is part of the Unicode standard, not a Microsoft extension.

I created the simplest possible hello world file (included at the end of this message) in Notepad and saved it as Unicode. When I try to compile it with gcc, I get the following messages:

saved in UTF-8:

 nicolas:~> gcc -finput-charset=UTF-8 hi-utf8.c
hi-utf8.c:1: error: stray '\239' in program
hi-utf8.c:1: error: stray '\187' in program
hi-utf8.c:1: error: stray '\191' in program
hi-utf8.c:1: error: syntax error at '#' token
hi-utf8.c:1: error: parse error before '<' token

saved in UTF-16:

 nicolas:~> gcc -finput-charset=UTF-16 hi-utf16.c
hi-utf16.c:1:19: failure to convert UTF-16 to UTF-8

saved as UTF-16, without specifying the input charset:

 nicolas:~> gcc hi-utf16.c
(a long list of errors)


From these results, I am led to believe, perhaps wrongly, that there are two problems:

1- CPP does not recognize the Unicode header (the BOM).
2- CPP reads the source file as UTF-16, but then fails on the included header file (stdio.h), which is UTF-8 or plain ASCII, because it applies the same encoding to it.

Presumably, CPP converts source files into UTF-8 for GCC.

I think CPP should determine the encoding of each file separately instead of using a single encoding for every file. When it opens a file (the original C source or any include), it should look for a Unicode header; if it doesn't find one, it should fall back to the default (-finput-charset if given, otherwise LC_CTYPE, otherwise UTF-8) until it is done processing that file. Note that vim reads files saved with Unicode headers without any problem.

I am using cpp 3.4.3, which may be old, but I can't find what's new in the 4.x branch.


Please let me know if I am missing something, or what can be done about this. Thank you.


Nicolas


source:


#include <stdio.h>

int main()
{
  printf("hi.\n");

  return 0;
}

