Re: gcc can't process some utf-8 characters

Liu Hao via Gcc Wed, 13 Jan 2021 19:27:51 -0800

On 2021/1/14 at 9:47 AM, Roy Qu via Gcc wrote:
> I use "gcc -finput-charset=utf-8 -fexec-charset=gb2312" to compile utf-8
> encoding source files under Windows. Most of the time it works well, but
> when the source file contains some characters such as "—", gcc will fail
> and the error message is: "[Error] converting to execution character set:
> Illegal byte sequence".
> 
> The attached file is an example. I have tested the file by using iconv to
> convert it from utf-8 to gbk, and iconv works with no complaints.
>


It looks like this is a bug in iconv. Converting the attached source with
`iconv -f utf-8 -t gb2312 testencoding.cpp` gives the same error.

According to the GB2312 code table [1], the EM DASH symbol (U+2014) should map
to the double-byte sequence `A1 AA`. For this character there is no difference
among GB2312, GBK and GB18030.
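
As a rough cross-check of the byte sequence above, the sketch below asks
Python's built-in codecs how they encode U+2014. This is only an illustration:
Python's codec tables are a separate implementation from iconv, so they may not
agree with what gcc sees, and the helper name `encode_hex` is made up for this
example.

```python
# Illustrative only: uses Python's built-in codecs, not iconv.
EM_DASH = "\u2014"  # the character from the failing source file


def encode_hex(ch, codec):
    """Return the encoded bytes as spaced upper-case hex, or None if
    the codec cannot represent the character."""
    try:
        encoded = ch.encode(codec)
    except UnicodeEncodeError:
        return None
    return " ".join(f"{byte:02X}" for byte in encoded)


if __name__ == "__main__":
    # Print what each codec produces for EM DASH; None means "rejected".
    for codec in ("gb2312", "gbk", "gb18030"):
        print(codec, encode_hex(EM_DASH, codec))
```

If a codec prints `None` here, that particular table has no entry for U+2014,
which would point at the mapping table rather than at gcc itself.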

Please consider GB2312 superseded by GBK. The native Windows code page (936)
corresponds to GBK, not GB2312.


[1] http://www.khngai.com/chinese/charmap/


> So maybe there's something wrong when gcc is trying to do the encoding
> conversion?
> 
> Some information:
> Toolchain: MinGW-W64-i686, gcc 10.2
> System: Windows 10 Simplified Chinese Home edition ver 2004
> 


-- 
Best regards,
LH_Mouse
