Glibc cannot deal with EBCDIC, or with any exec-charset other than UTF-8,
because GCC does not tell the C library which exec-charset was used, so the
C library has no way to handle it.

Even if glibc could deal with it, GCC allows objects built with different
exec-charsets to be linked with each other, which is definitely undefined
behavior, since the linker ignores the charset entirely and does not know
that the objects differ.

The entire C standard library is poorly designed: the exec-charset can be
anything, and the forced locale violates the zero-overhead principle.

I hope GCC could add macros, for example __GNUC_EXEC_CHARSET__ and
__GNUC_WIDE_EXEC_CHARSET__, that tell the program the name of the current
exec-charset. I need that to make my program run correctly under different
exec-charsets, since glibc does the wrong thing and I have to work around it.
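
Without such a macro, the only thing a program can do is probe the values of
a few character constants. A minimal sketch of that workaround follows; the
function name exec_charset_looks_ascii is made up for illustration, and the
check only covers a handful of characters, so it is a heuristic and not a
real charset name:

```c
/* Hypothetical helper: returns 1 if the characters probed here have their
   ASCII values in the execution character set, 0 otherwise (as would be
   the case under an EBCDIC exec-charset). This is only a heuristic. */
int exec_charset_looks_ascii(void)
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20;
}
```

A program could branch on this at startup, but it still cannot learn which
non-ASCII charset is in use, which is why I am asking for the macro.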




From: Martin Sebor<mailto:mse...@gmail.com>
Sent: Wednesday, November 25, 2020 19:47
To: Zack Weinberg<mailto:za...@panix.com>; 
gcc@gcc.gnu.org<mailto:gcc@gcc.gnu.org>
Cc: euloa...@live.com<mailto:euloa...@live.com>
Subject: Re: PETITION TO REMOVE -fexec-charset in GCC. That is purely garbage 
and undefined behavior.

On 11/25/20 8:15 AM, Zack Weinberg wrote:
>> printf("Hello World\n"); is UB under -fexec-charset=EBCDIC. WTF WTF!!!
>
> It's not undefined behavior.  It does, however, appear to trip various
> bugs in GCC.
>
> $ cat test.c
> #include <stdio.h>
> int main(void) { printf("hello world\n"); }
>
> $ gcc-9 --version | head -n1
> gcc-9 (Debian 9.3.0-18) 9.3.0
> $ gcc-9 -fexec-charset=EBCDIC-US test.c
> during GIMPLE pass: printf-return-value
> test.c: In function ‘main’:
> test.c:2: internal compiler error: converting to execution character
> set: Invalid or incomplete multibyte or wide character
>      2 | int main(void) { printf("hello world\n"); }
>
> $ gcc-10 --version | head -n1
> gcc (Debian 10.2.0-18) 10.2.0
>
> $ gcc-10 -fexec-charset=EBCDIC-US -O2 test.c
> during GIMPLE pass: strlen
> test.c: In function ‘main’:
> test.c:2: internal compiler error: converting to execution character
> set: Invalid or incomplete multibyte or wide character
>      2 | int main(void) { printf("hello world\n"); }
>
> But if you manage to avoid all the bugs, it works the way it's supposed to:
>
> $ gcc-10 -fexec-charset=EBCDIC-US -O0 test.c
> $ ./a.out | iconv -f EBCDIC-US -t UTF-8
> hello world
>
> "Internal compiler error" means "there is a bug in the compiler".  It
> is not the same as "undefined behavior," which means something more
> like "there is a bug in your code that the compiler is not obliged to
> diagnose."
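
To make the iconv step in the transcript above concrete, here is a rough
sketch of what decoding that output involves. It is a hand-written decoder
that covers only lowercase letters and the space character, using the
well-known EBCDIC letter ranges; it is not glibc's conversion table, and the
names ebcdic_to_ascii and decode_ebcdic are invented for this example.

```c
#include <stddef.h>

/* Decode one EBCDIC byte to its ASCII value. Handles only a-z and space;
   anything else becomes '?'. Numeric constants are used for the ASCII
   side so the decoder does not depend on its own exec-charset. */
char ebcdic_to_ascii(unsigned char c)
{
    if (c == 0x40) return 0x20;                               /* space */
    if (c >= 0x81 && c <= 0x89) return (char)(0x61 + (c - 0x81)); /* a-i */
    if (c >= 0x91 && c <= 0x99) return (char)(0x6A + (c - 0x91)); /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return (char)(0x73 + (c - 0xA2)); /* s-z */
    return 0x3F;                                              /* '?' */
}

/* Decode n EBCDIC bytes into out, which must hold at least n + 1 bytes. */
void decode_ebcdic(const unsigned char *in, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = ebcdic_to_ascii(in[i]);
    out[n] = '\0';
}
```

The bytes 88 85 93 93 96 40 A6 96 99 93 84 (hex) decode to "hello world"
with this table, which is why the raw output of a.out looks like garbage on
a UTF-8 terminal until it is converted.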

I suspect this is due to the same problem as:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82700

The EBCDIC-US charset doesn't define all the characters GCC
expects (the bug above says it's missing the open left bracket
'[') and the GCC charset APIs don't make it possible to diagnose
this condition in a friendlier way.  (I mentioned this in response
to the duplicate bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97620)

The printf pass that fails with this error may not actually need
the left bracket so if that's the only one the conversion fails
for we could work around it by skipping it.  But if the left
bracket appears in the format string GCC will fail to translate
it and give another error (not an ICE, but still a hard error):

$ cat a.c && gcc -Wall -fexec-charset=EBCDIC-US a.c
int main(void) { __builtin_printf("hello [ world\n"); }
a.c: In function ‘main’:
a.c:1:52: error: converting to execution character set: Invalid or
incomplete multibyte or wide character
     1 | int main(void) { __builtin_printf("hello [ world\n"); }
       |                                                    ^
a.c:1:35: warning: zero-length gnu_printf format string
[-Wformat-zero-length]
     1 | int main(void) { __builtin_printf("hello [ world\n"); }
       |                                   ^~~~~~~~~~~~~~~~~

It seems to me the EBCDIC-US charset needs to get fixed (i.e.,
Glibc).
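
For anyone who wants to check which characters a given charset is missing,
something along these lines could be used. It is only a sketch built on the
POSIX iconv(3) interface, not what GCC does internally; the function name
count_unconvertible and the list of probed characters are my own choices,
and it assumes the file itself is compiled with an ASCII-compatible
exec-charset so the probe string is valid UTF-8.

```c
#include <iconv.h>
#include <stdio.h>

/* Graphic characters to probe; assumed to be stored as ASCII/UTF-8. */
static const char probe_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ ";

/* Returns how many probed characters iconv cannot convert cleanly from
   UTF-8 to `charset`, or -1 if the conversion itself is unavailable. */
int count_unconvertible(const char *charset)
{
    iconv_t cd = iconv_open(charset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    int failures = 0;
    for (size_t i = 0; probe_chars[i] != '\0'; i++) {
        char in = probe_chars[i];
        char out[8];
        char *inp = &in;
        char *outp = out;
        size_t inleft = 1, outleft = sizeof out;

        iconv(cd, NULL, NULL, NULL, NULL);   /* reset conversion state */
        /* Nonzero means an error or a non-reversible substitution. */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != 0) {
            printf("cannot convert '%c'\n", in);
            failures++;
        }
    }
    iconv_close(cd);
    return failures;
}
```

Calling count_unconvertible("EBCDIC-US") on a system with the glibc
conversion modules installed should list whatever the charset lacks. Going
by the bug report I would expect '[' to show up, but I have not verified
the full list.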

Martin

>
> If this is not the problem you encountered, please describe in
> excruciating detail what your problem actually was.
>
> zw
>
> p.s. I agree with you that the C "locale" mechanism and the C
> standard's concept of "execution character set" are poorly designed
> and one is usually better off writing code that avoids depending on
> them.  But please understand that it's almost impossible to remove
> _anything_ from the C standard, because the main thing C has going for
> it anymore is backward compatibility all the way to the 1980s.  We
> will not be dropping -fexec-charset as long as it's a feature of the C
> standard.
>
