Paul Eggert <[EMAIL PROTECTED]> writes:

> Thanks, everybody, for writing about this.
> 
> The standardization process is one of consensus, and if the GCC
> developers find some areas of disagreement here I think it unlikely
> that the other POSIX implementers will agree with the proposed action.
> Hence I am thinking of weakening it.
> 
> Currently the action proposes to insert the following text:
> 
>    It is implementation-defined whether trailing white-space characters
>    in each C-language source line are ignored.  Otherwise, the
>    multibyte characters of each source line are mapped on a one-to-one
>    basis to the C source character set.
> 
> How about if I propose to insert the following text instead?
> 
>    The multibyte characters of each source file are mapped to the C
>    source character set on a one-to-one basis, with the following
>    exceptions:
> 
>      * It is implementation-defined whether trailing white-space
>        characters in each input line are ignored.
> 
>      * Each extended source character, and each sequence of characters
>        that would otherwise be recognized as a universal character
>        name, is mapped to an implementation-defined extended source
>        character or universal character name.  If a universal
>        character name is continued by a backslash-newline across a
>        line boundary, the mapped output sequence contains the same
>        number of backslash-newlines as the input, but their
>        location in the output sequence is unspecified.
> 
> Would this weaker action pose an undue burden on GCC?  My sense from
> the discussion is "no", but I'd like to double-check with the experts.

I think this is no problem for GCC, at least as far as UCNs go.
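
To make the two exceptions concrete, here is a minimal sketch in C of the
mapping step followed by line splicing.  It is only an illustration under
my own assumptions, not how GCC does it: map_and_splice is a hypothetical
helper, "white-space" is taken to mean just space and tab, and the
ignore_trailing_ws flag stands in for the implementation-defined choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GCC code).  Copies src to out, optionally
   dropping trailing spaces and tabs from each input line (the
   implementation-defined step), then deletes every backslash-newline
   pair (line splicing).  out must hold at least strlen(src) + 1 bytes. */
void map_and_splice(const char *src, int ignore_trailing_ws, char *out)
{
    size_t n = 0;           /* number of bytes written to out so far */
    size_t line_start = 0;  /* index in out where the current input line began */

    for (; *src != '\0'; src++) {
        if (*src != '\n') {
            out[n++] = *src;
            continue;
        }
        if (ignore_trailing_ws) {
            /* Strip only white space belonging to this input line. */
            while (n > line_start &&
                   (out[n - 1] == ' ' || out[n - 1] == '\t'))
                n--;
        }
        if (n > line_start && out[n - 1] == '\\') {
            n--;            /* splice: drop the backslash and the newline */
        } else {
            out[n++] = '\n';
        }
        line_start = n;     /* the next input line starts here */
    }
    out[n] = '\0';
}
```

Given the two input lines '\u00\ ' (with a trailing space) and 'C1', the
call with ignore_trailing_ws set to 1 yields the single line '\u00C1',
which then looks like a UCN; with the flag set to 0 the backslash is not
followed directly by a newline, so nothing is spliced and the input comes
out unchanged.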


I am less sure about the things that this is missing.  For instance,
it doesn't really define 'extended source character' in POSIX terms.
The POSIXy way to do that would be to refer to the LC_CHARSET
environment variable, but then consider

LC_CHARSET=UTF-16 c99 foo.c

where 'foo.c' is in UTF-16 and contains '#include <stdio.h>'; that header
lives in /usr/include but is encoded in some other character set.  So
there is a complication here that I'm not sure is completely solved, or
at least that I don't fully understand.  But I'm pretty sure GCC would
be hurt by it just as much as any other compiler, so at least there are
no GCC-specific problems.
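
A small sketch of why a single charset setting cannot cover both files.
The assumptions are mine: the UTF-16 file is little-endian with no byte
order mark, the system header is plain ASCII, and both function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the first UTF-16LE code unit from buf.
   The caller must guarantee at least two readable bytes. */
uint16_t first_utf16le_unit(const unsigned char *buf)
{
    return (uint16_t)(buf[0] | (buf[1] << 8));
}

/* Hypothetical check: does the file start with '#' (U+0023) when its
   bytes are decoded under the given assumption about the encoding? */
int starts_with_hash(const unsigned char *buf, size_t len, int assume_utf16le)
{
    if (assume_utf16le)
        return len >= 2 && first_utf16le_unit(buf) == 0x0023;
    return len >= 1 && buf[0] == 0x23;
}
```

The bytes 23 00 69 00 (the start of '#include' in UTF-16LE) pass the
check when decoded as UTF-16LE, but the ASCII bytes 23 69 decode to the
single code unit 0x6923 under the same assumption, so the header's
'#include' lines would not be recognized as directives at all.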
