On 07/20/2016 01:38 PM, David Malcolm wrote:
On Fri, 2016-07-08 at 17:49 -0400, David Malcolm wrote:
[...]
Also, this patch currently makes the assumption (in charset.c)
that there's a 1:1 correspondence between bytes in the source
character set and bytes in the execution character set. This can
be the case if both are, say, UTF-8, but might not hold in
general.
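For concreteness, a case where that 1:1 assumption would break down (the charset choices below are only illustrative):

/* Illustrative only: a literal whose source bytes are not 1:1 with its
   execution bytes once a conversion is involved.  In UTF-8 source,
   "é" occupies two bytes (0xC3 0xA9); compiled with
   -fexec-charset=IBM1047 it becomes a single EBCDIC byte, and in a
   wide literal it typically becomes four bytes (assuming a UTF-32
   wide execution charset).  */
#include <wchar.h>

const char *narrow = "é";   /* 2 source bytes -> 1 execution byte  */
const wchar_t *wide = L"é"; /* 2 source bytes -> 4 execution bytes */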
The source char set is UTF-8 or UTF-EBCDIC, and safe-ctype.c has:
#if HOST_CHARSET == HOST_CHARSET_EBCDIC
  #error "FIXME: write tables for EBCDIC"
so presumably we don't actually have any hosts that support EBCDIC
(do we?); as far as I can tell, we only currently support UTF-8
as the source char set.
Similarly, do we support any targets for which the execution
character set is *not* UTF-8?
I brought this up in this thread on the gcc mailing list:
"gcc/libcpp: non-UTF-8 source or execution encodings?"
https://gcc.gnu.org/ml/gcc/2016-07/msg00091.html
and in particular:
https://gcc.gnu.org/ml/gcc/2016-07/msg00106.html
it's possible to select the execution char set at the command line
for C-family frontends using:
-fexec-charset=
-fwide-exec-charset=
e.g. "-fexec-charset=IBM1047" will give one of the variants of EBCDIC.
Given that the internal interface already has a failure mode, I'm
thinking that a reasonable restriction is to only support locations
within string literals for the case where source character set ==
execution character set, and hence we have "convert_no_conversion" as
the converter. Does that sound sane? (I can write test coverage for
this).
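Concretely, the restriction amounts to a check along these lines (a
rough sketch only, not the actual libcpp interface):

/* A minimal sketch of the proposed check: only hand out per-byte
   locations within a string literal when the converter is the identity
   conversion, i.e. when source and execution character sets agree.
   Any other converter may change byte counts, so offsets into the
   execution string could not be mapped reliably back to source bytes.  */

typedef void (*convert_fn) (const unsigned char *src, unsigned long len,
                            unsigned char *dst);

/* Stand-in for libcpp's convert_no_conversion: execution bytes are the
   source bytes, unchanged.  */
static void
convert_no_conversion (const unsigned char *src, unsigned long len,
                       unsigned char *dst)
{
  while (len--)
    *dst++ = *src++;
}

/* The proposed restriction: support locations within string literals
   only for the identity converter.  */
static int
string_literal_locations_supported_p (convert_fn converter)
{
  return converter == convert_no_conversion;
}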
I think this is sane. We can always revisit later if we change our
minds, particularly if folks want to do something crazy like self-host
on an EBCDIC system.
jeff