[perl #50092] [TODO] pct - explicit transcode in PCT::Grammar::string_literal

via RT Tue, 22 Jan 2008 00:01:56 -0800

# New Ticket Created by  Patrick R. Michaud 
# Please include the string:  [perl #50092]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=50092 >



This is a placeholder ticket so we can show a dependency on #39930.

In the "string_literal" rule of compilers/pct/src/PCT/Grammar.pir,
I've just added an explicit transcode step for codepoints outside
of the ASCII range:

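    ...
    $S1 = chr codepoint
    # ASCII codepoints (below 128) need no transcoding
    if codepoint < 128 goto literal_xdo_char_end_1
    # convert the non-ASCII character to the 'unicode' charset
    $I0 = find_charset 'unicode'
    trans_charset $S1, $I0
  literal_xdo_char_end_1:
    concat literal, $S1
    ...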

The explicit transcode allows the above to work even when ICU
isn't present.  By default, the 'chr' opcode returns an ASCII
string for codepoints 0-127, an ISO-8859-1 string for 128-255,
and a Unicode string for codepoints 256 and above.  However, as
noted in RT#39930, Parrot is unable to concatenate ISO-8859-1
strings to Unicode strings when ICU isn't present.  So, the
workaround above converts every non-ASCII character to Unicode
so that the resulting concatenation works properly.
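
To make the charset rules above concrete, here is a small Python
model of them.  It is only a sketch and not Parrot code: the function
names are hypothetical, and it assumes that the only failing
combination without ICU is ISO-8859-1 with Unicode, as described in
RT#39930.

```python
# Toy model of the charset behaviour described in this ticket.
# Not Parrot code; every name below is hypothetical.


def chr_charset(codepoint: int) -> str:
    """Charset that 'chr' picks by default, per the ticket text."""
    if codepoint < 128:
        return "ascii"
    if codepoint < 256:
        return "iso-8859-1"
    return "unicode"


def charset_with_workaround(codepoint: int) -> str:
    """Charset after the workaround: non-ASCII is transcoded to unicode."""
    return "ascii" if codepoint < 128 else "unicode"


def concat_ok(left: str, right: str, have_icu: bool) -> bool:
    """Model of RT#39930 (assumption: only this one pairing fails)."""
    if have_icu:
        return True
    return {left, right} != {"iso-8859-1", "unicode"}


if __name__ == "__main__":
    e_acute, smiley = 0xE9, 0x263A
    # Default 'chr' results: iso-8859-1 + unicode, fails without ICU.
    print(concat_ok(chr_charset(e_acute), chr_charset(smiley), False))
    # With the workaround both sides are unicode, so concat succeeds.
    print(concat_ok(charset_with_workaround(e_acute),
                    charset_with_workaround(smiley), False))
```

In this model the first check fails and the second succeeds, which
is the situation the explicit transcode is meant to avoid.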

When #39930 is resolved, we can eliminate the workaround.

Pm
