# New Ticket Created by  Patrick R. Michaud 
# Please include the string:  [perl #38931]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=38931 >


This is a suggestion regarding double-quoted string literals
in Parrot.  Currently double-quoted strings are always assumed
to be ASCII unless prefixed by a different charset identifier
such as 'unicode:' or 'iso-8859-1:'.  Unfortunately, this means
that string literals like:

    $S1 = "He said, \xabHello\xbb"
    $S2 = "3 \u2212 4 = \u207b 1"

are treated as ASCII strings even though they obviously contain
codepoints outside of the ASCII range.  (The first results in a
'malformed string' error when compiled; the second has the
high-order bits of its \u sequences chopped off.)

It would be really helpful to PIR emitters if Parrot could
automatically use the presence of \u or \x in a double-quoted
literal to generate a 'unicode:' or 'iso-8859-1:' string (unless
an explicit prefix is given, which would take precedence).  If this
were in place, producing a valid string literal for PIR would
simply be (regardless of the encoding of $S0):

    $S1 = escape $S0
    $S1 = concat '"', $S1
    $S1 = concat $S1, '"'            

Currently, an emitter must also check for the presence of any
\u or \x sequences in $S1, and then prefix the double-quoted 
literal with 'unicode:' or 'iso-8859-1:' accordingly.
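
To make the extra work concrete, here is a minimal sketch of the
prefix selection an emitter has to do today.  It is written in Python
rather than PIR purely for illustration.  The names pir_escape and
pir_literal are hypothetical, and pir_escape is a stand-in written for
this example, not Parrot's escape op, whose exact output may differ.
The sketch picks the prefix from the highest codepoint in the original
string instead of scanning the escaped text, on the assumption that
this avoids mistaking an escaped backslash followed by 'u' for a \u
sequence.

```python
def pir_escape(text: str) -> str:
    """Hypothetical stand-in for an escaper; NOT Parrot's escape op."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in ('"', '\\'):
            out.append('\\' + ch)          # escape quote and backslash
        elif 0x20 <= cp < 0x7f:
            out.append(ch)                 # printable ASCII passes through
        elif cp <= 0xff:
            out.append('\\x%02x' % cp)     # two-digit \x escape
        elif cp <= 0xffff:
            out.append('\\u%04x' % cp)     # four-digit \u escape
        else:
            # Codepoints above U+FFFF are outside the scope of this sketch.
            raise ValueError('codepoint U+%X not handled' % cp)
    return ''.join(out)


def pir_literal(text: str) -> str:
    """Build a double-quoted PIR literal with a charset prefix if needed."""
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        prefix = ''                        # plain ASCII needs no prefix
    elif highest <= 0xff:
        prefix = 'iso-8859-1:'
    else:
        prefix = 'unicode:'
    return prefix + '"' + pir_escape(text) + '"'


if __name__ == '__main__':
    print(pir_literal('He said, \xabHello\xbb'))
    print(pir_literal('3 \u2212 4 = \u207b 1'))
```

With the change suggested above, the prefix selection in pir_literal
would no longer be needed and only the escaping step would remain.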

If this can't be easily done, then I will probably create a
"parrot_escape" function in Data::String to handle the generation,
but it would be great if Parrot could handle it natively.

Thanks,

Pm
