# New Ticket Created by Patrick R. Michaud
# Please include the string: [perl #38931]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=38931 >

This is a suggestion regarding double-quoted string literals in Parrot.

Currently double-quoted strings are always assumed to be ASCII unless
prefixed by a different charset identifier such as 'unicode:' or
'iso-8859-1:'. Unfortunately, this means that string literals like:

    $S1 = "He said, \xabHello\xbb"
    $S2 = "3 \u2212 4 = \u207b 1"

are treated as ASCII strings even though they obviously contain
codepoints outside of the ASCII range. (The first results in a
'malformed string' error when compiled, the second chops off the
high-order bits of the \u sequence.)

It would be really helpful to PIR emitters if Parrot could
automatically use the presence of \u or \x in double-quotes to
generate a 'unicode:' or 'iso-8859-1:' string (absent any other
prefix specification which would override).

If this were in place, producing a valid string literal for PIR
would simply be (regardless of the encoding of $S0):

    $S1 = escape $S0
    $S1 = concat '"', $S1
    $S1 = concat $S1, '"'

Currently, an emitter must also check for the presence of any \u
or \x sequences in $S1, and then prefix the double-quoted literal
with 'unicode:' or 'iso-8859-1:' accordingly.

If this can't be easily done, then I will probably create a
"parrot_escape" function in Data::String to handle the generation,
but it would be great if Parrot could handle it natively.

Thanks,

Pm
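
For illustration, the check an emitter currently has to perform can be sketched roughly as below. This is only a sketch, written in Python rather than PIR so it can be run standalone. The function name quote_for_pir is hypothetical, and the mapping (any \u sequence selects 'unicode:', otherwise any \x sequence selects 'iso-8859-1:') is an assumption about what "accordingly" means above, not a description of Parrot's behavior. The input is assumed to be text that has already been through an escape step.

```python
def quote_for_pir(escaped: str) -> str:
    """Wrap already-escaped text in double quotes, adding a charset
    prefix when the text contains \\u or \\x escape sequences.

    Hypothetical helper; the prefix rules are an assumption.
    """
    has_x = False
    has_u = False
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\" and i + 1 < len(escaped):
            nxt = escaped[i + 1]
            if nxt == "x":
                has_x = True
            elif nxt == "u":
                has_u = True
            # Skip the backslash and the character it governs, so an
            # escaped backslash followed by a literal 'x' is not
            # mistaken for a \x sequence.
            i += 2
        else:
            i += 1

    if has_u:
        prefix = "unicode:"       # assumed: \u needs the unicode charset
    elif has_x:
        prefix = "iso-8859-1:"    # assumed: \x alone fits in Latin-1
    else:
        prefix = ""               # plain ASCII literal, no prefix needed
    return prefix + '"' + escaped + '"'


if __name__ == "__main__":
    print(quote_for_pir(r"He said, \xabHello\xbb"))
    print(quote_for_pir(r"3 \u2212 4 = \u207b 1"))
```

Scanning escape pairs rather than doing a plain substring search is a deliberate choice here: a literal backslash followed by the letter x would otherwise trigger a prefix it does not need.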