* Will Crawford <[email protected]> [2014-01-31 13:05]:
> If the string has been decoded *from* UTF-8 to Perl's internal
> representation, it's *not* going to be marked as UTF8 internally; it
> *shouldn't* be. It's no longer a "UTF8" string but a "Unicode" string,
> complete with wide characters. If anything, the internal "UTF8" flag
> means "this string needs decoding" rather than "has been decoded".

Sorry, this is nonsense.

The UTF8 flag means the string is stored internally as a variable-width integer sequence, using the same encoding scheme as UTF-8, which means it can store characters > 0xFF. If the UTF8 flag is off, the string is stored as a byte array.

You are correct only insofar as decoding a string could in theory yield a string with the UTF8 flag *off*. That is because the UTF8 flag says nothing about whether a string has been decoded. It only means that the string can store characters > 0xFF, which matters only to perl internally, since UTF8=0 strings are transparently promoted to UTF8=1 whenever necessary.

But Perl can’t tell whether a string is a Unicode string or a byte string. The UTF8 flag is irrelevant. *You* can tell, because `length` returns 2 for a byte string with a “ü” represented in UTF-8, and 1 for a Unicode string with the character “ü”.

(But `length` can return 1 for a UTF8=0 string, because the codepoint is 0xFC, which fits in a single byte just fine; and it can return 2 even for a UTF8=1 string, because the UTF-8 encoded representation of “ü” is 0xC3 0xBC, and whether you store that in a UTF8=0 or a UTF8=1 string, it is still the sequence 0xC3 0xBC.)

Christian: this also affects you. You should not be looking at `is_utf8`. Instead you should be looking at whether `length` returns the correct value.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
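
To illustrate the `length` argument above, here is a minimal sketch. The variable names are made up for the example, and it assumes the core `Encode` module plus the built-in `utf8::upgrade`, `utf8::downgrade` and `utf8::is_utf8` functions. It changes only the internal storage of each string and checks that `length` and string equality are unaffected.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte string: the two bytes 0xC3 0xBC, i.e. "ü" encoded as UTF-8.
my $bytes = "\xC3\xBC";

# Character string: the single codepoint U+00FC, obtained by decoding.
my $chars = decode('UTF-8', $bytes);

# Copy of the character string, forced into byte-array storage (UTF8=0).
# This succeeds because U+00FC fits in a single byte.
my $chars_flag_off = $chars;
utf8::downgrade($chars_flag_off);

# Copy of the byte string, forced into variable-width storage (UTF8=1).
my $bytes_flag_on = $bytes;
utf8::upgrade($bytes_flag_on);

for my $case (
    [ 'byte string, UTF8=0' => $bytes ],
    [ 'byte string, UTF8=1' => $bytes_flag_on ],
    [ 'char string, UTF8=0' => $chars_flag_off ],
) {
    my ($label, $str) = @$case;
    printf "%s: length=%d flag=%d\n",
        $label, length($str), utf8::is_utf8($str) ? 1 : 0;
}
```

Both copies of the byte string have length 2 and both forms of the character string have length 1, whatever the flag says. The upgraded and downgraded copies also still compare `eq` to their originals.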
