[perl #128511] [BUG] utf8-c8 generates spurious NUL

via RT Fri, 01 Jul 2016 11:42:55 -0700

# New Ticket Created by  Zefram 
# <URL: https://rt.perl.org/Ticket/Display.html?id=128511 >

A decode-then-encode cycle through the utf8-c8 encoding is meant to
round-trip an octet string.  But sometimes it adds a spurious NUL to
the end of the string:

> Blob[uint8].new(233).decode("utf8-c8").encode("utf8-c8").perl
Blob[uint8].new(233,0)
> Blob[uint8].new(233, 128).decode("utf8-c8").encode("utf8-c8").perl
Blob[uint8].new(233,128,0)

This seems to happen whenever the input string ends with a truncation of
the UTF-8 representation of a character in the Unicode range (U+0000 to
U+10FFFF), and not in any other case.  In particular, for single-octet
strings it happens iff the octet value is between 194 (0xc2) and 244
(0xf4) inclusive, which are exactly the octets that can begin a
multi-octet UTF-8 sequence for such a character.
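
The trigger condition can be stated as a byte-level check.  The Python
sketch below only models the reported condition.  It is not code from
Rakudo or MoarVM, and the helper names are hypothetical.  It assumes that
"a character in the Unicode range" means a well-formed UTF-8 sequence as
listed in Table 3-7 of the Unicode Standard, so overlong forms and
surrogate code points are excluded.

```python
# Hypothetical model of the reported trigger; not Rakudo/MoarVM code.
# Allowed ranges for the octets after each lead octet of a well-formed
# UTF-8 sequence (Unicode Standard, Table 3-7).
CONT = (0x80, 0xBF)
WELL_FORMED = [
    ((0xC2, 0xDF), [CONT]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), CONT]),          # rejects overlong forms
    ((0xE1, 0xEC), [CONT, CONT]),
    ((0xED, 0xED), [(0x80, 0x9F), CONT]),          # excludes surrogates
    ((0xEE, 0xEF), [CONT, CONT]),
    ((0xF0, 0xF0), [(0x90, 0xBF), CONT, CONT]),    # rejects overlong forms
    ((0xF1, 0xF3), [CONT, CONT, CONT]),
    ((0xF4, 0xF4), [(0x80, 0x8F), CONT, CONT]),    # caps at U+10FFFF
]


def is_truncated_prefix(seq: bytes) -> bool:
    """True if seq is a non-empty proper prefix of a well-formed UTF-8 sequence."""
    if not seq:
        return False
    for (lo, hi), trail in WELL_FORMED:
        if lo <= seq[0] <= hi:
            # A proper prefix is shorter than the full 1 + len(trail) octets.
            if len(seq) > len(trail):
                return False
            return all(a <= b <= z for b, (a, z) in zip(seq[1:], trail))
    return False  # seq[0] cannot start a multi-octet sequence


def ends_with_truncated_char(data: bytes) -> bool:
    """True if data ends with an incomplete UTF-8 character (the reported trigger)."""
    # A truncated sequence is at most 3 octets long, so only the last
    # 3 positions can start one.
    for start in range(max(0, len(data) - 3), len(data)):
        if is_truncated_prefix(data[start:]):
            return True
    return False


# Single octets that satisfy the condition.
triggers = [b for b in range(256) if ends_with_truncated_char(bytes([b]))]
print(triggers[0], triggers[-1], len(triggers))  # prints: 194 244 51
```

For single octets this selects exactly 194..244, matching the observation
above.  Both 233 and 233,128 from the examples qualify, while the
complete sequence 233,128,128 (U+9000) does not.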

-zefram
