On Mon, May 04, 2020 at 10:19:02PM -0400, Boyuan Yang wrote:
> Mwei (https://nm.debian.org/person/mwei/) just talked to me saying that it
> could be a bug with isSPACE_L1 macro in perl's pp.c. He will be replying the
> email soon.
>
Hi,

(I used reportbug to reply to this thread and missed a lot of the recipients. This is a resend of my reply in #959474. Sorry for the noise.)

After a bit of investigation of the Perl source code (5.31.11, downloaded from upstream), I found that it handles whitespace oddly when `feature unicode_strings` is turned on. I am not a Perl person and I haven't executed the code yet, so my interpretation might be wrong.

When `unicode_strings` is on, `in_uni_8_bit` should be true internally, and in three places (pp.c:6040, pp.c:6076, pp.c:6114) `isSPACE_L1` is called to check whether the character being examined is whitespace. That macro treats 0x85 and 0xA0 as whitespace (handy.h:1611). For the character 包 (U+5305), the last byte of its 3-byte UTF-8 encoding (E5 8C 85) is 0x85, hence the problem.
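To illustrate the suspected collision outside of Perl, here is a minimal Python sketch. It is not Perl's code: `is_space_l1` and `bytes_flagged_as_space` are hypothetical names, and the byte set is my assumption of what a Latin-1 whitespace test accepts (ASCII whitespace plus 0x85 and 0xA0). It only shows that a byte-wise check of this kind flags the last byte of 包.

```python
# Hypothetical stand-in for a byte-wise Latin-1 whitespace test.
# This is NOT Perl's isSPACE_L1; the byte sets below are assumptions.
ASCII_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}
LATIN1_EXTRA_SPACE = {0x85, 0xA0}  # NEL and NBSP in Latin-1


def is_space_l1(byte):
    """Return True if a single byte counts as whitespace under the assumed rule."""
    return byte in ASCII_SPACE or byte in LATIN1_EXTRA_SPACE


def bytes_flagged_as_space(text):
    """Return the offsets of UTF-8 bytes of `text` that the byte-wise test flags."""
    encoded = text.encode("utf-8")
    return [i for i, b in enumerate(encoded) if is_space_l1(b)]


encoded = "包".encode("utf-8")
print(encoded.hex(" "))              # e5 8c 85
print(bytes_flagged_as_space("包"))  # [2]
```

The third byte (offset 2) is flagged even though 包 is not whitespace, which matches the behaviour I would expect if the check is applied to individual bytes of a UTF-8 string rather than to decoded code points.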