Hi,

On Tue, 2020-05-05 at 03:34 +0200, Axel Beckert wrote:
> → echo 包 | perl -pe 's|\s+\n|\n|sg;'
> 包
> → echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
> �
>
> Which kinda sounds like a Perl bug. Cc'ing the maintainers of Debian's
> perl package (not the whole Debian Perl Team), maybe they have some
> insight what actually goes wrong here and if that's indeed a Perl bug.
I guess it is a Perl bug. Here are more Chinese characters besides "包" that trigger the problem:

% echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 赠 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 传 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 阅 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 加 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 者 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�

% echo -n 赠 | hexdump -C
00000000 e8 b5 a0
% echo -n 传 | hexdump -C
00000000 e4 bc a0
% echo -n 包 | hexdump -C
00000000 e5 8c 85
% echo -n 阅 | hexdump -C
00000000 e9 98 85
% echo -n 加 | hexdump -C
00000000 e5 8a a0
% echo -n 者 | hexdump -C
00000000 e8 80 85

(Note the 0xA0 and 0x85 at the end.)

Mwei (https://nm.debian.org/person/mwei/) just told me that it could be a bug in the isSPACE_L1 macro in perl's pp.c. He will reply to this email soon.

-- 
Thanks,
Boyuan Yang
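The trailing bytes 0x85 and 0xA0 share one property: read as Latin-1 code points, they are U+0085 (NEXT LINE) and U+00A0 (NO-BREAK SPACE), and both count as whitespace under Unicode rules. Below is a minimal Python sketch of that suspected mechanism. It assumes that perl, run without -C or an :encoding layer, treats each stdin byte as one character. Under that assumption, `\s+` with Unicode rules eats the last byte of the UTF-8 sequence and leaves a truncated character, which the terminal shows as �. This is only an analogue built on Python's `re`, not Perl's regex engine. `strip_trailing_space` is a hypothetical helper, and the sketch does not settle whether Perl's behaviour is a bug.

```python
import re


def strip_trailing_space(data: bytes, unicode_rules: bool) -> bytes:
    """Hypothetical Python analogue of the quoted perl one-liner.

    Assumption: perl sees undecoded input as one character per byte
    (code points 0x00-0xFF); decoding as Latin-1 gives the same view.
    """
    text = data.decode("latin-1")
    # re.ASCII limits \s to [ \t\n\r\f\v], loosely mirroring Perl's default
    # rules for non-UTF-8 strings. Without it, \s also matches U+0085 (NEL)
    # and U+00A0 (NO-BREAK SPACE), roughly as under "feature unicode_strings".
    flags = 0 if unicode_rules else re.ASCII
    text = re.sub(r"\s+\n", "\n", text, flags=flags)
    return text.encode("latin-1")


for ch in "包赠传阅加者":
    line = (ch + "\n").encode("utf-8")
    out = strip_trailing_space(line, unicode_rules=True)
    # Show input bytes, output bytes, and how a UTF-8 terminal would render them.
    print(ch, line.hex(" "), "->", out.hex(" "),
          out.decode("utf-8", errors="replace").rstrip("\n"))
```

With ASCII-only rules the input passes through unchanged. With Unicode rules the final 0x85 or 0xA0 byte is removed, which matches the hexdump observation above.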