https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692
Bug ID: 225692
Summary: iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Product: Base System
Version: 11.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: bin
Assignee: freebsd-bugs@FreeBSD.org
Reporter: jkerian+freebsdb...@gmail.com

Created attachment 190345
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190345&action=edit
Simple iswprint test

When I run ls -B on one of my files, the UTF-8 byte sequence 0xef 0xbc 0x88 is treated as unprintable and replaced. According to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x, this sequence encodes U+FF08, a fullwidth left parenthesis. According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08 should be a perfectly printable character in a UTF-8 locale.

Looking at the ls.c source code eventually led me to iswprint(). I wrote a simple program that prints the character class enums and then prints the iswprint() results for a series of characters in a few locales. (Attached in case of link rot; the code and Linux results can be seen at https://wandbox.org/permlink/ZDc36tQhh7BLRpBx)

Linux and OSX have some odd behavior around the classes, but U+2002 and U+FF08 are both perfectly printable on both systems in the UTF-8 locales. FreeBSD, on the other hand, returns 1 only for iswprint(0x64), whereas it should also report U+2002 and U+FF08 as printable.
On my box, running FreeBSD 11.1-RELEASE-p4 GENERIC amd64, I get the following results:

[dev ~/test/iswprint]$ ./a.out
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000, xdigit:0x10000,
alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000, special:0x100000, blank:0x20000,
graph:0x800, phonogram:0x200000, rune:0xffffff00, upper:0x8000,
Default Locale is: C
Character 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1
Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0
Character 0xff08 is in classes: rune
in C locale, iswprint(0xff08) = 0
in en_US.UTF-8 locale, iswprint(0xff08) = 0
in ja_JP.UTF-8 locale, iswprint(0xff08) = 0
Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0
Character 0x82 is in classes: cntrl rune
in C locale, iswprint(0x82) = 0
in en_US.UTF-8 locale, iswprint(0x82) = 0
in ja_JP.UTF-8 locale, iswprint(0x82) = 0

I confirmed with a few other FreeBSD users that they get the same results.