On 21/07/18 07:20, Chih-Hsuan Yen wrote:
> Hi coreutils developers,
> 
> I'm using coreutils on macOS High Sierra (10.13). I noticed that with
> `LANG=zh_TW.UTF-8`, `df` output is corrupted.
> 
> �?�?系統 容�?? 已�?� �?��?� 已�?�% �??�?�?
> /dev/disk1s1    234G  151G    81G    65% /
> /dev/disk1s4    234G  2.1G    81G     3% /private/var/vm
> 
> (I'm not sure if other mail agents can display those characters
> correctly or not. See my blog post [1] for the exact output.)
> 
> This seems similar to bug#25630 [2], which is not resolved. I guess
> the reason for my issue is that iscntrl() is broken on macOS High
> Sierra, so in hide_problematic_chars(), some bytes in the Chinese
> header are replaced with a question mark. I managed to patch coreutils
> [3] to make `df` work. Could you have a look? Thanks!
> 
> Best,
> 
> Chih-Hsuan Yen
> 
> [1] https://blog.chyen.cc/posts/2018/06/23/mac-df-chinese.html
> [2] http://lists.gnu.org/archive/html/bug-coreutils/2017-02/msg00008.html
> [3] https://github.com/yan12125/macports-ports/blob/fix-coreutils-df-chinese/sysutils/coreutils/files/patch-df.diff

Wow. That's surprising. I do see the FreeBSD man pages say:

"The 4.4BSD extension of accepting arguments outside of the range of the
unsigned char type in locales with large character sets is considered
obsolete and may not be supported in future releases."

Now I think that might have been referring to values > 0xFF, but fair enough.

I've attached a gnulib patch to document this for iscntrl at least.
It would be great if someone could test the other is*() classification
functions on macOS so that I might have a more complete documentation patch.
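
A quick probe along the following lines could serve for that test. It is
only a sketch: count_high() and report_high_bytes() are names made up for
illustration, and the set of functions listed is an assumption about which
ones are worth checking.

```c
#include <ctype.h>
#include <stdio.h>

/* Count how many byte values in 0x80..0xFF a classification function
   accepts in the currently active locale.  */
static int
count_high (int (*classify) (int))
{
  int n = 0;
  for (int c = 0x80; c <= 0xFF; c++)
    if (classify (c))
      n++;
  return n;
}

/* Print one line per is*() function with its count of accepted
   high bytes.  Both function names here are hypothetical helpers.  */
static void
report_high_bytes (FILE *out)
{
  static const struct { const char *name; int (*fn) (int); } tests[] = {
    { "isalnum", isalnum }, { "isalpha", isalpha }, { "isblank", isblank },
    { "iscntrl", iscntrl }, { "isdigit", isdigit }, { "isgraph", isgraph },
    { "islower", islower }, { "isprint", isprint }, { "ispunct", ispunct },
    { "isspace", isspace }, { "isupper", isupper }, { "isxdigit", isxdigit },
  };
  for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
    fprintf (out, "%-9s %3d\n", tests[i].name, count_high (tests[i].fn));
}
```

Calling setlocale (LC_ALL, "") from <locale.h> and then
report_high_bytes (stdout), with LANG set to a UTF-8 locale, would list the
counts per function. A nonzero count for iscntrl would match the behaviour
reported above.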

I've also attached an alternative patch for df (in your name).
Can you try that one?
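
For illustration, the effect of the change can be sketched without gnulib
as below. Here ascii_iscntrl() is a hypothetical stand-in for c_iscntrl()
and hide_ascii_controls() is a simplified stand-in for
hide_problematic_chars(); neither is the actual coreutils code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for gnulib's c_iscntrl(): true only for the
   ASCII control characters, whatever the current locale is.  */
static bool
ascii_iscntrl (int c)
{
  return (0 <= c && c < 0x20) || c == 0x7F;
}

/* Replace ASCII control bytes in CELL with '?'.  Bytes >= 0x80, such as
   those of a multibyte UTF-8 sequence, are left untouched.  */
static void
hide_ascii_controls (char *cell)
{
  for (unsigned char *p = (unsigned char *) cell; *p; p++)
    if (ascii_iscntrl (*p))
      *p = '?';
}
```

Because the check never consults the locale, a header cell containing
UTF-8 text keeps all of its bytes, while tabs and newlines are still
replaced.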

thanks!
Pádraig
From 6b7434fb222144af3ae9e2d4fd6b4c72eec25f5b Mon Sep 17 00:00:00 2001
From: Chih-Hsuan Yen <yan12...@gmail.com>
Date: Sat, 21 Jul 2018 13:19:23 -0700
Subject: [PATCH] df: avoid multibyte character corruption on macOS

* src/df.c (hide_problematic_chars): Use c_iscntrl() as
passing 8 bit characters to iscntrl() is not supported on macOS.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/32236
---
 NEWS     | 4 ++++
 src/df.c | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index af1a990..aa3b4f9 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  df no longer corrupts displayed multibyte characters on macOS.
+
 
 * Noteworthy changes in release 8.30 (2018-07-01) [stable]
 
diff --git a/src/df.c b/src/df.c
index 1178865..c851fcc 100644
--- a/src/df.c
+++ b/src/df.c
@@ -23,6 +23,7 @@
 #include <sys/types.h>
 #include <getopt.h>
 #include <assert.h>
+#include <c-ctype.h>
 
 #include "system.h"
 #include "canonicalize.h"
@@ -281,7 +282,7 @@ hide_problematic_chars (char *cell)
   char *p = cell;
   while (*p)
     {
-      if (iscntrl (to_uchar (*p)))
+      if (c_iscntrl (to_uchar (*p)))
         *p = '?';
       p++;
     }
-- 
2.9.3

From 816cc0d5fb92552a551c523f49c829261731dfe8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Sat, 21 Jul 2018 13:15:13 -0700
Subject: [PATCH] iscntrl: document that macOS returns true for >= 0x80

* doc/posix-functions/iscntrl.texi: Mention that support
for chars >= 0x80 is not standardized, and not supported
on OS X >= 10.5 at least.
---
 doc/posix-functions/iscntrl.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/posix-functions/iscntrl.texi b/doc/posix-functions/iscntrl.texi
index 7e6813f..3c15708 100644
--- a/doc/posix-functions/iscntrl.texi
+++ b/doc/posix-functions/iscntrl.texi
@@ -16,6 +16,9 @@ OS X 10.8.
 
 Portability problems not fixed by Gnulib:
 @itemize
+This function does not support arguments outside of the range of the
+unsigned char type in locales with large character sets, on some platforms.
+OS X 10.5 will return non zero for characters >= 0x80 in UTF-8 locales.
 @end itemize
 
 Note: This function's behaviour depends on the locale, but does not support
-- 
2.9.3
