2018-07-22 23:12 GMT+08:00 Paul Eggert <egg...@cs.ucla.edu>:
> Pádraig Brady wrote:
>>
>> I've also attached an alternative patch for df (in your name).
>
>
> That still has problems, since it can generate improperly-encoded strings in
> UTF-8 locales (if the inputs are improperly encoded), and can replace parts
> of multibyte characters with '?' in non-UTF-8 locales. Please try the
> attached patch instead, which attempts to address these issues. This is more
> along the lines that Bruno suggested, except it doesn't use mbsiter as I
> figured it was simpler overall just to use mbrtowc directly for this one
> thing.

Here's the result of df:

$ df
檔案系統        容量  已用  可用 已用 掛載點
/dev/disk1s1    234G  137G   95G  60% /
/dev/disk1s4    234G  2.1G   95G   3% /private/var/vm
chyen.cc:        25G   12G   12G  51% /private/tmp/abc def ghi

$ df | xxd
00000000: e6aa 94e6 a188 e7b3 bbe7 b5b1 2020 2020  ............
00000010: 2020 2020 e5ae b9e9 878f 2020 e5b7 b2e7      ......  ....
00000020: 94a8 2020 e58f afe7 94a8 20e5 b7b2 e794  ..  ...... .....
00000030: a820 e68e 9be8 bc89 e9bb 9e0a 2f64 6576  . ........../dev
00000040: 2f64 6973 6b31 7331 2020 2020 3233 3447  /disk1s1    234G
00000050: 2020 3133 3747 2020 2039 3547 2020 3630    137G   95G  60
00000060: 2520 2f0a 2f64 6576 2f64 6973 6b31 7334  % /./dev/disk1s4
00000070: 2020 2020 3233 3447 2020 322e 3147 2020      234G  2.1G
00000080: 2039 3547 2020 2033 2520 2f70 7269 7661   95G   3% /priva
00000090: 7465 2f76 6172 2f76 6d0a 6368 7965 6e2e  te/var/vm.chyen.
000000a0: 6363 3a20 2020 2020 2020 2032 3547 2020  cc:        25G
000000b0: 2031 3247 2020 2031 3247 2020 3531 2520   12G   12G  51%
000000c0: 2f70 7269 7661 7465 2f74 6d70 2f61 6263  /private/tmp/abc
000000d0: e280 a864 6566 e280 a967 6869 0a         ...def...ghi.

Chinese header names are correct, and U+2028 and U+2029 are written as-is.

All tested with LANG=zh_TW.UTF-8 LC_COLLATE=C LC_CTYPE=zh_TW.UTF-8.
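
For anyone following the thread without the attachment open, the general idea of walking a string with mbrtowc directly (rather than through mbsiter) can be sketched roughly as below. This is only an illustration and not the code from the attached patch: the function name sanitize_cell is hypothetical, and the choice to replace control characters as well as invalid bytes is an assumption about what a df-style cleanup might want.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, NOT the code from the attached patch.
   Rewrites CELL in place.  Each byte that does not begin a valid
   multibyte character in the current locale becomes '?', and each
   control character becomes a single '?'.  Valid characters are
   copied whole, so a multibyte character is never split.  */
void
sanitize_cell (char *cell)
{
  char *end = cell + strlen (cell);
  char *dst = cell;              /* output never outruns input */
  mbstate_t state;
  memset (&state, 0, sizeof state);

  for (char *src = cell; src < end; )
    {
      wchar_t wc = L'\0';
      size_t avail = end - src;
      size_t n = mbrtowc (&wc, src, avail, &state);

      /* (size_t) -1 (invalid) and (size_t) -2 (incomplete) are
         both larger than AVAIL, so one comparison covers them.  */
      bool valid = n <= avail;
      if (!valid || n == 0)
        n = 1;                   /* consume exactly one byte */

      if (valid && !iswcntrl (wc))
        {
          memmove (dst, src, n); /* keep the character intact */
          dst += n;
        }
      else
        {
          *dst++ = '?';
          /* Start decoding afresh after a bad or replaced sequence.  */
          memset (&state, 0, sizeof state);
        }
      src += n;
    }

  *dst = '\0';
}
```

A caller would need setlocale (LC_ALL, "") first so that mbrtowc follows LC_CTYPE. Whether U+2028 and U+2029 count as control characters is up to the platform's iswcntrl, so a routine like this may well leave them untouched.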