2010/8/5 Matthew Dempsky <matt...@dempsky.org>:
> On Wed, Aug 4, 2010 at 6:22 AM, Jordi Beltran Creix
> <jbcreix.m...@gmail.com> wrote:
>> ls(1) needs to use wcwidth(3) instead of just assuming 1 for alignment
>> and if I remember correctly it also mangles the strings using
>> isprint(3) or hardcoded values instead of iswprint(3) when printing to
>> terminal which is probably what you are seeing here. ed(1) is broken
>> by the latter and ksh(1) for both reasons.
>
> Is there any useful documentation that explains how you're supposed to
> write C code and what's changed under the i18n New World Order? From
> your message, it sounds like we're going to have to rewrite nearly all
> of our user-space code...
>

Not everything, but utilities that do ls-like alignment of file
names and other user-provided strings do need small modifications if
they are to be made Unicode friendly. The names should still print
correctly as long as they aren't mangled, but anything that contains
glyphs that are 0 or 2 columns wide will be misaligned. Reading user
input interactively from a terminal needs to account for glyph width
as well, but that mostly happens in the libraries.
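
To make the alignment point concrete, here is a rough sketch of how a
column count could be computed with mbrtowc(3) and wcwidth(3). It is
not taken from any existing ls(1) source; the helper name
display_width() and the choice to count invalid bytes and
non-printable characters as one column are my own assumptions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: needed on some libcs to declare wcwidth(3) */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: number of terminal columns needed to print s
 * in the current LC_CTYPE locale.
 */
int
display_width(const char *s)
{
	mbstate_t st;
	size_t left;
	int cols = 0;

	assert(s != NULL);
	memset(&st, 0, sizeof st);
	left = strlen(s);

	while (left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, s, left, &st);
		int w;

		if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
			/* Invalid or truncated sequence: count the byte
			 * as one column and resynchronise. */
			memset(&st, 0, sizeof st);
			cols++;
			s++;
			left--;
			continue;
		}
		w = wcwidth(wc);
		/* wcwidth() returns -1 for non-printable characters;
		 * assume a one-column placeholder will be printed. */
		cols += (w < 0) ? 1 : w;
		s += n;
		left -= n;
	}
	return cols;
}
```

The program has to call setlocale(LC_CTYPE, "") once at startup for
this to see anything other than the "C" locale. Padding is then the
column width minus display_width(name) spaces, instead of the column
width minus strlen(name).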

String and input mangling occurs when programs try to sanitize
control characters. In UTF-8, bytes in the 0x80-0x9F range, which
8-bit terminals treat as control codes, can be a valid part of a
printable character.
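
A sketch of sanitizing per character with iswprint(3) rather than per
byte with isprint(3) could look like the following. The function name
sanitize_for_terminal() and the '?' replacement are assumptions made
for the example, not what any particular utility does.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: copy in to out (at most outsz bytes including
 * the terminating NUL), replacing every non-printable character and
 * every invalid byte with '?'. Multibyte printable characters are
 * copied through untouched.
 */
void
sanitize_for_terminal(const char *in, char *out, size_t outsz)
{
	mbstate_t st;
	size_t left, o = 0;

	assert(in != NULL && out != NULL && outsz > 0);
	memset(&st, 0, sizeof st);
	left = strlen(in);

	while (left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, in, left, &st);
		int bad = (n == (size_t)-1 || n == (size_t)-2 || n == 0);

		if (bad) {
			/* Not a valid character: skip one byte, resync. */
			memset(&st, 0, sizeof st);
			n = 1;
		}
		if (bad || !iswprint((wint_t)wc)) {
			if (o + 1 >= outsz)
				break;
			out[o++] = '?';
		} else {
			if (o + n >= outsz)
				break;
			memcpy(out + o, in, n);
			o += n;
		}
		in += n;
		left -= n;
	}
	out[o] = '\0';
}
```

As with any locale-dependent code, this only decodes UTF-8 if the
program has called setlocale(LC_CTYPE, "") and the user's locale is a
UTF-8 one. In the "C" locale it degrades to the old byte-wise
behaviour.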

And then there is collation, which means people get angry when IJ.txt
is listed after II.txt. However, many Unicode-aware programs ignore it
and it is optional in POSIX regexes.
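
For programs that do want locale-aware ordering, the usual standard C
route is strcoll(3) in place of strcmp(3). A minimal sketch, with
sort_names() being a made-up helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator that orders strings by the LC_COLLATE rules. */
static int
cmp_collate(const void *a, const void *b)
{
	const char *sa = *(const char *const *)a;
	const char *sb = *(const char *const *)b;

	return strcoll(sa, sb);
}

/* Hypothetical helper: sort an array of n names in collation order. */
void
sort_names(const char **names, size_t n)
{
	assert(names != NULL || n == 0);
	if (n > 1)
		qsort(names, n, sizeof *names, cmp_collate);
}
```

This only differs from a strcmp(3) sort once the program has called
setlocale(LC_COLLATE, "") and the libc actually implements collation
for the selected locale. In the "C" locale the result is plain byte
order.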

All programs that output raw strings, don't attempt alignment, and
don't work with glyphs or code points (regexes need care, but simple
matching and replacement do not) are safe from i18n. If you ignore
its features, UTF-8 is just like ASCII and nothing has to change;
there is no need to use Unicode functions for everything.
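
As an illustration of code that needs no changes: in UTF-8 the bytes
0x00-0x7F never occur inside a multibyte sequence, so searching for an
ASCII delimiter with the plain byte functions stays correct. The
helper name last_component() below is made up for the example.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the part of path after the
 * last '/'. It works byte-wise and never decodes anything, yet it is
 * safe on UTF-8 because '/' cannot appear inside a multibyte
 * character.
 */
const char *
last_component(const char *path)
{
	const char *slash;

	assert(path != NULL);
	slash = strrchr(path, '/');
	return slash != NULL ? slash + 1 : path;
}
```

The same reasoning covers strstr(3) with a valid UTF-8 needle, strcmp(3)
for equality, and copying or concatenating whole strings. It stops
holding as soon as the code truncates at a byte offset or counts bytes
as columns.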

This old FAQ is the best resource there is by far about supporting
UTF-8 and locales in POSIX programs:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

... and then there are many other implementations of the same
utilities that have been adapted to different degrees before.
