2010/8/5 Matthew Dempsky <matt...@dempsky.org>:
> On Wed, Aug 4, 2010 at 6:22 AM, Jordi Beltran Creix
> <jbcreix.m...@gmail.com> wrote:
>> ls(1) needs to use wcwidth(3) instead of just assuming 1 for alignment
>> and if I remember correctly it also mangles the strings using
>> isprint(3) or hardcoded values instead of iswprint(3) when printing to
>> terminal which is probably what you are seeing here. ed(1) is broken
>> by the latter and ksh(1) for both reasons.
>
> Is there any useful documentation that explains how you're supposed to
> write C code and what's changed under the i18n New World Order?  From
> your message, it sounds like we're going to have to rewrite nearly all
> of our user-space code...
>
Not everything, but utilities that do ls-like alignment of file names
and other user-provided strings do need small modifications if they are
to be made Unicode friendly. The names should still print correctly as
long as they aren't mangled, but anything that uses glyphs that are 0 or
2 columns wide will be misaligned. Reading user input interactively from
a terminal needs to account for glyph width as well, but that mostly
happens in the libraries.

String and input mangling occurs when programs try to sanitize control
characters. In UTF-8, bytes of 0x80 and above, which a byte-oriented
sanitizer may take for terminal control codes, can be a valid part of a
printable character.

And then there is collation, which means people get angry when IJ.txt is
listed after II.txt. However, many Unicode-aware programs ignore it, and
it is optional in POSIX regexes.

All programs that output raw strings, don't attempt alignment, and don't
work with glyphs or code points (stuff like regexes is out, but not
simple matching and replacement) are safe from i18n. If you ignore its
features, UTF-8 is just like ASCII and nothing has to change; there is no
need to use Unicode functions for everything.

This old FAQ is by far the best resource there is about supporting UTF-8
and locales in POSIX programs:

http://www.cl.cam.ac.uk/~mgk25/unicode.html

... and then there are many other implementations of the same utilities
that have already been adapted to different degrees.
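
To make the alignment point concrete, here is a rough sketch of the kind
of small change I mean. It is not taken from ls(1) or any other utility;
display_width() is a name I made up, and counting invalid bytes and
non-printable characters as one column each is my own assumption (it
matches a program that prints a single '?' in their place). It relies
only on mbrtowc(3), iswprint(3) and wcwidth(3).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* exposes wcwidth(3) on some libcs, e.g. glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: number of terminal columns needed to print s,
 * decoded according to the current LC_CTYPE locale.  Invalid bytes and
 * non-printable characters count as one column each, on the assumption
 * that the caller prints a single '?' for them.
 */
size_t
display_width(const char *s)
{
	mbstate_t st;
	size_t cols = 0, len, n;
	wchar_t wc;
	int w;

	assert(s != NULL);
	memset(&st, 0, sizeof(st));
	len = strlen(s);
	while (len > 0) {
		n = mbrtowc(&wc, s, len, &st);
		if (n == (size_t)-1 || n == (size_t)-2) {
			/* Invalid or truncated sequence: skip one byte and resync. */
			memset(&st, 0, sizeof(st));
			cols++;
			s++;
			len--;
			continue;
		}
		if (n == 0)
			break;	/* defensive: strlen() already excludes the NUL */
		/* wcwidth() is 0 for combining characters, 2 for wide ones. */
		w = iswprint((wint_t)wc) ? wcwidth(wc) : -1;
		cols += (w >= 0) ? (size_t)w : 1;
		s += n;
		len -= n;
	}
	return cols;
}
```

A program would call setlocale(LC_CTYPE, "") once at startup and then pad
each name with (column width - display_width(name)) spaces by hand,
rather than relying on printf's "%-*s", which counts bytes and not
columns.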