Re: UTF-8

Alexander E. Patrakov Sat, 21 Jan 2006 23:13:14 -0800

Jeremy Herbison wrote:

> Here is another argument: without the UTF-8 patch, there is still
> partial (broken) UTF-8 support in the book.

Yes, broken.

1) one cannot set up UTF-8 mode on the console using the old bootscripts,
2) grep incorrectly matches '[' against every character class (i.e., it is broken even on pure ASCII),
3) one cannot use ncurses apps because they display garbage,
4) one cannot make sense of what GNOME logs to syslog,
5) one has to jump through hoops in order to read manual pages.

The huge -i18n patches that are required by LSB really don't change the picture, because they don't match (at least my) typical usage pattern. E.g., "grep" is used mainly to search for something in program sources or logs (both are nearly pure ASCII), or to help parse configuration files (also ASCII). I tried using "tr" in my scripts that manipulate SAMBA users, but it doesn't support character classes such as [:upper:] and [:lower:] on non-ASCII text (and LSB doesn't test for that!), so I ended up using Perl. So, a minimal patch that adds an acceptable level of UTF-8 support to the old book would be:

1) new "console" bootscript
2) two-line grep "bracket" patch (part of the -i18n-1 patch)
3) ncursesw
4) sysklogd 8-bit cleanness patch
5) removal of non-ISO-8859-1 manual pages.
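To make item 4 concrete: a syslog daemon is "8-bit clean" if it passes bytes >= 0x80 through untouched instead of escaping them as unprintable. The filter below is a hypothetical sketch (escape_log_line is an invented name; this is neither sysklogd's code nor the actual patch). It escapes only ASCII control characters in ^X notation, so multibyte UTF-8 sequences reach the log intact:

```c
#include <stddef.h>

/* Hypothetical log-line filter, NOT sysklogd's real code.
   Bytes below 0x20 and DEL (0x7f) are rendered as ^X; printable ASCII
   and every byte >= 0x80 are copied unchanged, so UTF-8 survives.
   `out` must have room for 2 * strlen(in) + 1 bytes. */
void escape_log_line(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    char *q = out;

    for (; *p != '\0'; p++) {
        if (*p < 0x20 || *p == 0x7f) {
            *q++ = '^';
            *q++ = (char)(*p ^ 0x40);   /* 0x09 -> 'I', 0x7f -> '?' */
        } else {
            *q++ = (char)*p;            /* ASCII text and high bytes as-is */
        }
    }
    *q = '\0';
}
```

By contrast, a filter that classified bytes with isprint() under the "C" locale would treat every byte of a UTF-8 sequence as unprintable and escape it, which is the kind of garbage item 4 complains about.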

> Some packages already
> enable UTF-8 (gettext, bash, glibc, etc.), and the testsuites
> require some UTF-8 locales.

You are right about some (not all) packages, but not about testsuites. Bash indeed contains code like this:


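#include <stdlib.h>   /* MB_CUR_MAX: max bytes per character in the current locale */

if (MB_CUR_MAX > 1)   /* multibyte locale, e.g. UTF-8 */
   quick_and_simple_implementation();
else                  /* single-byte locale, e.g. "C" or ISO-8859-1 */
   completely_different_code_that_does_the_same();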

If the system doesn't support UTF-8 (or any other multibyte) locales, MB_CUR_MAX is always 1, so the first branch is never taken and is indeed bloat.
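Which branch runs is decided at run time by the LC_CTYPE locale, not at compile time. The following C sketch (mb_cur_max_for is a hypothetical helper, not part of bash or glibc) reports MB_CUR_MAX for a given locale name, or 0 if that locale is not installed, and restores the previous LC_CTYPE setting afterwards:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns MB_CUR_MAX under locale `name`,
   or 0 if setlocale() rejects that name (locale not installed).
   The caller's LC_CTYPE setting is restored before returning. */
int mb_cur_max_for(const char *name)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char saved[256];
    int result = 0;

    /* setlocale()'s returned string may be overwritten by the next call,
       so copy it before switching locales. */
    strncpy(saved, cur ? cur : "C", sizeof saved - 1);
    saved[sizeof saved - 1] = '\0';

    if (setlocale(LC_CTYPE, name) != NULL)
        result = (int)MB_CUR_MAX;   /* 1 for single-byte locales, >1 for UTF-8 */

    setlocale(LC_CTYPE, saved);
    return result;
}
```

Assuming no multibyte locale was generated at all, every call such as mb_cur_max_for("en_US.UTF-8") returns 0 and MB_CUR_MAX stays 1, so the multibyte branch is dead code.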

In gettext, this macro is used only:

1) as an argument to functions like malloc(), in order to get correctly-sized buffers,
2) in files that are used with old libc implementations, in order to provide missing functions,
3) in libgrep (a legacy code base imported from grep).

So only case (3) adds bloat to a system that doesn't support UTF-8 locales.
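Case (1) costs nothing on a single-byte-only system, because MB_CUR_MAX simply evaluates to 1 there. A minimal sketch of that buffer-sizing idiom, assuming a stateless encoding such as UTF-8 (wide_to_mb is a hypothetical name, not a gettext function):

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to the current locale's
   multibyte encoding. Assumes a stateless encoding (UTF-8 or a
   single-byte charset); overflow checks on n * MB_CUR_MAX are omitted. */
char *wide_to_mb(const wchar_t *ws)
{
    size_t n = wcslen(ws);
    /* Worst case: each wide char needs MB_CUR_MAX bytes, plus the NUL. */
    size_t size = n * MB_CUR_MAX + 1;
    char *buf = malloc(size);

    if (buf == NULL)
        return NULL;
    /* wcstombs() returns (size_t)-1 if a character can't be represented. */
    if (wcstombs(buf, ws, size) == (size_t)-1) {
        free(buf);
        return NULL;
    }
    return buf;   /* caller must free() */
}
```

In the "C" locale this allocates exactly wcslen(ws) + 1 bytes, the same as code that never heard of multibyte characters.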

As for testsuites, omitting non-European and UTF-8 locales only results in tests being SKIPPED, not FAILED. That's exactly what one wants: why test functionality that is not going to be used? Personally, I would even prefer such a useless test to fail rather than create infrastructure required only for it to pass.

> Thus, anyone who wants the UTF-8 patch
> removed should be responsible for additionally removing all UTF-8
> support from the book (which I expect would be a much larger, more
> intrusive patch).

This would also break glibc binary compatibility. But there is a quick way to disable installation of multibyte and otherwise problematic locales:

sed -i -e '/UTF-8/d' -e '/EUC/d' -e '/\/GB/d' -e '/BIG5/d' -e '/TCVN/d' localedata/SUPPORTED

BTW, this would fix part 1 of http://blfs-bugs.linuxfromscratch.org/show_bug.cgi?id=909 (which has been ignored all along).


--
Alexander E. Patrakov
--
http://linuxfromscratch.org/mailman/listinfo/lfs-dev
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page
