Jeremy Herbison wrote:

> Here is another argument: Without the UTF-8 patch, there is still
> partial (broken) UTF-8 support in the book.

Yes, broken.

1) One cannot set up UTF-8 mode on the console using old bootscripts,
2) grep incorrectly matches '[' for all character classes (i.e., brokenness on pure ASCII),
3) one cannot use ncurses apps because they display garbage,
4) one cannot make sense of what GNOME logs to syslog,
5) one has to jump through hoops in order to read manual pages.
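
Symptom 2 can be checked from a shell. This is only a sketch: the helper name classes_matching_bracket is made up for illustration, and it assumes the bug shows up as a lone '[' matching a bracket expression such as [[:alpha:]]. The punct class is left out on purpose, since '[' legitimately belongs to it.

```shell
# Print every POSIX character class (from a small sample) whose bracket
# expression matches a lone '['. A correctly working grep prints nothing.
# [:punct:] is deliberately not tested: '[' really is punctuation.
classes_matching_bracket() {
    for class in alpha digit space upper lower; do
        if printf '[\n' | grep -q "[[:$class:]]"; then
            printf '%s\n' "$class"
        fi
    done
}

broken=$(classes_matching_bracket)
if [ -n "$broken" ]; then
    echo "grep matches '[' for these classes:" $broken
else
    echo "grep character classes look sane"
fi
```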

The huge -i18n patches required by LSB really don't change the picture, because they don't match (at least my) typical usage pattern. E.g., "grep" is used mainly to search for something in program sources or logs (both are nearly pure ASCII), or to help parse configuration files (also ASCII). I tried using "tr" in my scripts that manipulate SAMBA users, but it doesn't support character classes such as [:upper:] and [:lower:] (and LSB doesn't test for that!), so I ended up using Perl.

So, a minimal patch that adds an acceptable level of UTF-8 support to the old book would be:

1) new "console" bootscript
2) two-line grep "bracket" patch (part of the -i18n-1 patch)
3) ncursesw
4) sysklogd 8-bit cleanness patch
5) removal of non-ISO-8859-1 manual pages.

> Some packages already
> enable UTF-8 (gettext, bash, glibc etc), and the testsuites
> require some UTF-8 locales.

You are right about some (not all) packages, but not about testsuites. Bash indeed contains code like this:

if (MB_CUR_MAX > 1)
   quick_and_simple_implementation();
else
   completely_different_code_that_does_the_same();

If the system doesn't support UTF-8 locales, the "else" is indeed bloat.

In Gettext, this macro (MB_CUR_MAX) is used only
1) as an argument to functions like malloc(), in order to get correctly-sized buffers,
2) in files that get used with old libc implementations in order to provide missing functions,
3) in libgrep (legacy code base imported from grep).

So only case (3) adds bloat to a system that doesn't support UTF-8 locales.

As for testsuites, omitting non-European and UTF-8 locales only results in tests being SKIPPED, not FAILED. That's exactly what one wants: why test functionality that is not going to be used? Personally, I would even prefer a failure of such a useless test to creating infrastructure that is required only for it to pass.
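
To illustrate the SKIPPED-versus-FAILED point, here is a rough sketch of how a locale-dependent test can decide whether to run. It is not taken from any real testsuite; AVAILABLE_LOCALES and locale_test_status are hypothetical names, and the list of installed locales is assumed to come from "locale -a".

```shell
# Hypothetical variable: one installed locale name per line.
# Defaults to the output of `locale -a` (empty if that command is missing).
AVAILABLE_LOCALES=${AVAILABLE_LOCALES-$(locale -a 2>/dev/null || true)}

# Print RUN if the named locale is installed, SKIPPED otherwise.
# A missing locale is reported as a skip, never as a failure.
locale_test_status() {
    if printf '%s\n' "$AVAILABLE_LOCALES" | grep -qxF "$1"; then
        echo RUN
    else
        echo SKIPPED
    fi
}

# The result depends on which locales are installed on the machine.
locale_test_status C
locale_test_status ja_JP.eucjp
```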

> Thus, anyone who wants the UTF-8 patch
> removed should be responsible for additionally removing all UTF-8
> support from the book (which I expect would be a much larger, more
> intrusive patch).

This would also break glibc binary compatibility. But there is a quick way to disable installation of multibyte and otherwise problematic locales:

sed -i -e '/UTF-8/d' -e '/EUC/d' -e '/\/GB/d' -e '/BIG5/d' -e '/TCVN/d' localedata/SUPPORTED
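
For anyone who wants to see the effect before touching the real file, the same expressions can be run over a few made-up lines. The sample below only imitates the format of localedata/SUPPORTED (it is not the real list), and filter_supported is a name invented for this sketch. Note that en_GB survives, because the pattern is '/GB' with a leading slash and so matches only the charset part.

```shell
# Same sed expressions as above, but reading stdin instead of editing in place.
filter_supported() {
    sed -e '/UTF-8/d' -e '/EUC/d' -e '/\/GB/d' -e '/BIG5/d' -e '/TCVN/d'
}

# Invented sample in the style of localedata/SUPPORTED (not the real file).
sample='de_DE/ISO-8859-1 \
de_DE.UTF-8/UTF-8 \
en_GB/ISO-8859-1 \
ja_JP.EUC-JP/EUC-JP \
ru_RU.KOI8-R/KOI8-R \
vi_VN.TCVN/TCVN5712-1 \
zh_CN/GB2312 \
zh_TW/BIG5 \'

kept=$(printf '%s\n' "$sample" | filter_supported)

# Only the single-byte de_DE, en_GB and ru_RU.KOI8-R entries remain.
printf '%s\n' "$kept"
```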

BTW, this would fix part 1 of http://blfs-bugs.linuxfromscratch.org/show_bug.cgi?id=909 (which has been ignored all this time).

--
Alexander E. Patrakov