Jeremy Herbison wrote:
Here is another argument: Without the UTF-8 patch, there is still
partial (broken) UTF-8 support in the book.
Yes, broken.
1) One cannot set up UTF-8 mode on the console using old bootscripts,
2) grep incorrectly matches '[' for all character classes (i.e.,
brokenness on pure ASCII),
3) one cannot use ncurses apps because they display garbage,
4) one cannot make sense of what GNOME logs to syslog,
5) one has to jump through hoops in order to read manual pages.
The huge -i18n patches that are required by LSB really don't change the
picture because they don't match (at least my) typical usage pattern.
E.g., "grep" is used mainly to search for something in program
sources or logs (both are nearly pure ASCII), or to help parse
configuration files (also ASCII). I tried using "tr" in my scripts that
manipulate SAMBA users, but it doesn't support character classes such as
[:upper:] and [:lower:] (and LSB doesn't test for that!), so I ended up
using Perl. So, a minimal patch that adds an acceptable level of UTF-8
support to the old book would be:
1) new "console" bootscript
2) two-line grep "bracket" patch (part of the -i18n-1 patch)
3) ncursesw
4) sysklogd 8-bit cleanness patch
5) removal of non-ISO-8859-1 manual pages.
Some packages already
enable UTF-8 (gettext, bash, glibc etc), and the testsuites
require some UTF-8 locales.
You are right about some (not all) packages, but not about testsuites.
Bash indeed contains code like this:
    if (MB_CUR_MAX > 1)
        quick_and_simple_implementation();
    else
        completely_different_code_that_does_the_same();
If the system doesn't support UTF-8 locales, the "else" is indeed bloat.
In Gettext, the MB_CUR_MAX macro is used only
1) as an argument to functions like malloc(), in order to get
correctly-sized buffers,
2) in files that get used with old libc implementations in order to
provide missing functions,
3) in libgrep (legacy code base imported from grep).
So only case (3) adds bloat to a system that doesn't support UTF-8
locales.
As for testsuites, omitting non-European and UTF-8 locales only results
in tests being SKIPPED, not FAILED. That's exactly what one wants: why
test functionality that is not going to be used? Personally, I even
prefer a failure of such a useless test to creating infrastructure
that is required only for it to pass.
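To illustrate the SKIPPED-versus-FAILED distinction, here is a minimal
sketch of how a locale-dependent test can detect a missing locale and
skip itself. It is not taken from any particular testsuite: the function
name is hypothetical, and the exit status 77 is the value that
Automake-style harnesses report as SKIP.

```shell
# Sketch only: skip (rather than fail) a locale-dependent test when the
# locale is not installed. The function name is hypothetical.
run_locale_test() {
    wanted=$1   # locale name exactly as printed by "locale -a"
    if ! locale -a 2>/dev/null | grep -qxF -- "$wanted"; then
        echo "SKIP: locale $wanted is not installed"
        return 77   # Automake-style "skipped" status
    fi
    # The real test body would go here; as a placeholder, print the
    # character map of the requested locale.
    LC_ALL=$wanted locale charmap
}
```

Calling run_locale_test with a locale that was never installed returns
77 without running the test body, so the harness counts it as skipped.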
Thus, anyone who wants the UTF-8 patch
removed should be responsible for additionally removing all UTF-8
support from the book (which I expect would be a much larger, more
intrusive patch).
This would also break glibc binary compatibility. But there is a quick
way to disable installation of multibyte and otherwise problematic locales:
    sed -i -e '/UTF-8/d' -e '/EUC/d' -e '/\/GB/d' -e '/BIG5/d' -e '/TCVN/d' \
        localedata/SUPPORTED
BTW, this would fix part 1 of
http://blfs-bugs.linuxfromscratch.org/show_bug.cgi?id=909 (which has
been ignored all this time).
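Before running the sed command above with -i, one may want to preview
what it would leave behind. A small sketch, using the same expressions
without -i so that the file is not modified (the function name
filter_supported is made up for this example):

```shell
# Sketch: print the lines of a SUPPORTED-style file that survive the
# filter, without editing the file in place. The function name is
# hypothetical; the sed expressions are the ones from the command above.
filter_supported() {
    sed -e '/UTF-8/d' -e '/EUC/d' -e '/\/GB/d' -e '/BIG5/d' -e '/TCVN/d' "$1"
}

# Usage (from the top of the glibc source tree):
#   filter_supported localedata/SUPPORTED | less
```

Comparing this output with the original file shows exactly which locale
entries would be dropped.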
--
Alexander E. Patrakov