Alan Lord wrote:
> As a UK (English) based LFS user/builder I have, so far, had no problem
> using a standard LFS build. However, it has seemed to me that almost all
> new IT systems and Web Services platforms are using UTF-8 encoding. I am
> therefore planning to build my next LFS as a UTF-8 system.
You are welcome, but please back up your data with tar-1.15.1 before
doing so, as follows:
tar --format=posix -jcf /path/to/home-backup.tar.bz2 /home
The --format=posix switch is essential: it ensures that filenames are
converted to UTF-8 if you untar this archive in a UTF-8 locale.
The reason for making such a backup is that filenames with non-ASCII
characters can be displayed properly on only one of the two systems
(old or new): the one that created them. File contents have to be
re-encoded manually after unpacking.
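A minimal sketch of that manual re-encoding step, assuming the old system used a single 8-bit encoding such as ISO-8859-1 or KOI8-R and that iconv is installed. The function name reencode_file is made up for illustration, and the encoding to pass is whatever your old locale actually used:

```shell
# Re-encode one text file from an old 8-bit encoding to UTF-8.
# Assumption: $1 is the encoding of the old locale (e.g. ISO-8859-1).
# reencode_file is a hypothetical helper, not part of any package.
reencode_file() {
    from_enc=$1
    file=$2
    tmp=$file.utf8.tmp
    # Convert into a temporary file first, so that a failed
    # conversion leaves the original untouched.
    if iconv -f "$from_enc" -t UTF-8 "$file" > "$tmp"; then
        # Overwrite the contents in place to keep owner and permissions.
        cat "$tmp" > "$file" && rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as "reencode_file ISO-8859-1 notes.txt". Run it only on plain text files that are still in the old encoding: binary files, and files that are already UTF-8, would be damaged by a second conversion.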
> My point to this list is really about the "whys"...
> * Why is UTF-8 important to Linux users (especially English speaking)?
It is not important, but some people want it. The goal for LFS/BLFS is
to provide a system for them that is not more broken than a typical
modern RedHat system.
> * Why should anyone bother about it in the first place?
Because people such as Markus Kuhn say that a Linux system must support
UTF-8. Because LSB includes Li18nux2000 by reference, and Li18nux2000
says that at least UTF-8 must be supported. Because RedHat doesn't
support anything else, and other distros are going to "UTF-8 by default".
But do you really listen to all of the above-listed propaganda?
Generally, this comes from the USA, from English-only people who use only
ASCII and thus don't care about character sets at all and don't see the
breakage. As for LSB, they can't implement their own requirements: their
"sample implementation" contained unpatched gawk-3.1.4 at one point, and
that version has horrible problems with UTF-8. For each RedHat distro
before Fedora Core 4, guides appeared on Russian web sites explaining how
to make it work with the good old KOI8-R encoding, because UTF-8 was too
broken.
So (IMHO) the only valid reasons for choosing a UTF-8 based locale are:
1. RedHat compatibility at the cost of incompatibility with everyone
   else (including MS Windows):
   * Need to share files via NFS with systems that already use UTF-8
     locales and can't be reconfigured
   * Need to ssh often into systems that already use UTF-8 locales, and
     rarely into systems that don't
   * Need to work mainly with UTF-8 encoded text documents, i.e. those
     coming from people using RedHat systems
2. Just being adventurous (I think this applies to you).
> * What does UTF-8 offer that ISO-8859-x does not?
Ability to use more than one non-English language in one text document.
Ability to communicate with everyone else who also uses UTF-8.
Big note: it is possible to edit UTF-8 text documents without using
UTF-8 locale. Just start Kate and select the UTF-8 encoding from the
menu. It is a known working and bug-free setup.
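The point about mixing languages in one document can be checked from the shell. This is only a sketch, assuming an iconv that reports an error for characters the target charset cannot represent (glibc iconv does so unless //TRANSLIT or -c is used). The helper name fits_in_charset is made up for illustration:

```shell
# Succeed if the UTF-8 text on stdin can be represented in charset $1.
# fits_in_charset is a hypothetical helper; it relies on iconv exiting
# with a non-zero status when a character cannot be converted.
fits_in_charset() {
    iconv -f UTF-8 -t "$1" > /dev/null 2>&1
}

# One line mixing French ("naïve") and Russian ("ж"), written as UTF-8
# bytes in octal so that it does not depend on the current locale.
sample='na\303\257ve \320\266\n'

for cs in ISO-8859-1 ISO-8859-5 UTF-8; do
    if printf "$sample" | fits_in_charset "$cs"; then
        echo "$cs: representable"
    else
        echo "$cs: not representable"
    fi
done
```

ISO-8859-1 has no Cyrillic letters and ISO-8859-5 has no "ï", so neither can hold the whole line, while UTF-8 can.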
> I think this could do with a bit of discussion. The USA/English world
> doesn't, on the face of it, have anything to gain from going to a UTF-8
> based format. However, I believe this is the way forward for *everyone*
> and should perhaps have a greater emphasis in the LFS project as a
> whole. Aren't most of the major Linux distributions now using UTF-8?
Officially, yes. Unofficially, until very recently one had to revert
this in order to get a reasonably bug-free system. In Russia, only total
n00bs didn't know that (they just thought that Linux didn't support
Russian well yet: newbies see bugs, but cannot identify fixable
regressions). This probably doesn't apply to English systems, though.
> If I can help, Alex, I would be happy to help build/develop an English
> language UTF-8 system for comparison/analysis.
You are welcome, although it is unlikely that you will find any bugs
other than a general slowdown.
--
Alexander E. Patrakov