Andrew McMillan wrote:
> On Tue, 2009-04-07 at 22:32 +0200, Adeodato Simó wrote:
>> It is my impression that more packages than mksh could use an UTF-8
>> locale at build time (I’m afraid I don’t have pointers, but I’m sure
>> I’ve come across at least a couple).
>>
>> Wouldn’t it be just better to change Debian’s default to make an UTF-8
>> locale available by default, rather than to force all those packages to
>> play tricks with LOCPATH?
>
> I too would really like to see a UTF-8 locale available by default, and
> would prefer to see this be the C.UTF-8 locale, which doesn't screw with
> the collation / character type settings like any other UTF-8 locale
> would.
>
> It seems to me that the consensus here is that having a UTF-8 locale
> available is a good idea and I don't hear any very strong argument
> against such a change.
>
> Consequently I think we should move on from the discussion and start
> working out a patch to resolve this in policy.
So I have a question: what does UTF-8 mean in this context (C.UTF-8)?
It is not a stupid question, and the answer is not "the UTF-8 algorithm
for encoding and decoding Unicode".
I still think that you are confusing the various meanings, and until
I understand the problem I cannot propose a solution.
- terminals should be sensitive to charsets when choosing how to display
things
- programs should be sensitive to locales (the topic of this discussion):
a locale provides some charset-dependent strings and the interpretation
of the various characters, but (usually) it MUST NOT translate characters.
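To illustrate the distinction (a minimal sketch in Python, not taken from any package discussed here): the same bytes are displayed differently depending on the charset used to interpret them, but the interpretation itself does not translate the bytes.

```python
# The same two bytes, interpreted under two different charsets.
raw = b"\xc3\xa9"  # UTF-8 encoding of U+00E9 (e with acute accent)

as_utf8 = raw.decode("utf-8")      # read as UTF-8: one character
as_latin1 = raw.decode("latin-1")  # read as ISO-8859-1: two characters

print(len(as_utf8), len(as_latin1))

# Interpreting the bytes did not change them: re-encoding with the same
# charset gives back exactly the original byte sequence.
assert as_utf8.encode("utf-8") == raw
assert as_latin1.encode("latin-1") == raw
```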
Anyway:
The C locale is already a UTF-8 compatible locale.
No? So what is it missing?
- Other alphabetic, numeric, currency, whitespace characters? But UTF-8
locales do not provide all characters: each defines only the range needed
for its language [see Wikipedia, which I believe stores UTF-8 as binary for
this reason].
The "spoken" language of "C" requires only 7-bit ASCII (or maybe only a
subrange of it).
So why do we need further characters?
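The sense in which plain ASCII text is already valid UTF-8 can be checked directly. This is only a sketch in Python of the encoding-level fact; it says nothing about what a given libc does with LC_CTYPE.

```python
def ascii_matches_utf8() -> bool:
    """Check that every 7-bit ASCII code point encodes to the same
    single byte in both ASCII and UTF-8."""
    for code in range(128):
        ch = chr(code)
        if not (ch.encode("ascii") == ch.encode("utf-8") == bytes([code])):
            return False
    return True


print(ascii_matches_utf8())

# A character outside ASCII needs several bytes in UTF-8,
# and every one of those bytes is above 127.
euro_bytes = "\u20ac".encode("utf-8")
print(len(euro_bytes), all(b > 127 for b in euro_bytes))
```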
Note: POSIX restricts whitespace in the "C" locale; the blank class has only
two values (space and tab).
We could use the UTF-8 charset for the C locale, declaring all c > 127
unused/illegal. Would this solve the problems with mksh? I don't think so,
so what do you need from this C.UTF-8?
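A sketch of what that rule would amount to, in Python. The function name is hypothetical and this is not how any libc implements it; the point is only that "UTF-8 with every c > 127 illegal" accepts exactly the same input as plain ASCII.

```python
def acceptable_in_restricted_c(data: bytes) -> bool:
    """Hypothetical rule: UTF-8 charset, but every byte c > 127 is illegal."""
    return all(c <= 127 for c in data)


# Plain ASCII input passes.
print(acceptable_in_restricted_c(b"mksh"))

# Any non-ASCII character is rejected, because its UTF-8 encoding
# consists only of bytes above 127.
print(acceptable_in_restricted_c("\u00e9".encode("utf-8")))
```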
I still think that "en_US.UTF-8" is the right default (note:
I'm not a US citizen, nor do I speak English).
The installation will install the correct locale, so the en_US period is very
short (we'll dominate them ;-) ).
With debootstrap/pbuild/... things are different. But if this is the problem,
let's look for a solution for the build environment (and I still think that in
this environment "en_US.UTF-8" could be nice).
But I would prefer a simple, basic 7-bit ASCII "C" for a plain build; only
after the packager has considered whether needing UTF-8 for a specific build
is a bug or a feature should the packager set it manually.
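As a rough sketch of "plain C by default, UTF-8 only on request" (Python, with a hypothetical helper name; real packages would do this in debian/rules, not like this):

```python
import os


def build_env(base=None, locale_name="C"):
    """Return a copy of the environment with an explicit locale.

    The default is plain "C"; a packager who decides a UTF-8 locale is
    really needed has to pass it explicitly.
    """
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides LANG and all other LC_* variables.
    env["LC_ALL"] = locale_name
    return env


# Default build: plain "C", whatever the caller's own locale is.
print(build_env({"LANG": "it_IT.UTF-8"})["LC_ALL"])

# Explicit opt-in by the packager.
print(build_env({"LANG": "it_IT.UTF-8"}, "en_US.UTF-8")["LC_ALL"])
```

The resulting dictionary could then be passed as the env argument of subprocess.run() when starting the build command.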
Why does a build need to depend on a locale?
The UNIX way is to allow compiling things for a remote system (maybe another
OS, another arch).
For testing? Then why not test various locales (UTF-8, but also other
non-ASCII-based encodings)?
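A small sketch of what such a test could look like, in Python. The list of encodings is only my choice for illustration (it includes UTF-16 and an EBCDIC code page as examples that are not ASCII-based), and it tests encodings rather than full locales.

```python
# Illustrative selection only; a real test suite would pick its own.
ENCODINGS = ["utf-8", "iso-8859-1", "euc-jp", "utf-16", "cp037"]


def roundtrip_report(text, encodings=ENCODINGS):
    """For each encoding, report whether text survives encode + decode."""
    report = {}
    for name in encodings:
        try:
            report[name] = text.encode(name).decode(name) == text
        except UnicodeEncodeError:
            # The encoding has no representation for some character.
            report[name] = False
    return report


for sample in ("abc", "\u20ac"):
    print(repr(sample), roundtrip_report(sample))
```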
ciao
cate