Bug#99933: second attempt at more comprehensive unicode policy

Colin Walters Wed, 08 Jan 2003 00:55:32 -0600

On Mon, 2003-01-06 at 16:15, Jochen Voss wrote:
> Hello Colin,
> 
> On Fri, Jan 03, 2003 at 09:50:26PM -0500, Colin Walters wrote:
> > In summary, UTF-8 is the *only* sane character set to use for
> > filenames.
> At least I agree to this :-)


Cool.

> I think that we need filename conversion between UTF-8 and the user's
> character set, because we cannot ban all non-UTF8 terminal types.  In
> my opinion the main problem is, where this conversion should take
> place.

I will say this much; I simply did not even consider doing this kind of
character set conversion as part of glibc or Linux.  It just seems like
such a horrible kludge that would not actually work in practice. 
Fundamentally, glibc and Linux cannot know what charset the application
itself works in.  You might have stuff that undergoes UTF-8 conversion
*twice*, once by the application and once by glibc for example.  It just
seems like a recipe for disaster.
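
To illustrate the double-conversion worry, here is a minimal sketch in Python. It is not anything glibc or Linux actually does: naive_layer_convert is a hypothetical stand-in for a library layer that assumes every filename it receives is in ISO-8859-15, and "Bär" is just the example name from this thread.

```python
def naive_layer_convert(data: bytes) -> bytes:
    """Hypothetical library layer: assumes its input is ISO-8859-15
    and converts it to UTF-8, with no idea what the caller already did."""
    return data.decode("iso-8859-15").encode("utf-8")


name = "Bär"

# The application already works in UTF-8 and hands over UTF-8 bytes.
app_bytes = name.encode("utf-8")

# The lower layer converts a second time, treating each UTF-8 byte
# as a separate ISO-8859-15 character.
on_disk = naive_layer_convert(app_bytes)

print(app_bytes)                # the bytes the application intended
print(on_disk)                  # the longer, doubly-encoded bytes
print(on_disk.decode("utf-8"))  # what a UTF-8 reader would now see
```

The second conversion cannot tell that its input was already UTF-8, so the name that ends up stored is no longer the one the application asked for.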

> Because a lot of programs is affected, it would gain us much, if we
> could move this as deep as into libc or even into the kernel.  

Again: I argue that we need to change all these programs *anyways*,
because you can't just use your same old C library string functions on
UTF-8. I know it seems tempting to just stick some code into glibc, but
I have serious doubts that will ever work in anything resembling a
reliable fashion. 

Feel free to prove me wrong of course!
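
As a rough illustration of why byte-oriented string handling is not enough, the following Python sketch compares character counts with byte counts for a UTF-8 string, and shows what a byte-based truncation can do. The helper is_valid_utf8 is a name made up for this example; the byte length is what a function like strlen() would count.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "Bär"
utf8 = name.encode("utf-8")

char_count = len(name)  # number of characters the user sees
byte_count = len(utf8)  # number of bytes a byte-oriented function counts

# Truncating to a byte budget can cut a multi-byte character in half.
truncated = utf8[:2]

print(char_count, byte_count)
print(is_valid_utf8(utf8), is_valid_utf8(truncated))
```

Any program that measures, truncates, or indexes strings by bytes needs to be looked at individually, which is the point about having to change these programs anyway.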

> Does anybody know: how do they solve the problems we discuss here?
> Where do they convert filenames, e.g. when I login via ssh and
> type "ls -l Bär*" from my LC_CTYPE=ISO-8859-15 system?

I think that it quite simply does not work.
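
A small sketch of why it fails, assuming the remote filename is stored as UTF-8 and the terminal sends the glob pattern as ISO-8859-15 bytes. The file name "Bär.txt" is invented for the example, and Python's fnmatch stands in for the shell's globbing.

```python
import fnmatch

# Assumed: the file on the remote system is named in UTF-8.
filename_on_server = "Bär.txt".encode("utf-8")

# The pattern as typed on an LC_CTYPE=ISO-8859-15 terminal.
pattern_from_terminal = "Bär*".encode("iso-8859-15")

# Byte-for-byte matching, which is all the remote side can do
# without knowing the terminal's character set.
matches_raw = fnmatch.fnmatchcase(filename_on_server, pattern_from_terminal)

# Only if something explicitly re-encodes the pattern does it match.
pattern_converted = pattern_from_terminal.decode("iso-8859-15").encode("utf-8")
matches_converted = fnmatch.fnmatchcase(filename_on_server, pattern_converted)

print(matches_raw, matches_converted)
```

The "ä" is one byte in the pattern and two different bytes in the stored name, so nothing matches unless some layer knows both encodings and converts between them.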

> > Again, major chunks of upstream software which have Unicode support
> > (like GNOME), are *already* defaulting to interpreting filenames as
> > UTF-8 by default.
> And how is the conversion done there?

What conversion?  GNOME apps speak UTF-8 natively, and that's about all
they speak unless you set the G_BROKEN_FILENAMES environment variable.
