Ingo Schwarze writes:
> That's a bad idea.  Do not use non-ASCII bytes in file names.
> You are in for all kinds of trouble.

I don't agree. In a situation where a single user will be accessing
files, you can use whatever naming scheme you like. UTF-8 works exactly
how you would expect: the filename you enter is the filename you'll get.

Misencoded filenames can also exist, again with exactly the results you
would expect: you can't necessarily type one, but if you can pass the
exact filename, programs will work. The same goes for control characters
like backspaces in filenames (far more annoying than UTF-8).
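
A minimal Python sketch of that claim, assuming a Unix filesystem that
stores names as opaque bytes (some filesystems reject or rewrite names;
the helper name roundtrip is made up for illustration):

```python
import os
import tempfile


def roundtrip(name: bytes) -> bool:
    """Create a file called `name` (raw bytes) and check that the
    directory listing returns exactly the same bytes.

    Assumption: the filesystem treats names as opaque byte strings.
    """
    with tempfile.TemporaryDirectory() as d:
        dirpath = os.fsencode(d)
        path = os.path.join(dirpath, name)
        with open(path, "wb") as f:
            f.write(b"x")
        # Listing with a bytes path returns bytes names, undecoded.
        return os.listdir(dirpath) == [name]


print(roundtrip("\u00e9t\u00e9.txt".encode("utf-8")))  # valid UTF-8
print(roundtrip(b"caf\xe9.txt"))                       # Latin-1 bytes, invalid UTF-8
print(roundtrip(b"a\x08b.txt"))                        # embedded backspace
```

On such a filesystem all three names come back byte for byte, whether
or not they are easy to type.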

Telling people to avoid such names is impractical. Anyone downloading
lots of external files through web browsers, torrent clients, or any
number of other programs in ports will eventually encounter files with
UTF-8 filenames. They work just fine. Keeping spaces out of filenames is
already a lost battle, let alone limiting them to the POSIX portable
filename character set (A-Za-z0-9._-).

Obviously once you start talking about files on external media or
otherwise accessible by users in other locales, that conclusion changes.
But I'm talking about a personal desktop here.

> > So it looks like xterm is changing
>
> I'm not convinced it is xterm; it might also be the X libraries
> supporting copying with the mouse.  Anyway, whatever does it is
> allowed to.

This is indeed xterm's fault. From xterm(1):

       precompose (class Precompose)
               Tells xterm whether to precompose UTF-8 data into Normalization
               Form C, which combines commonly-used accents onto base
               characters.  If it does not do this, accents are left as
               separate characters.  The default is "true".

In my opinion, that's a *very* poor default. I don't expect base tools
to canonicalize text like that.
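
To show what that transformation does to the bytes, here is a small
Python sketch. It uses the standard unicodedata module to apply
Normalization Form C; it illustrates the normal form the man page
names, not xterm's actual code, and the name precompose is only
borrowed from the resource for illustration.

```python
import unicodedata


def precompose(text: str) -> str:
    """Return `text` in Normalization Form C (illustrative helper)."""
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"             # 'e' followed by COMBINING ACUTE ACCENT
composed = precompose(decomposed)  # single code point U+00E9

print(decomposed.encode("utf-8"))  # b'e\xcc\x81'
print(composed.encode("utf-8"))    # b'\xc3\xa9'
```

The two strings render identically but are different byte sequences, so
a filename copied out of the terminal after precomposition no longer
names a file that was stored in decomposed form.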

UTF-8 strings work fine when passed to grep(1), but grep doesn't -- nor
would I expect it to -- canonicalize strings, ignore zero-width no-break
spaces in running text, or apply any other sort of weird transformation
invented by the Unicode committee.
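
As a rough model of that behaviour (a byte-wise substring test in
Python, which is an assumption for illustration and not how grep is
implemented; plain_match is a made-up name):

```python
def plain_match(pattern: str, line: str) -> bool:
    """Byte-wise substring test: no normalization is applied and no
    invisible characters are skipped."""
    return pattern.encode("utf-8") in line.encode("utf-8")


nfc = "caf\u00e9"            # precomposed e-acute
nfd = "cafe\u0301"           # 'e' plus combining acute accent
zwnbsp = "ca\ufefff\u00e9"   # zero-width no-break space inside the word

print(plain_match(nfc, "un " + nfc))  # True: identical bytes
print(plain_match(nfc, nfd))          # False: different normal form
print(plain_match(nfc, zwnbsp))       # False: U+FEFF is not ignored
```

A tool that works this way only finds what it was literally given,
which is the behaviour I expect from base tools.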

The only unexpected thing here is xterm doing these transformations
without asking.

-- 
Anthony J. Bentley
