Ingo Schwarze writes:
> That's a bad idea. Do not use non-ASCII bytes in file names.
> You are in for all kinds of trouble.

I don't agree. In a situation where a single user will be accessing
files, you can use whatever naming scheme you like. UTF-8 works exactly
how you would expect: the filename you enter is the filename you'll get.
Misencoded filenames can also exist, again with exactly the results you
would expect: you can't necessarily type them, but if you can pass the
exact filename, programs will work. The same goes for control characters
like backspaces in file names (far more annoying than UTF-8).

Saying you can't use them is impractical. Anyone downloading lots of
external files through web browsers, torrent clients, or any number of
other programs in ports will eventually encounter files with UTF-8
filenames. They work just fine. Keeping spaces out of filenames is
already a lost battle, let alone limiting them to the POSIX portable
filename character set (A-Za-z0-9._-).

Obviously once you start talking about files on external media or
otherwise accessible by users in other locales, that conclusion changes.
But I'm talking about a personal desktop here.

> > So it looks like xterm is changing
>
> I'm not convinced it is xterm; it might also be the X libraries
> supporting copying with the mouse. Anyway, whatever does it is
> allowed to.

This is indeed xterm's fault.

     precompose (class Precompose)
             Tells xterm whether to precompose UTF-8 data into
             Normalization Form C, which combines commonly-used
             accents onto base characters. If it does not do this,
             accents are left as separate characters. The default
             is "true".

In my opinion, that's a *very* poor default. I don't expect base tools
to canonicalize text like that. UTF-8 strings work fine when passed to
grep(1), but grep doesn't -- nor would I expect it to -- canonicalize
strings, or ignore zero-width no-break spaces in running text, or any
other sort of weird transformation invented by the Unicode committee.
The only unexpected thing here is xterm doing these transformations
without asking.

--
Anthony J. Bentley
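
To make the precompose effect concrete, here is a minimal Python sketch
of what Normalization Form C does to a file name. It is only an
illustration, not xterm's code: the name "café.txt" is made up, and the
comparison assumes a file system that treats names as plain byte
strings, which is an assumption rather than something stated above.

```python
import unicodedata

# Hypothetical file name, spelled with "e" followed by
# U+0301 COMBINING ACUTE ACCENT (the decomposed form).
decomposed = "cafe\u0301.txt"

# What a program that normalizes to NFC would produce from it.
precomposed = unicodedata.normalize("NFC", decomposed)


def same_name(a: str, b: str) -> bool:
    """Compare two names byte for byte, with no canonicalization.

    Assumption: the file system stores names as raw bytes.
    """
    return a.encode("utf-8") == b.encode("utf-8")


print(decomposed.encode("utf-8"))    # b'cafe\xcc\x81.txt'
print(precomposed.encode("utf-8"))   # b'caf\xc3\xa9.txt'
print(same_name(decomposed, precomposed))  # False
```

Both strings render the same on screen, but they are different byte
sequences, so a name that was normalized on the way through a copy and
paste would no longer match the original file under that assumption.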