On Thu, 2011-03-31 at 15:58 +0200, Bill Allombert wrote:
> Dear Developpers,
>
> there are a small numbers of packages that ship files with non-7bit
> characters in filenames.
> $ apt-file search -l -x '[\x80-\xff]'
>
> aspell-ca
> aspell-es
> aspell-is
> canorus
> console-tools
> dvb-apps
> ggz-python-games
> inorwegian
> jpilot
> lletters-media
> otrs2
> wnorwegian
>
> So this raises two issues:
> 1) should non-7bit characters in filenames be allowed
> 2) if yes whould we require the filename to be in a correct UTF-8 encoding ?
>
> I raise the question because I was trying to filter out popcon reports that
> include non-7bit characters since it usually implies corruption of data, but
> this might not be the case.
>
> Also, it seems there is a tool out there that generate .deb packages with
> names like
> designkit.702840f10216893fc3494b731e825b33666733d6.1
> and filename that are all non-7bit. (probably in Japanese).

I think we should definitely *not* forbid this, and we should (at the
very least) be working towards supporting the practice.

It may be that we can't properly support this until we can guarantee a
C.UTF-8 locale as a minimum available standard, but that sounds to me
like another justification for such a locale.

I think we should encourage the filename to be in a UTF-8 encoding, and
even if upstream does use 8-bit filenames with a non-UTF-8 encoding I
think that a Debian packager should be encouraged to patch that.

I would also be OK with mandating that filenames should only be in
either UTF-8 or the ASCII subset thereof, and that ISO-8859-* and other
such restricted measures are not welcome on our filesystems.

Regards,
					Andrew McMillan.

-- 
------------------------------------------------------------------------
andrew (AT) morphoss (DOT) com                            +64(272)DEBIAN
   If wishes were horses, then beggars would be thieves.
------------------------------------------------------------------------
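
For illustration only, the three cases discussed in this thread (pure
7-bit ASCII, valid UTF-8, and some other 8-bit encoding such as
ISO-8859-*) could be told apart with a check along the following lines.
This is a minimal sketch, not anything that exists in Debian tooling:
the function name classify_filename and the sample filenames are
hypothetical, and it assumes filenames are handled as raw bytes.

```python
def classify_filename(raw: bytes) -> str:
    """Classify a raw filename as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper for illustration; not part of any Debian tool.
    """
    # Pure 7-bit names are valid in ASCII and in UTF-8 alike.
    if all(byte < 0x80 for byte in raw):
        return "ascii"
    # Names with 8-bit bytes are acceptable only if they decode as UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"


if __name__ == "__main__":
    # Made-up sample names: plain ASCII, UTF-8 encoded, ISO-8859-1 encoded.
    samples = [b"README", b"catal\xc3\xa0.dat", b"espa\xf1ol.dat"]
    for name in samples:
        print(name, classify_filename(name))
```

A check like this only says whether the bytes form well-formed UTF-8; it
cannot tell which legacy encoding an "other" name was written in, nor
whether a name that happens to decode was really intended as UTF-8.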