At Wed, 27 Dec 2000 12:05:57 +0200,
Maxim Sobolev <[EMAIL PROTECTED]> wrote:
> Several days ago I got a CD with Russian filenames on it and discovered that
> I'm unable to read those filenames. After some hacking I produced a patch,

Vladimir Kushnir's patch is meant for this:
http://www.freebsd.org/cgi/getmsg.cgi?fetch=270425+0+/usr/local/www/db/text/2000/freebsd-hackers/20001203.freebsd-hackers
and it is based on my patch:
http://triaez.kaisei.org/~mzaki/joliet/

> which should solve this problem in the manner similar to what we have in
> msdosfs module (i.e. user-provided conversion table). I have to emphasize that
> it's a temporary solution until we will have iconv support in kernel.

*PLEASE* be careful about filename I18N.

1. Joliet extension

The Joliet extension is built on Unicode and is a "multilingual"
filesystem. We can find CDs that contain files named in all of
English, French, Russian, Chinese, and Japanese. So a single charset
conversion per mount is not sufficient.

2. FAT

The current FAT filesystem uses Unicode; however, it is not
"multilingual", because local codepages are used for the conventional
8.3 names. Thus a per-mount codeset conversion is sufficient, but an
additional codepage conversion is needed. This conversion is currently
done with the 128-byte tables specified in mount_msdos(8), but that
approach takes no account of multibyte codesets such as CJK.

3. Relation to userland applications

Conversion tables between Unicode and local charsets are already
widely needed and implemented, for example for the Joliet extension,
the FAT filesystem, TrueType rasterizers, WWW browsers, and so on. We
should share these tables as far as possible to keep them consistent.
So the ideal place for code conversion is not an in-kernel table but a
userland shared library. Therefore, filename code conversion should
also be done in userland as far as possible.

4. My rough idea

My preliminary idea for filesystem I18N:

* filenames recorded on Unix filesystems (e.g. FFS, MFS) use an
  arbitrary codeset, for example Unicode.

* the interface between kernel and userland should use a
  filesystem-safe encoding, for example UTF-8.

* userland applications can convert from/to the user-requested
  charsets, such as latin-2, koi8, and euc-jp, using a shared library.

* the Joliet extension and UDF, which are based on Unicode, need no
  in-kernel conversion, provided Unix filesystems use Unicode.

* the FAT filesystem, which uses both Unicode and conventional
  codepages, requires in-kernel conversion in order to write the
  conventional 8.3 names.

Any ideas?

-- 
Motomichi Matsuzaki <[EMAIL PROTECTED]>
Dept. of Biological Sciences, Grad. School of Science, Univ. of Tokyo, Japan

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
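
Illustration for points 2 and 4 above: a minimal sketch, written in
Python for brevity rather than kernel C, and not taken from either
patch. The names build_high_table, decode_with_table, and
to_user_charset are hypothetical, and Python's built-in codecs merely
stand in for the userland shared conversion library. It shows why a
128-entry single-byte table works for koi8-r but cannot represent
euc-jp, and how a UTF-8 name could be converted to the user's charset
in userland.

```python
# Hypothetical sketch; nothing here is an existing FreeBSD interface.

REPLACEMENT = "\ufffd"  # stands in for "this byte has no table entry"


def build_high_table(codec):
    """Build a 128-entry table for bytes 0x80..0xFF, one character per byte.

    This mimics the idea of a per-mount single-byte conversion table.
    Bytes that are not a complete character on their own get None.
    """
    table = []
    for b in range(0x80, 0x100):
        try:
            table.append(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            table.append(None)
    return table


def decode_with_table(raw, table):
    """Convert a raw filename byte by byte using the 128-entry table."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))  # ASCII passes through unchanged
        else:
            ch = table[b - 0x80]
            out.append(ch if ch is not None else REPLACEMENT)
    return "".join(out)


def to_user_charset(utf8_name, charset):
    """Userland side: convert a UTF-8 name from the kernel to a charset."""
    return utf8_name.decode("utf-8").encode(charset)


if __name__ == "__main__":
    russian = "файл.txt"
    japanese = "ファイル.txt"

    # Single-byte codeset: the table round-trips the name.
    koi8 = build_high_table("koi8_r")
    print(decode_with_table(russian.encode("koi8_r"), koi8) == russian)

    # Multibyte codeset: a per-byte table cannot recover the name.
    eucjp = build_high_table("euc_jp")
    print(decode_with_table(japanese.encode("euc_jp"), eucjp) == japanese)

    # Library-based conversion from UTF-8 handles both.
    print(to_user_charset(japanese.encode("utf-8"), "euc_jp"))
```

The per-byte table fails for euc-jp because each kana or kanji occupies
two or more bytes, so no single table entry can describe it. That is
the limitation point 2 refers to.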