At Wed, 27 Dec 2000 12:05:57 +0200,
Maxim Sobolev <[EMAIL PROTECTED]> wrote:
> Several days ago I got a CD with Russian filenames on it and discovered that
> I'm unable to read those filenames. After some hacking I produced a patch,

Vladimir Kushnir's patch should address this:

http://www.freebsd.org/cgi/getmsg.cgi?fetch=270425+0+/usr/local/www/db/text/2000/freebsd-hackers/20001203.freebsd-hackers

It is based on my patch:

http://triaez.kaisei.org/~mzaki/joliet/


> which should solve this problem in the manner similar to what we have in
> msdosfs module (i.e. user-provided conversion table). I have to emphasize that
> it's a temporary solution until we will have iconv support in kernel.

*PLEASE* be careful about filename I18N.

1. Joliet extension

The Joliet extension is built on Unicode,
so it is a "multilingual" filesystem.
We can find CDs whose file names mix English, French,
Russian, Chinese, and Japanese on the same disc.
So a per-mount charset conversion is not sufficient:
no single local charset can represent all of those names.
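To illustrate why Unicode-to-UTF-8 handles such a disc where a per-mount charset cannot, here is a minimal sketch assuming the Joliet name has already been read as raw big-endian UCS-2 bytes (the function names are hypothetical, not from any existing FreeBSD code, and surrogate pairs are ignored since Joliet uses UCS-2). A name mixing Latin, Cyrillic, and CJK characters converts without loss.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one BMP code point (as found in a UCS-2 Joliet name) as UTF-8.
 * Returns the number of bytes written (1..3). */
static size_t ucs2_to_utf8_char(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Hypothetical helper: convert a Joliet name stored as big-endian UCS-2
 * (nbytes long) into a NUL-terminated UTF-8 string.
 * Returns the UTF-8 length, or -1 if nbytes is odd or out is too small. */
long joliet_name_to_utf8(const unsigned char *name, size_t nbytes,
                         char *out, size_t outsize)
{
    size_t o = 0;

    if (outsize == 0 || nbytes % 2 != 0)
        return -1;
    for (size_t i = 0; i < nbytes; i += 2) {
        uint16_t c = (uint16_t)((name[i] << 8) | name[i + 1]);
        unsigned char buf[3];
        size_t n = ucs2_to_utf8_char(c, buf);

        if (o + n + 1 > outsize)    /* keep room for the terminator */
            return -1;
        for (size_t k = 0; k < n; k++)
            out[o++] = (char)buf[k];
    }
    out[o] = '\0';
    return (long)o;
}
```

For example, the UCS-2BE bytes `00 41 04 30 65 E5` ("A", Cyrillic "а", CJK "日") become a 6-byte UTF-8 string, with no charset chosen at mount time.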

2. FAT

The current FAT filesystem uses Unicode for long file names; however,
the FAT filesystem is not "multilingual", because the conventional
8.3 names are stored in a local codepage.
Thus a per-mount codeset conversion is sufficient,
but an additional codepage conversion is needed.
This conversion is currently done with 128-byte tables specified to
mount_msdos(8), but this approach does not take multibyte
codesets such as CJK into account.
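A minimal sketch of the single-byte table model, assuming a hypothetical `codepage_table` layout (this is not the actual mount_msdos(8) table format): each of the 128 upper-half bytes maps to exactly one Unicode code point. ISO-8859-1 is used for the demonstration because its upper half maps directly onto U+0080..U+00FF; the same shape works for KOI8-R, but it cannot describe a character that spans two bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount table: entry i is the Unicode code point for the
 * local byte 0x80 + i. Bytes 0x00-0x7F are assumed to be plain ASCII. */
typedef struct {
    uint16_t high_half[128];
} codepage_table;

/* Fill the table for ISO-8859-1, whose upper half is U+0080-U+00FF. */
void codepage_init_latin1(codepage_table *t)
{
    for (int i = 0; i < 128; i++)
        t->high_half[i] = (uint16_t)(0x80 + i);
}

/* One byte in, one code point out: this is the whole model. */
uint16_t codepage_byte_to_unicode(const codepage_table *t, unsigned char b)
{
    return b < 0x80 ? b : t->high_half[b - 0x80];
}

/* Convert an 8.3 name byte by byte; returns the number of code points.
 * With a multibyte codeset such as EUC-JP or Shift_JIS, one character
 * occupies two bytes, but each byte is still looked up on its own, so the
 * result is two unrelated code points. A 128-entry table cannot express
 * byte pairs at all. */
size_t codepage_name_to_unicode(const codepage_table *t,
                                const unsigned char *name, size_t len,
                                uint16_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = codepage_byte_to_unicode(t, name[i]);
    return len;
}
```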

3. Relation to userland applications

Currently, conversion tables between Unicode and local charsets are
widely needed and implemented, for example for the Joliet extension,
the FAT filesystem, TrueType rasterizers, WWW browsers, and so on.
We should share these tables as much as possible for consistency.
So the ideal solution to code conversion is not an in-kernel table
but a shared userland library.
Therefore, filename code conversion should also be done in userland
wherever possible.
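As an illustration of conversion through a shared userland library, here is a sketch using the POSIX iconv(3) interface; `convert_name` is a hypothetical wrapper, and which charset names are accepted (e.g. "KOI8-R", "EUC-JP") depends on the iconv implementation installed. Note that some older libiconv versions declare the input argument as `const char **` rather than `char **`.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: convert inlen bytes from charset `from` to charset
 * `to` with iconv(3). Writes at most outsize-1 bytes plus a NUL.
 * Returns the number of bytes produced, or -1 on error (unknown charset,
 * invalid input sequence, or output buffer too small). */
long convert_name(const char *to, const char *from,
                  const char *in, size_t inlen, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* POSIX iconv() takes char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outsize - 1;  /* keep room for the terminator */

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    /* Flush any pending shift sequence (matters for stateful encodings). */
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (r == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}
```

A file manager could then call, say, `convert_name("KOI8-R", "UTF-8", name, len, buf, sizeof buf)` to display a UTF-8 file name on a KOI8-R terminal, using the same tables every other application on the system uses.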

4. My rough idea

My preliminary idea for filesystem I18N:

* filenames recorded on Unix filesystems (e.g. FFS, MFS) use
  one chosen codeset, for example Unicode.

* the interface between kernel and userland should use a
  filesystem-safe encoding, for example UTF-8.

* userland applications convert from/to the user-requested
  charsets, such as latin-2, koi8, and euc-jp, using a shared library.

* the Joliet extension and UDF, which are based on Unicode, need
  no in-kernel conversion, provided Unix filesystems use Unicode.

* the FAT filesystem, which uses both Unicode and conventional
  codepages, requires in-kernel conversion in order to
  write the conventional 8.3 names.
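To make "filesystem-safe" concrete: a Unix path component may not contain a NUL or '/' byte. The sketch below (hypothetical function names, BMP only, surrogates not rejected) checks every code point except U+0000 and '/' itself and counts how many encodings contain one of those bytes. UTF-8 yields none, because every byte of a multibyte sequence is 0x80 or above, whereas raw UCS-2 does not have this property (U+0100, for instance, is the bytes 01 00).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a BMP code point as UTF-8; returns the byte count (1..3). */
static size_t utf8_encode(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Return 1 if any of the n bytes is NUL or '/'. */
static int has_unsafe_byte(const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (b[i] == 0x00 || b[i] == '/')
            return 1;
    return 0;
}

/* Count BMP code points (excluding U+0000 and '/' themselves) whose encoding
 * contains a NUL or '/' byte. utf8 != 0 selects UTF-8, else UCS-2BE. */
unsigned long count_unsafe_encodings(int utf8)
{
    unsigned long bad = 0;

    for (uint32_t c = 1; c <= 0xFFFF; c++) {
        unsigned char b[3];
        size_t n;

        if (c == '/')
            continue;
        if (utf8) {
            n = utf8_encode((uint16_t)c, b);
        } else {
            b[0] = (unsigned char)(c >> 8);
            b[1] = (unsigned char)(c & 0xFF);
            n = 2;
        }
        bad += (unsigned long)has_unsafe_byte(b, n);
    }
    return bad;
}
```

`count_unsafe_encodings(1)` returns 0, which is why UTF-8 can pass through the existing kernel path-name handling unchanged.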

Any ideas?

-- 
Motomichi Matsuzaki <[EMAIL PROTECTED]> 
Dept. of Biological Sciences, Grad. School of Science, Univ. of Tokyo, Japan 