On Friday, June 6, 2014 4:22:22 AM UTC+5:30, Chris Angelico wrote:
> On Fri, Jun 6, 2014 at 8:35 AM, Rustom Mody wrote:
> > And then ask how Linux (in your and Stallman's sense) differs from
> > Windows in how the filesystem handles things like filenames?
>
> What are you testing of the kernel? Most of the kernel doesn't
> actually work with text at all - it works with integers, buffers of
> memory (which could be seen as streams of bytes, but might be almost
> anything), process tables, open file handles... but not usually text.
> To you, "EAGAIN" might be a bit of text, but to the Linux kernel, it's
> an integer (11 decimal, if I recall correctly). Is that some fancy new
> form of encoding? :)

| Thanks to the properties of UTF-8 encoding, the Linux kernel, the
| innermost and lowest-level part of the operating system, can
| handle Unicode filenames without even having the user tell it
| that UTF-8 is to be used. All character strings, including
| filenames, are treated by the kernel in such a way that THEY
| APPEAR TO IT ONLY AS STRINGS OF BYTES. Thus, it doesn't care and
| does not need to know whether a pair of consecutive bytes should
| logically be treated as two characters or a single one. The only
| risk of the kernel being fooled would be, for example, for a
| filename to contain a multibyte Unicode character encoded in such
| a way that one of the bytes used to represent it was a slash or
| some other character that has a special meaning in file
| names. Fortunately, as we noted, UTF-8 never uses ASCII
| characters for encoding multibyte characters, so neither the
| slash nor any other special character can appear as part of one
| and therefore there is no risk associated with using Unicode in
| filenames.
|
| Filesystems found on Microsoft Windows machines (NTFS and FAT)
| are different in that THEY STORE FILENAMES ON DISK IN SOME
| PARTICULAR ENCODING. The kernel must translate this encoding to
| the system encoding, which will be UTF-8 in our case.
|
| If you have Windows partitions on your system, you will have to
| take care that they are mounted with correct options. For FAT and
| ISO9660 (used by CD-ROMs) partitions, option utf8 makes the
| system translate the filesystem's character encoding to
| UTF-8. For NTFS, nls=utf8 is the recommended option (utf8 should
| also work).

[Emphases mine]

From: http://michal.kosmulski.org/computing/articles/linux-unicode.html

-- 
https://mail.python.org/mailman/listinfo/python-list
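The UTF-8 property the quoted article relies on can be checked directly in Python. This is only an illustrative sketch: the function names `utf8_multibyte_is_ascii_free` and `split_like_kernel` and the sample path are hypothetical, and nothing here calls into the kernel or touches a real filesystem. It scans every encodable code point above U+007F to confirm that none of them produces a byte below 0x80, then splits an encoded path on the byte b"/" the way a byte-oriented kernel would. For contrast, it also shows that a 16-bit Unicode encoding (NTFS stores filenames as 16-bit units) can put the byte 0x2F inside an ordinary character.

```python
# Sketch: why a byte-oriented kernel can treat UTF-8 filenames as plain bytes.
# Assumption: this only checks the encoding property described in the quote;
# it does not interact with the kernel or any filesystem.

def utf8_multibyte_is_ascii_free():
    """Return True if no non-ASCII code point encodes to a byte below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


def split_like_kernel(path):
    """Split a path on the single byte 0x2F, as a byte-oriented kernel would."""
    return path.encode("utf-8").split(b"/")


# Hypothetical example path mixing ASCII and multibyte characters.
path = "home/ŝtefan/日本語/файл.txt"

print(utf8_multibyte_is_ascii_free())  # True
# Each component decodes cleanly: splitting on b"/" never cut a character.
print([part.decode("utf-8") for part in split_like_kernel(path)])
# ['home', 'ŝtefan', '日本語', 'файл.txt']

# Contrast: in UTF-16 the single character U+2F00 contains the byte 0x2F,
# so a naive byte-level split on b"/" would cut it in half.
print("\u2f00".encode("utf-16-le"))  # b'\x00/'
```

The exhaustive scan takes a moment because it encodes over a million code points; it is there to show the property holds for all of Unicode, not just for the sample path.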