> On Mar 23, 2017, at 3:50 AM, Alastair Houghton <alast...@alastairs-place.net> wrote:
>
> On 22 Mar 2017, at 19:13, Chris Ridd <chrisr...@mac.com> wrote:
>>
>>> On 22 Mar 2017, at 09:05, Alastair Houghton <alast...@alastairs-place.net> wrote:
>>>
>>> In the context of filesystems (and specifically filenames), the phrases
>>> “bag of bytes” and “bunch of bytes” have a fairly specific meaning. The
>>> point is that the filesystem doesn’t inspect the bytes it’s given, and
>>> doesn’t care what they represent (about the only exception is that it
>>> probably doesn’t support embedded NULs). It isn’t suggesting that the
>>> names are treated as an unordered set of bytes (that’d just be silly).
>>> It’s just expressing the fact that the filesystem doesn’t care what they
>>> are - it may compare them, and if it does so, it will use binary ordering
>>> (not some other collation sequence) and won’t worry about things like case
>>> or encoding at all.
>>
>> That doesn’t sound sensible at all. It means you can create a filename with
>> a byte sequence that isn’t valid UTF-8 and which likely then cannot be
>> accessed by MacOS/iOS processes.
>
> That isn’t possible on macOS - there’s a percent escaping mechanism built in
> to the kernel to prevent this problem.
>
>> It means that you could create multiple files with the “same” name, and that
>> doesn’t sound like a win either. e.g. Aandi’s examples of LATIN SMALL LETTER
>> E (U+0065) COMBINING ACUTE ACCENT (U+0301) and LATIN SMALL LETTER E WITH
>> ACUTE (U+00E9)
>
> Yes, it does.
>
>> How can a “next gen” filesystem avoid using Unicode rules when handling
>> filenames?
>
> Well, if I had designed it, it wouldn’t. But I didn’t.
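
To make the “same name, two files” case above concrete, here is a minimal sketch. It is in Python only because that is quick to run; nothing in it is an Apple API, and the `directory` dictionary is a hypothetical stand-in for a filesystem that compares names byte for byte.

```python
# Two spellings of "é" that render identically on screen.
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Hypothetical stand-in for a "bag of bytes" directory: keys are raw
# UTF-8 byte strings and are compared with no Unicode awareness.
directory = {}
directory[(decomposed + ".txt").encode("utf-8")] = "first file"
directory[(precomposed + ".txt").encode("utf-8")] = "second file"

# Binary ordering: sorted purely by byte value, not by any collation.
name_bytes = sorted(directory)
for raw in name_bytes:
    print(raw, "->", directory[raw])

print(len(directory), "entries for what a user sees as one name")
```

Both entries survive because the two names are different byte sequences, which is all such a filesystem looks at.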
>
> To be fair, I can see arguments in favour of the bunch of bytes approach; the
> existing approach has created a problem in HFS+, in that the normalisation is
> essentially fixed for all time, and doesn’t correspond to the current version
> of Unicode. It’s actually worse than it might be, because (IIRC) they fixed
> the normalisation *before* Unicode adopted a stability policy for
> normalisation...
>
> But if the filesystem (or kernel) isn’t doing it, then IMO the Cocoa
> frameworks certainly should.

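
As a rough illustration of normalising above the filesystem, the sketch below runs each name through one normalisation form before it is turned into bytes. It is Python rather than Cocoa, `canonical_name` is a hypothetical helper invented for this example, and standard NFC/NFD are used as stand-ins; as I understand it, HFS+ uses its own frozen decomposition rather than exactly standard NFD.

```python
import unicodedata


def canonical_name(name: str, form: str = "NFC") -> bytes:
    """Hypothetical helper: normalise a name, then encode it as UTF-8."""
    return unicodedata.normalize(form, name).encode("utf-8")


decomposed = "e\u0301.txt"    # e + combining acute accent
precomposed = "\u00e9.txt"    # precomposed e with acute

# Without normalisation the two names are different byte strings.
raw_differs = decomposed.encode("utf-8") != precomposed.encode("utf-8")

# After normalising to one agreed form, they collapse to the same bytes.
same_nfc = canonical_name(decomposed) == canonical_name(precomposed)
same_nfd = canonical_name(decomposed, "NFD") == canonical_name(precomposed, "NFD")
print(raw_differs, same_nfc, same_nfd)

# The result depends on the Unicode tables this runtime ships with, which
# is the versioning issue mentioned in the quoted text.
print("Unicode data version:", unicodedata.unidata_version)
```

Which form gets chosen matters less than every caller agreeing on it before the bytes reach a filesystem that will not normalise them itself.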
Shouldn’t the VFS layer actually be doing this? It is part of its whole raison
d’être, no? Just have -[NSURL fileSystemRepresentation] normalize things
according to the correct Unicode rules, and let the VFS layer translate that to
HFS+’s normalization style when dealing with HFS+.

Charles

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)