> On Mar 23, 2017, at 3:50 AM, Alastair Houghton <alast...@alastairs-place.net> 
> wrote:
> 
> On 22 Mar 2017, at 19:13, Chris Ridd <chrisr...@mac.com> wrote:
>> 
>>> On 22 Mar 2017, at 09:05, Alastair Houghton <alast...@alastairs-place.net> wrote:
>>> 
>>> In the context of filesystems (and specifically filenames), the phrases 
>>> “bag of bytes” and “bunch of bytes” have a fairly specific meaning.  The 
>>> point is that the filesystem doesn’t inspect the bytes it’s given, and 
>>> doesn’t care what they represent (about the only exception is that it 
>>> probably doesn’t support embedded NULs).  It isn’t suggesting that the 
>>> names are treated as an unordered set of bytes (that’d just be silly).  
>>> It’s just expressing the fact that the filesystem doesn’t care what they 
>>> are - it may compare them, and if it does so, it will use binary ordering 
>>> (not some other collation sequence) and won’t worry about things like case 
>>> or encoding at all.
>> 
>> That doesn’t sound sensible at all. It means you can create a filename with 
>> a byte sequence that isn’t valid UTF-8 and which likely then cannot be 
>> accessed by macOS/iOS processes.
> 
> That isn’t possible on macOS - there’s a percent escaping mechanism built in 
> to the kernel to prevent this problem.
> 
>> It means that you could create multiple files with the “same” name, and that 
>> doesn’t sound like a win either, e.g. Aandi’s examples of LATIN SMALL LETTER 
>> E (U+0065) followed by COMBINING ACUTE ACCENT (U+0301), and LATIN SMALL 
>> LETTER E WITH ACUTE (U+00E9).
> 
> Yes, it does.
> 
>> How can a “next gen” filesystem avoid using Unicode rules when handling 
>> filenames?
> 
> Well, if I had designed it, it wouldn’t.  But I didn’t.
> 
> To be fair, I can see arguments in favour of the bunch of bytes approach; the 
> existing approach has created a problem in HFS+, in that the normalisation is 
> essentially fixed for all time, and doesn’t correspond to the current version 
> of Unicode.  It’s actually worse than it might be, because (IIRC) they fixed 
> the normalisation *before* Unicode adopted a stability policy for 
> normalisation...
> 
> But if the filesystem (or kernel) isn’t doing it, then IMO the Cocoa 
> frameworks certainly should.

Shouldn’t the VFS layer actually be doing this? It is part of its whole raison 
d’être, no? Just have -[NSURL fileSystemRepresentation] normalize things 
according to the correct Unicode rules, and let the VFS layer translate that to 
HFS+’s normalization style when dealing with HFS+.
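
To make the idea concrete, here is a minimal sketch, in Python rather than 
Cocoa, of what "normalize before handing the name to the filesystem" would 
mean for the two spellings of "é" quoted above. The helper name 
normalized_name_bytes is hypothetical, and standard NFC/NFD is only a 
stand-in here: HFS+ uses its own variant of decomposition, so this is not 
the actual HFS+ mapping.

```python
import unicodedata

def normalized_name_bytes(name: str, form: str = "NFC") -> bytes:
    """Hypothetical helper: normalize a filename, then encode it as UTF-8.

    `form` is a standard Unicode normalization form ("NFC" or "NFD").
    This is an illustration only, not the HFS+ decomposition variant.
    """
    return unicodedata.normalize(form, name).encode("utf-8")

decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# "Bag of bytes" view: the two names are different byte sequences,
# so a filesystem comparing raw bytes treats them as different files.
raw_equal = decomposed.encode("utf-8") == precomposed.encode("utf-8")

# Normalizing first maps both spellings to the same byte sequence.
normalized_equal = (
    normalized_name_bytes(decomposed) == normalized_name_bytes(precomposed)
)

print(raw_equal, normalized_equal)
```

A layer below could then re-normalize to whatever form a particular 
filesystem expects (the "form" parameter above stands in for that step).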

Charles
