I just want to remind everyone I’m *not* a file systems engineer – I’m just 
trying to help Dave (and anyone else caught in this) make sure their app can 
find their files.

> On Mar 23, 2017, at 1:53 AM, Alastair Houghton <alast...@alastairs-place.net> 
> wrote:
> 
> On 22 Mar 2017, at 18:00, David Duncan <david.dun...@apple.com> wrote:
>> 
>> So there was another explanation posted on the bug that I’m not certain you 
>> got, but which I think may explain.
>> 
>> Basically the concept is that since APFS doesn’t normalize file names, if 
>> you store file names in some other storage (say in your preferences) then 
>> what could happen is this:
>> 
>> 10.2: File is saved with a file name handed to the file system in NFC form. 
>> File system converts the file name to NFD. You store it as NFC.
>> 10.3: File system is converted to APFS, and the file name is NFD. You try to 
>> look up the file as NFC, and it fails.
> 
> This is going to cause problems, though, when things migrate from HFS+ to 
> APFS, because the HFS normalisation *isn’t* a standard one.  In particular, 
> it certainly *isn’t* NFD for the current version of Unicode.

Yes, that is the crux of Dave’s issue – the HFS+ => APFS conversion only 
transcoded the file names (from UTF-16 to UTF-8); it did not re-normalize them.
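
To make the failure concrete, here is a rough sketch in Python (not anything
from the converter or the file system itself). It assumes a made-up file name,
"café.txt", and models a normalization-sensitive directory as a plain set of
strings. It also uses standard Unicode NFD as a stand-in for what HFS+ stored,
which, as Alastair points out, is only an approximation.

```python
import unicodedata

# Hypothetical file name; \u00e9 is the precomposed "é".
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")  # what the app stored
name_nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")  # roughly what HFS+ kept

# The two spellings look identical on screen but differ in code points
# (8 vs. 9) and therefore in their UTF-8 bytes.
nfc_len, nfd_len = len(name_nfc), len(name_nfd)
same_bytes = name_nfc.encode("utf-8") == name_nfd.encode("utf-8")

# Stand-in for a directory on a file system that compares names exactly.
on_disk = {name_nfd}

def exists_exact(name):
    """Exact lookup, as on a normalization-sensitive file system."""
    return name in on_disk

found_with_stored_name = exists_exact(name_nfc)   # False: lookup fails
found_with_disk_name = exists_exact(name_nfd)     # True
```

The stored NFC name misses even though the file is there, which is the
situation described on the bug.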

> The only obvious solution for that would be to have the HFS+ to APFS 
> migration tool *re-normalise* the filenames (maybe it does?), but that’s 
> bound to break things in the (presumably quite common) case where the 
> filename stored in e.g. a plist was originally obtained from the filesystem.

Arguably there is no way for the file system converter to know how it should 
renormalize file names. This is akin to case sensitive vs case insensitive file 
systems. If you ran a converter from a case insensitive file system to a case 
sensitive one, you could preserve the capitalization during the conversion, but 
file lookups that used the wrong case would fail after the conversion. The 
converter can’t know that you want to look up “foo” via “FOO” or “Foo”, so it 
can’t do any kind of normalization for you. The difference here is that for the 
most part Unicode normalization is invisible to the developer.
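
One way an app could cope on its side, sketched in Python under some
assumptions: the helper name find_entry is hypothetical, the directory listing
is passed in as a plain list (in practice it might come from something like
os.listdir), and both sides are compared after normalizing to standard NFC. My
understanding is that the HFS+ form is still canonically equivalent to the
original name, so comparing under a single standard form should match it, but
treat that as an assumption to verify rather than a guarantee.

```python
import unicodedata

def find_entry(stored_name, directory_entries):
    """Return the on-disk spelling that is canonically equivalent to
    stored_name, or None if no entry matches.

    Hypothetical helper: compares names after normalizing both to NFC,
    so the caller can open the file using the spelling actually on disk.
    """
    wanted = unicodedata.normalize("NFC", stored_name)
    for entry in directory_entries:
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry
    return None

# Example with made-up names: the app stored NFC, the disk holds NFD.
stored = unicodedata.normalize("NFC", "caf\u00e9.txt")
listing = [unicodedata.normalize("NFD", "caf\u00e9.txt"), "notes.txt"]
match = find_entry(stored, listing)
missing = find_entry("absent.txt", listing)
```

Here match is the NFD spelling from the listing, which is the name to hand
back to the file system; missing is None.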

> 
> Kind regards,
> 
> Alastair.
> 
> --
> http://alastairs-place.net
> 

--
David Duncan

