Erik Huelsmann wrote on Wed, Sep 15, 2010 at 23:20:06 +0200:
> Yesterday, I was talking to CMike about our long-standing issue with UTF-8
> strings designating a certain path not necessarily being equal to other
> strings designating the same path. The issue has to do with NFC (composed)
> and NFD (decomposed) representation of Unicode characters. CMike nicely
> called the issue the "Erik Huelsmann issue" yesterday :-)
>
> The issue consists of two parts:
> 1. The repository which should determine that paths being added by a commit
>    are unique, regardless of their encoding (NFC/NFD)

Will you assume that all paths in the repository have been
Unicode-canonicalized prior to entering the repository? If yes, then we can
infer that no two in-repository paths (which are bytewise different)
canonicalize to the same byte sequence.

That is a pretty useful precondition to have; without it, what /can/ svn do
on a legacy repository where two paths are bytewise-different yet
Unicode-equal?

> 2. The client which should detect that the pathnames coming in from the
>    filesystem may differ in encoding from what's in the working copy
>    administrative files [this is mainly an issue on the Mac:
>    http://subversion.tigris.org/issues/show_bug.cgi?id=2464]
> ...
> Basically what I was trying to do is: do what we do now (ie fail if the path
> exists and succeed if it doesn't), with the only difference that the paths
> used for comparison are guaranteed to be the same normalization - meaning
> they are the same byte sequence when they're equal unicode.
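
To make sure we mean the same thing, here is a minimal sketch of that comparison and of the legacy-repository collision I'm asking about. It is Python and purely illustrative, not Subversion code: the helper names (canonical, path_exists, find_collisions) are made up for this mail, and the choice of NFC as the canonical form is an assumption (NFD would work equally well, as long as both sides use the same form).

```python
import unicodedata

def canonical(path, form="NFC"):
    # Hypothetical helper: bring a path to one normalization form so that
    # Unicode-equal paths become the same string (and same UTF-8 bytes).
    return unicodedata.normalize(form, path)

def path_exists(existing_paths, new_path):
    # "Do what we do now", except both sides are normalized before comparing.
    wanted = canonical(new_path)
    return any(canonical(p) == wanted for p in existing_paths)

def find_collisions(paths):
    # Report pairs of stored paths that differ bytewise but are
    # Unicode-equal, i.e. the legacy-repository case.
    seen = {}
    collisions = []
    for p in paths:
        key = canonical(p)
        if key in seen and seen[key] != p:
            collisions.append((seen[key], p))
        else:
            seen.setdefault(key, p)
    return collisions

# The same visible name in two encodings of "e with acute accent":
nfc_name = "caf\u00e9.txt"    # composed: U+00E9
nfd_name = "cafe\u0301.txt"   # decomposed: "e" followed by U+0301

bytewise_equal = nfc_name.encode("utf-8") == nfd_name.encode("utf-8")
unicode_equal = canonical(nfc_name) == canonical(nfd_name)
already_there = path_exists([nfc_name], nfd_name)
legacy_collisions = find_collisions([nfc_name, "README", nfd_name])
```

If the repository only ever stores canonicalized paths, find_collisions can never report anything and path_exists reduces to a plain byte comparison. My question above is about the case where it does report something.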