Yesterday, I was talking to CMike about our long-standing issue with UTF-8 strings designating a certain path not necessarily being equal to other strings designating the same path. The issue has to do with the NFC (composed) and NFD (decomposed) representations of Unicode characters. CMike nicely called it the "Erik Huelsmann issue" :-)
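To make the NFC/NFD difference concrete, here is a minimal sketch in Python (not Subversion code). The path names are made up for illustration; the only library call used is the standard `unicodedata.normalize`.

```python
import unicodedata

# The same visible path, written two ways (hypothetical example names):
# "é" as one precomposed code point (NFC form) ...
nfc_path = "caf\u00e9/readme.txt"
# ... and as "e" followed by a combining acute accent (NFD form).
nfd_path = "cafe\u0301/readme.txt"

# Both render identically, but the UTF-8 byte sequences differ,
# so a plain byte-wise comparison treats them as different paths.
bytes_differ = nfc_path.encode("utf-8") != nfd_path.encode("utf-8")

# Normalizing both strings to one form makes them compare equal.
equal_as_nfc = unicodedata.normalize("NFC", nfd_path) == nfc_path
equal_as_nfd = unicodedata.normalize("NFD", nfc_path) == nfd_path

print(bytes_differ, equal_as_nfc, equal_as_nfd)
```

Which form is chosen matters less than using the same one on both sides of every comparison.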
The issue consists of two parts:

 1. The repository, which should determine that paths being added by a commit are unique, regardless of their encoding (NFC/NFD).
 2. The client, which should detect that the pathnames coming in from the filesystem may differ in encoding from what's in the working copy administrative files [this is mainly an issue on the Mac: http://subversion.tigris.org/issues/show_bug.cgi?id=2464].

Mike, the thing I have been trying to find in our filesystem implementation is where an editor drive adding a path [add_directory() or add_file()] checks whether the file already exists. The check at that point should be encoding independent, for example by making all paths NFC (or NFD) before comparison. You could use utf8proc (http://www.flexiguided.de/publications.utf8proc.en.html) to do the normalization. It's very lightweight in contrast to ICU, which provides the same functionality but has a much broader scope.

The problem I was telling you about is that I was looking in libsvn_fs_base to find where the existence check is performed, but I couldn't find it. Basically, what I was trying to do is what we do now (i.e. fail if the path exists and succeed if it doesn't), with the only difference that the paths used for comparison are guaranteed to be in the same normalization form, meaning they are the same byte sequence whenever they are the same Unicode string.

Bye,

Erik.
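
P.S. The encoding-independent existence check described above could look roughly like the sketch below. It is written in Python rather than C and is not based on the actual libsvn_fs_base code: `PathTable`, `add_path()` and `normalize_path()` are hypothetical names, and the standard `unicodedata` module stands in for utf8proc.

```python
import unicodedata


def normalize_path(path: str) -> str:
    """Return the path in NFC form (NFD would work equally well)."""
    return unicodedata.normalize("NFC", path)


class PathTable:
    """Hypothetical stand-in for the paths known to a transaction."""

    def __init__(self) -> None:
        # Maps the normalized path to the spelling it was first added with.
        self._paths: dict[str, str] = {}

    def add_path(self, path: str) -> None:
        """Fail if the path exists, succeed otherwise, as today,
        but compare on the normalized form instead of the raw bytes."""
        key = normalize_path(path)
        if key in self._paths:
            raise FileExistsError(f"path already exists: {path!r}")
        self._paths[key] = path


table = PathTable()
table.add_path("caf\u00e9")          # composed spelling is added

try:
    table.add_path("cafe\u0301")     # decomposed spelling of the same name
    duplicate_rejected = False
except FileExistsError:
    duplicate_rejected = True
```

The only behavioural change compared with a byte-wise check is that the second add fails instead of creating a second, visually identical entry.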