Hiroaki Nakamura wrote on Fri, Feb 03, 2012 at 05:33:02 +0900:
> 2012/2/3 Daniel Shahaf <danie...@elego.de>:
> > Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
> >> On 02.02.2012 20:22, Peter Samuelson wrote:
> >> > [Hiroaki Nakamura]
> >> >> In option (2), we do n12n on all clients on all platforms, and we
> >> >> include web_dav_svn in "clients". So we convert all input paths to
> >> >> the "server encoding", which is NFC.
> >> > Indeed. But the very concept of a "server encoding" means we are
> >> > involving the server side, which invokes a lot of difficult questions
> >> > like "what about existing 1.x clients", "what about existing checkouts"
> >> > and "what about existing repositories".
> >> >
> >> > By proposing a client-only solution, I hope to avoid _all_ those
> >> > questions.
> >>
> >> Can't see how that works, unless you either make the client-side
> >> solution optional, create a mapping table, or make name lookup on the
> >> server agnostic to character representation. I can't envision how any of
> >> those solutions would work all the time.
> >>
> >> It would be nice if we could normalize paths in the repository without
> >> having to perform a dump/reload cycle, but I don't know how that would
> >> work in FSFS.
> >
> > It won't. Changing the encoding increases the length (in bytes) of the
> > string (in the dirents hash, for example), and thus changes the offsets
> > of the node-revs that come later in the file, to which subsequent
> > revisions (and the IDs of those node-revs) refer.
>
> Changing from NFD to NFC does not increase the length.
> The length will be the same or smaller, never larger.
>
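Hiroaki's length claim can be spot-checked with Python's standard `unicodedata` module. The sketch below compares the UTF-8 byte length of a few sample filenames in NFD and in NFC. The sample names and the helper names (`utf8_len`, `nfd_vs_nfc`) are arbitrary choices for illustration, and a handful of examples is not a proof that the property holds for every possible input.

```python
import unicodedata

# Arbitrary sample filenames (illustrative only): Latin letters with
# accents, a Japanese kana with a voicing mark, and Hangul syllables.
SAMPLES = ["café.txt", "résumé", "が.txt", "한국어"]


def utf8_len(s: str) -> int:
    """Length of s in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def nfd_vs_nfc(name: str) -> tuple[int, int]:
    """Return (NFD byte length, NFC byte length) for a name.

    The name is first decomposed to NFD, then recomposed to NFC, so the
    result does not depend on how the literal was typed in the source.
    """
    nfd = unicodedata.normalize("NFD", name)
    nfc = unicodedata.normalize("NFC", nfd)
    return utf8_len(nfd), utf8_len(nfc)


if __name__ == "__main__":
    for name in SAMPLES:
        d, c = nfd_vs_nfc(name)
        print(f"{name!r}: NFD={d} bytes, NFC={c} bytes")
```

For these samples the NFC form is strictly shorter (10 vs. 9 bytes for `café.txt`, 24 vs. 9 for the Hangul name), because each base character plus its combining marks collapses into a single precomposed code point.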
If the conversion is guaranteed to be monotone non-increasing (in length),
then I believe it could be made to work "in place". As for keeping
concurrent readers and preexisting working copies sane, for now I'm
LAAEFTR'ing that.

> Here I quote from
> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>
> > The proposed internal 'normal form' should be NFC, if only if
> > it were because it's the most compact form of the two: when
> > allocating memory to store a conversion result, it won't be
> > necessary (ever) to allocate more than the size of the input buffer.
> >
>
> --
> )Hiroaki Nakamura) hnaka...@gmail.com
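The "in place" idea can be illustrated with a toy model. This sketch is hypothetical and does not use the real FSFS on-disk format: `rewrite_in_place`, `demo`, the `|NODE-REV` marker and the space padding are all invented for illustration. It overwrites an NFD-encoded name with its NFC form and pads the freed bytes, so the record that follows keeps its byte offset. It assumes the reader of the file would tolerate such padding, which a real format would have to guarantee separately.

```python
import unicodedata

MARKER = b"|NODE-REV"  # hypothetical stand-in for data that must not move


def rewrite_in_place(buf: bytearray, start: int, old_len: int,
                     new: bytes, pad: bytes = b" ") -> None:
    """Overwrite buf[start:start + old_len] with new, padded to old_len bytes.

    Raises ValueError if new does not fit, because growing the field
    would shift every byte that follows it.
    """
    if len(pad) != 1:
        raise ValueError("pad must be exactly one byte")
    if len(new) > old_len:
        raise ValueError("replacement is longer than the original field")
    buf[start:start + old_len] = new + pad * (old_len - len(new))


def demo() -> tuple[int, int]:
    """Return the marker offset before and after an NFD -> NFC rewrite."""
    name_nfd = unicodedata.normalize("NFD", "résumé").encode("utf-8")
    buf = bytearray(name_nfd + MARKER)
    before = buf.index(MARKER)

    name_nfc = unicodedata.normalize(
        "NFC", name_nfd.decode("utf-8")).encode("utf-8")
    rewrite_in_place(buf, 0, len(name_nfd), name_nfc)
    after = buf.index(MARKER)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"marker offset before: {before}, after: {after}")
```

Running it prints the same offset (10) before and after, because the 10-byte NFD name is replaced by its 8-byte NFC form plus two bytes of padding. A replacement that grew would be rejected rather than silently shifting later data.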