2012/2/3 Daniel Shahaf <danie...@elego.de>:
> Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
>> On 02.02.2012 20:22, Peter Samuelson wrote:
>> > [Hiroaki Nakamura]
>> >> In option (2), we do n12n on all clients on all platforms, and we
>> >> include web_dav_svn in "clients". So we convert all input paths to
>> >> the "server encoding", which is NFC.
>> > Indeed. But the very concept of a "server encoding" means we are
>> > involving the server side. Which invokes a lot of difficult questions
>> > like "what about existing 1.x clients", "what about existing checkouts"
>> > and "what about existing repositories".
>> >
>> > By proposing a client-only solution, I hope to avoid _all_ those
>> > questions.
>>
>> Can't see how that works, unless you either make the client-side
>> solution optional, create a mapping table, or make name lookup on the
>> server agnostic to character representation. I can't envision how any of
>> those solutions would work all the time.
>>
>> It would be nice if we could normalize paths in the repository without
>> having to perform a dump/reload cycle, but I don't know how that would
>> work in FSFS
>
> It won't. Changing the encoding increase the length (in bytes) of the
> string (in the dirents hash, for example), and thus change the offsets
> of the node-revs that are later in the file --- to which subsequent
> revisions, and the id's of those node-revs, refer.

Changing from NFD to NFC does not increase the length. The length will be
the same or smaller, never larger.

Here I quote from
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames

> The proposed internal 'normal form' should be NFC, if only if
> it were because it's the most compact form of the two: when
> allocating memory to store a conversion result, it won't be
> necessary (ever) to allocate more than the size of the input buffer.

-- 
(Hiroaki Nakamura) hnaka...@gmail.com
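
P.S. The length claim above is easy to spot-check. Here is a minimal
sketch in Python, assuming the standard unicodedata module and UTF-8 as
the byte encoding. The helper names and the sample file names are made up
for illustration; none of this is Subversion code.

```python
import unicodedata


def utf8_len(text: str) -> int:
    """Return the length of text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def nfc_not_longer(text: str) -> bool:
    """Decompose text to NFD, recompose to NFC, and compare byte lengths.

    Returns True when the NFC form needs no more UTF-8 bytes than the
    NFD form of the same text.
    """
    nfd = unicodedata.normalize("NFD", text)
    nfc = unicodedata.normalize("NFC", nfd)
    return utf8_len(nfc) <= utf8_len(nfd)


# Hypothetical file names: Latin with a caron, Japanese kana with a
# voiced sound mark, Hangul syllables, and plain ASCII.
samples = [
    "\u010cibej.txt",                  # "Čibej.txt"
    "\u30c9\u30ad\u30e5\u30e1\u30f3\u30c8.txt",  # "ドキュメント.txt"
    "\ud55c\uae00.txt",                # "한글.txt"
    "plain-ascii.c",
]

for name in samples:
    nfd = unicodedata.normalize("NFD", name)
    nfc = unicodedata.normalize("NFC", nfd)
    # Print both byte lengths so the shrinkage (or equality) is visible.
    print(f"{name}: NFD={utf8_len(nfd)} bytes, NFC={utf8_len(nfc)} bytes")
```

This only spot-checks a few names and is not a proof, but it shows the
direction of the change: going from NFD to NFC, the byte length stays the
same or shrinks.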