Re: Let's discuss about unicode compositions for filenames!

Hiroaki Nakamura Thu, 02 Feb 2012 12:33:33 -0800

2012/2/3 Daniel Shahaf <[email protected]>:
> Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
>> On 02.02.2012 20:22, Peter Samuelson wrote:
>> > [Hiroaki Nakamura]
>> >> In option (2), we do n12n on all clients on all platforms, and we
>> >> include web_dav_svn in "clients". So we convert all input paths to
>> >> the "server encoding", which is NFC.
>> > Indeed.  But the very concept of a "server encoding" means we are
>> > involving the server side.  Which invokes a lot of difficult questions
>> > like "what about existing 1.x clients", "what about existing checkouts"
>> > and "what about existing repositories".
>> >
>> > By proposing a client-only solution, I hope to avoid _all_ those
>> > questions.
>>
>> Can't see how that works, unless you either make the client-side
>> solution optional, create a mapping table, or make name lookup on the
>> server agnostic to character representation. I can't envision how any of
>> those solutions would work all the time.
>>
>> It would be nice if we could normalize paths in the repository without
>> having to perform a dump/reload cycle, but I don't know how that would
>> work in FSFS
>
> It won't.  Changing the encoding increase the length (in bytes) of the
> string (in the dirents hash, for example), and thus change the offsets
> of the node-revs that are later in the file --- to which subsequent
> revisions, and the id's of those node-revs, refer.


Changing from NFD to NFC does not increase the length in bytes.
The result will be the same size or smaller, never larger.

Here I quote from
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
  > The proposed internal 'normal form' should be NFC, if only if
  > it were because it's the most compact form of the two:  when
  > allocating memory to store a conversion result, it won't be
  > necessary (ever) to allocate more than the size of the input buffer.
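
To make the size claim concrete, here is a small sketch in Python (my choice
for illustration, not anything Subversion uses) that compares UTF-8 byte
lengths of a few strings in NFD and in NFC. The helper name
nfd_to_nfc_lengths and the sample strings are made up for this mail, and a
handful of samples is a spot check, not a proof. It also only covers input
that is already in NFD.

```python
import unicodedata


def nfd_to_nfc_lengths(text):
    """Return (nfd_bytes, nfc_bytes): UTF-8 sizes of text in NFD and NFC."""
    nfd = unicodedata.normalize("NFD", text)
    nfc = unicodedata.normalize("NFC", nfd)
    return len(nfd.encode("utf-8")), len(nfc.encode("utf-8"))


# Hypothetical sample names, written in decomposed form.
samples = [
    "e\u0301",        # e + combining acute accent
    "\u30cf\u309a",   # katakana HA + combining handakuten
    "a\u0325",        # a + combining ring below
    "abc",            # plain ASCII, unchanged by normalization
]

for name in samples:
    nfd_bytes, nfc_bytes = nfd_to_nfc_lengths(name)
    # For these samples the NFC form never needs more bytes than the NFD form.
    assert nfc_bytes <= nfd_bytes
    print(ascii(name), nfd_bytes, nfc_bytes)
```

In each of these cases the NFC form fits in a buffer the size of the NFD
input, which is the property the note above relies on.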


-- 
)Hiroaki Nakamura) [email protected]
