Re: [RFC] Non-normalizing Unicode Composition Awareness

Branko Čibej Fri, 09 Nov 2012 04:50:39 -0800

On 09.11.2012 12:28, Thomas Åkesson wrote:
> Today, I noticed that Branko started some implementation in a branch. Looks 
> like a collation based on utf8proc is in the making? I think that would make 
> a lot of sense because the ICU extension poses some challenges in the build 
> process and we might not need all the functionality it provides.


Hi Thomas,

Yes, I started a branch that's intended to fix the normalization
problem. I selected utf8proc because we really don't need ICU (I can't
see a serious need for language-specific case folding, for example, nor
for Unicode regular expressions). Furthermore, utf8proc can be easily
embedded into Subversion so it doesn't present another dependency that
users would have to worry about.

I'm currently doing the grunt work of implementing the collation (done)
and the LIKE and GLOB operators that we'll need (in progress). The next,
and biggest, step will be to review the client and WC libraries to make
sure that paths sent to the server always come from the wc.db, not from
disk.
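
To make the collation idea concrete, here is a minimal sketch of a
normalization-aware comparison registered with SQLite. It is not the code
on the branch: it uses Python's unicodedata module as a stand-in for
utf8proc, and the collation name "ucs_nfd", the table layout and the
helper names are assumptions made up for illustration. The stored path is
never rewritten; only the comparison sees the decomposed form.

```python
import sqlite3
import unicodedata


def nfd_key(s):
    # Decompose to NFD for comparison only; the stored string is untouched.
    return unicodedata.normalize("NFD", s)


def unicode_collation(a, b):
    # SQLite expects a negative, zero or positive result, like strcmp().
    ka, kb = nfd_key(a), nfd_key(b)
    return (ka > kb) - (ka < kb)


def demo():
    conn = sqlite3.connect(":memory:")
    # "ucs_nfd" is a hypothetical collation name.
    conn.create_collation("ucs_nfd", unicode_collation)
    conn.execute(
        "CREATE TABLE nodes (local_relpath TEXT COLLATE ucs_nfd)")

    composed = "\u00c5kesson.txt"      # precomposed A-with-ring
    decomposed = "A\u030akesson.txt"   # 'A' + combining ring above

    conn.execute("INSERT INTO nodes VALUES (?)", (composed,))
    # Look the row up using the other composition of the same name.
    rows = conn.execute(
        "SELECT local_relpath FROM nodes WHERE local_relpath = ?",
        (decomposed,)).fetchall()
    conn.close()
    return rows
```

Calling demo() finds the row even though the query uses the decomposed
spelling, and the value that comes back is the composed string exactly as
it was inserted.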

One open question is what to do about (historical) collisions in
existing repositories, but I don't think that issue is important enough
to resolve now.
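
For a sense of what such a collision looks like, the following sketch
groups paths that differ only in Unicode composition. It is an
illustration under assumptions, not a proposed tool: the function name
find_collisions is hypothetical, and unicodedata again stands in for
utf8proc.

```python
import unicodedata
from collections import defaultdict


def find_collisions(paths):
    # Group distinct paths by their NFD form; any group with more than
    # one member is a set of names that collide after normalization.
    groups = defaultdict(list)
    for path in dict.fromkeys(paths):  # drop exact duplicates, keep order
        groups[unicodedata.normalize("NFD", path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Given the paths "caf\u00e9", "cafe\u0301" and "readme", the function
returns a single group holding the first two names.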

It'll take a while, but I hope to be able to finish the work in time for
1.8. If not ... well then, it'll be in 1.9.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
