On 09.11.2012 12:28, Thomas Åkesson wrote:
> Today, I noticed that Branko started some implementation in a branch. Looks
> like a collation based on utf8proc is in the making? I think that would make
> a lot of sense because the ICU extension poses some challenges in the build
> process and we might not need all that functionality that it provides.

Hi Thomas,

Yes, I started a branch that's intended to fix the normalization
problem. I selected utf8proc because we really don't need ICU (I can't
see a serious need for language-specific case folding, for example, nor
for Unicode regular expressions). Furthermore, utf8proc can be easily
embedded into Subversion so it doesn't present another dependency that
users would have to worry about.

I'm currently doing the grunt work of implementing the collation (done)
and the LIKE and GLOB operators that we'll need (in progress). The next,
and biggest, step will be to review the client and WC libraries to make
sure that paths sent to the server always come from the wc.db, not from
disk.

One open question is what to do about (historical) collisions in
existing repositories, but I don't think that issue is important enough
to resolve now.

It'll take a while, but I hope to be able to finish the work in time for
1.8. If not ... well then, it'll be in 1.9.

-- Brane

--
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
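
For readers who want to see what a normalization-insensitive collation does in practice, here is a minimal sketch. It is not the code on the branch: the branch uses utf8proc from C, whereas this uses Python's standard `unicodedata` and `sqlite3` modules as a stand-in. The collation name `nfc_demo`, the table `nodes`, and the helper functions are all hypothetical, and the choice of NFC as the comparison form is an assumption made only for illustration.

```python
import sqlite3
import unicodedata


def nfc_compare(a: str, b: str) -> int:
    """Compare two strings after NFC normalization; return -1, 0, or 1."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return (na > nb) - (na < nb)


def open_demo_db() -> sqlite3.Connection:
    """Open an in-memory database whose path column uses the demo collation."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical collation name; it only has to match the COLLATE clause.
    conn.create_collation("nfc_demo", nfc_compare)
    conn.execute(
        "CREATE TABLE nodes (path TEXT COLLATE nfc_demo PRIMARY KEY)"
    )
    return conn


def lookup(conn: sqlite3.Connection, path: str):
    """Return the path as stored in the database, or None if it is absent."""
    row = conn.execute(
        "SELECT path FROM nodes WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

With this in place, a path stored in decomposed form (`"A\u030akesson.txt"`) is found when looked up in precomposed form (`"\u00c5kesson.txt"`), and the lookup returns the spelling held in the database rather than the one supplied by the caller. Inserting the second spelling is rejected by the primary key, which is the kind of collision the open question about existing repositories refers to.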