Garret Wilson <gar...@globalmentor.com> writes:

> On 1/23/2012 10:38 AM, Philip Martin wrote:
>> Garret Wilson <gar...@globalmentor.com> writes:
>>
>>> On 1/23/2012 9:55 AM, Philip Martin wrote:
>>>> I thought you were proposing to write the code?
>>> I'm fine with that as well. Looks like I would have to add a few lines
>>> to decode UTF-8 (surely such code already exists in the Subversion
>>> codebase somewhere) and change a few if(...){} statements. I should be
>>> able to handle that. I would imagine it will take more effort on my
>>> part to get permission to change the code than actually writing the
>>> code itself.
>> The function receives a string of bytes, I think it's already in UTF-8.
>> The problem is that while Subversion has functions to validate UTF-8 it
>> doesn't have a system for extracting individual UTF-8 code points. At
>> present it only ever needs to extract the ASCII subset which is trivial.
>
> Ah. Well, like I said---I would be happy to write the UTF-8 extraction
> code. It would be worth it to me to get this functionality in; it
> would be a fun exercise for me; it would be a good introduction to the
> codebase for me; it's a small (very small), low-risk task; and the
> Subversion codebase would be better off in the end. (I'm sure it can
> be used elsewhere.) It's a win-win for everyone! :D
>
> This is really a small thing. Here's an example in just a few lines:
> http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
>
> Or see DecodeUTF8BytesToChar at
> tidy.sourceforge.net/cgi-bin/lxr/source/src/utf8.c .
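To illustrate how small code-point extraction can be, here is a minimal sketch in C. The function name `utf8_next_codepoint` is hypothetical and is not an existing Subversion API. It uses plain branching rather than the table-driven DFA at the linked page, and it assumes the caller passes the number of bytes remaining in the buffer. It rejects truncated sequences, bad continuation bytes, overlong forms, UTF-16 surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Subversion's API.
   Decodes one code point from S, which has LEN bytes available.
   On success stores the code point in *CP and returns the number
   of bytes consumed (1..4).  Returns -1 on malformed input. */
static int
utf8_next_codepoint(const unsigned char *s, size_t len, uint32_t *cp)
{
  uint32_t c;
  uint32_t min;   /* smallest value legal for this length (rejects overlongs) */
  int n, i;

  if (len == 0)
    return -1;

  if (s[0] < 0x80)
    {
      *cp = s[0];               /* ASCII: one byte, no further checks */
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { n = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0)
    { n = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0)
    { n = 4; c = s[0] & 0x07; min = 0x10000; }
  else
    return -1;                  /* stray continuation byte or 0xF8..0xFF */

  if (len < (size_t)n)
    return -1;                  /* sequence truncated by end of buffer */

  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return -1;              /* not a 10xxxxxx continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return -1;                  /* overlong, out of range, or surrogate */

  *cp = c;
  return n;
}
```

A caller would walk a property name by adding the returned length to its position until the end of the string, treating -1 as invalid UTF-8. The table-driven DFA linked above avoids most of these branches and is likely faster; the branching form is shown only because it is easier to check by eye.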
Subversion already has UTF-8 code:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/include/private/svn_utf_private.h
http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_subr/utf_validate.c

but it needs an API to extract code points.

The situation is that the low-level svn_fs.h API allows property names
to be any null-terminated C string. The intermediate svn_ra.h API
imposes restrictions because only XML names can be marshalled over
http:; I think svn: allows anything. The high-level svn_client.h API
restricts names to a subset of ASCII and thus avoids passing anything
the RA layers cannot handle.

You want to relax the svn_client.h API to allow XML names. Strictly
speaking, I suppose a third-party RA implementation might only support
the svn_client.h subset, but I don't know of any other RA
implementations.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com
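The proposed relaxation, from an ASCII subset at the svn_client.h level to XML names, can be sketched as two predicates over code points that a UTF-8 decoder has already extracted. The XML rule follows productions [4] NameStartChar and [4a] NameChar of XML 1.0 (Fifth Edition). The ASCII rule is an assumption about the current svn_client.h-level check (first character a letter, ':' or '_'; later characters may also be digits, '-' or '.'), not a copy of Subversion code. All function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
struct cp_range { uint32_t lo, hi; };

/* XML 1.0 (Fifth Edition) production [4] NameStartChar.
   Character literals assume an ASCII execution character set. */
static const struct cp_range name_start[] = {
  {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
  {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
  {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
  {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
  {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
};

/* Additional characters allowed after the first, production [4a] NameChar. */
static const struct cp_range name_extra[] = {
  {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
  {0x300, 0x36F}, {0x203F, 0x2040}
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static int
in_ranges(uint32_t c, const struct cp_range *r, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (c >= r[i].lo && c <= r[i].hi)
      return 1;
  return 0;
}

static int
is_xml_name_start_char(uint32_t c)
{
  return in_ranges(c, name_start, COUNT(name_start));
}

static int
is_xml_name_char(uint32_t c)
{
  return is_xml_name_start_char(c)
         || in_ranges(c, name_extra, COUNT(name_extra));
}

/* Hypothetical relaxed check: is the decoded name an XML Name? */
static int
xml_name_ok(const uint32_t *cps, size_t n)
{
  size_t i;
  if (n == 0 || !is_xml_name_start_char(cps[0]))
    return 0;
  for (i = 1; i < n; i++)
    if (!is_xml_name_char(cps[i]))
      return 0;
  return 1;
}

/* Hypothetical stand-in for the assumed ASCII-subset rule. */
static int
ascii_name_ok(const uint32_t *cps, size_t n)
{
  size_t i;
  if (n == 0)
    return 0;
  for (i = 0; i < n; i++)
    {
      uint32_t c = cps[i];
      int ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
               || c == ':' || c == '_';
      if (i > 0)                /* digits, '-' and '.' only after the first */
        ok = ok || (c >= '0' && c <= '9') || c == '-' || c == '.';
      if (!ok)
        return 0;
    }
  return 1;
}
```

Under these assumed rules, every name the ASCII check accepts is also an XML name, so the relaxation only widens the accepted set and still admits nothing the http: layer could not marshal. For example, "café" would pass the XML check but fail the ASCII one.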