On Sun, Sep 12, 2010 at 1:49 PM, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>> When I was working on my changes, I was looking for a "to UTF-8"
>> function that would return whether it actually re-encoded the input
>> string, but did not find one. The re-encoding function that I used,
>> `svn_subst_translate_string`, actually converts line endings to LF as
>> well as re-encodes from the given encoding to UTF-8, but it does not
>> inform the caller of whether it took either action. I guess that I
>> could write a utility function, kind of like a `strcmp`, but which
>> ignores any differences at line endings. Unfortunately, this adds
>> another scan through every property value that is encountered. Already
>> there is a noticeable decrease in the performance of the modified
>> `svnsync` as a result of calling `svn_subst_translate_string` on
>> basically every property value, and adding an additional scan through
>> each property value would decrease performance further.
>>
>
> Or you could insert the reencoding magic after (and separately from) the
> dos2unix magic, if that would make counting easier. That said, what are
> you trying to count? The number of properties where the reencoding
> wasn't a noop?
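Keeping the line-ending step separate, as Daniel suggests, can be sketched in plain C. The helper below, `normalize_eol_to_lf`, is hypothetical (it is not part of Subversion's API) and assumes that both CRLF and a lone CR should become LF. Because it records whether it rewrote anything during the same pass that builds the output, counting the non-noop normalizations needs no extra `strcmp`-style scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the Subversion API): copy the
 * NUL-terminated string SRC into a newly malloc()ed buffer, converting
 * CRLF and lone CR to LF.  Sets *CHANGED to true iff at least one line
 * ending was rewritten, so the caller can count "non-noop"
 * normalizations without a second scan.  Returns NULL if allocation
 * fails. */
char *normalize_eol_to_lf(const char *src, bool *changed)
{
    assert(src != NULL && changed != NULL);

    size_t len = strlen(src);
    /* The output is never longer than the input. */
    char *out = malloc(len + 1);
    size_t j = 0;

    if (out == NULL)
        return NULL;

    *changed = false;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\r') {
            out[j++] = '\n';
            *changed = true;
            /* Swallow the LF of a CRLF pair; src[len] is the NUL, so
             * reading src[i + 1] is always in bounds. */
            if (src[i + 1] == '\n')
                i++;
        } else {
            out[j++] = src[i];
        }
    }
    out[j] = '\0';
    return out;
}
```

A caller would run this first, bump a counter whenever `changed` comes back true, and then hand the result to a separate re-encoding step.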
Re-encoding and then normalizing the line endings would work.
Unfortunately, I didn't see a library function that only performed the
re-encoding; `svn_subst_translate_string` does both simultaneously. I
removed the normalization counting code without much thought in my hasty
effort to produce a version of `svnsync` that I could use to mirror the
GNU Nano repository. Currently, I am thinking that Stefan Sperling's idea
of a `svn_subst_translate_string2` function is the way to go.

> Re performance, isn't svnsync bound by network speed?

Mostly, yes. However, I have definitely noticed a decrease in performance
with my altered version (when using --source-encoding) that cannot be
explained by network speed. Granted, it's not that much of a difference.

> Unrelatedly, you mentioned that in the repository you work on there are
> some properties in latin1 and some in utf8. So one will need (until
> they fix the properties on their side) to svnsync a few revisions with
> translation enabled, then kill svnsync and restart with translation
> disabled, then restart again with it enabled etc. Which makes me think,
> do we want a "sync up to N revisions" (or, "sync up to rN") switch?

It's like you are reading my mind :) I figured that I would work on
getting this change implemented and then work on such a feature.

> Have you sent a new version of the patch yet?

Oh, not yet. I'm still working on it.
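A re-encoding-only step, the piece that `svn_subst_translate_string` does not expose on its own, might look like the minimal sketch below. It assumes the source encoding is ISO-8859-1 (the latin1 case mentioned in the thread), so every byte maps directly to a Unicode code point; `latin1_to_utf8` is a hypothetical name, and a general `--source-encoding` implementation would need a real conversion library such as iconv instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the Subversion API): re-encode the
 * NUL-terminated ISO-8859-1 string SRC as UTF-8 into a newly malloc()ed
 * buffer.  Line endings are left untouched.  Sets *REENCODED to true iff
 * some byte was outside ASCII, i.e. iff the output differs from the
 * input.  Returns NULL if allocation fails. */
char *latin1_to_utf8(const char *src, bool *reencoded)
{
    assert(src != NULL && reencoded != NULL);

    size_t len = strlen(src);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    size_t j = 0;

    if (out == NULL)
        return NULL;

    *reencoded = false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x80) {
            /* ASCII bytes are identical in UTF-8. */
            out[j++] = (char)c;
        } else {
            /* U+0080..U+00FF become 110000xx 10xxxxxx. */
            out[j++] = (char)(0xC0 | (c >> 6));
            out[j++] = (char)(0x80 | (c & 0x3F));
            *reencoded = true;
        }
    }
    out[j] = '\0';
    return out;
}
```

Because ASCII is unchanged in UTF-8, the `reencoded` flag is true exactly when the output differs from the input, which answers the "was the reencoding a noop?" question without comparing the strings afterwards.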