On Mon, Jul 11, 2011 at 02:13:46PM +0100, Philip Martin wrote:
> Stefan Sperling <s...@elego.de> writes:
>
> > We should allow invalid mergeinfo to exit the repository, e.g. in a dump
> > file. But input paths, such as svnadmin load, should refuse invalid
> > mergeinfo.
>
> We have a similar problem with invalid svn:log (non-LF line-ends). In
> that case load fails with an error but sync corrects the value and
> completes the load. Simply refusing to sync was deemed insufficient.

This was because it was known that there were many repositories out there
with broken log messages. People had been committing invalid log messages
for years using Eclipse or other clients that didn't normalise log
messages properly.

We will only start tolerating invalid mergeinfo in 1.7 (cf. issue #3896).
If any invalid mergeinfo enters repositories used with 1.6.x servers
and clients, things immediately start falling over left and right.
There is no way not to notice this corruption (even simple updates can
start failing -- check the commits Paul made for issue #3896 to see in
how many places the 1.6.x client will start throwing fatal errors).

The 1.7 client tolerates invalid mergeinfo so that users who run into
this problem can upgrade their client to continue running operations
that do not require valid mergeinfo, until corrupted mergeinfo has been
fixed.

But there are *zero* known cases of such corruption in the wild.
Maybe it has already happened somewhere, but I have heard of no such
reports. It requires some smart person misusing git-svn's --mergeinfo
option, or some other custom client that sends invalid mergeinfo via
the RA layer directly. Clients using the libsvn_client API cannot commit
bad mergeinfo.

> > Maybe we should provide a way to fix invalid mergeinfo in dump files.
> > One existing but no officially supported method is svndumptool
> > http://svn.borg.ch/svndumptool/
>
> I've never used it, but I suppose it would work. Is that a good enough
> solution? Invalid mergeinfo is harder to fix than an invalid log
> message, because the validity rules are much more complex. Do we advise
> users to delete it? To analyse the failure and tweak it to be valid?

I would say proper analysis and tweaking is always preferred over
deletion. I guess that in practice most users would resort to simply
deleting invalid mergeinfo. But that only means that some merges will
flag spurious conflicts.
That is not a bad trade-off at all, since 1.6.x clients are basically
unable to use the repository when faced with invalid mergeinfo.
The proper approach also depends on the kind of corruption (e.g. a
simple typo vs. garbage data).
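
To make the "simple typo vs. garbage data" distinction a bit more concrete,
here is a rough sketch of the kind of check someone could run over
svn:mergeinfo values pulled out of a dump file before deciding whether to
tweak or delete them. It is not Subversion's parser: the function name
mergeinfo_problems is made up for illustration, and it only covers a
simplified subset of the syntax ("path:rev[,rev-rev[*]...]" per line, with
range start lower than range end). The real validity rules are stricter.

```python
import re

# One revision element: REV or REV-REV, optionally followed by "*"
# (non-inheritable). Simplified assumption, not Subversion's own grammar code.
_ELEMENT = re.compile(r"(\d+)(?:-(\d+))?\*?")


def mergeinfo_problems(value):
    """Return a list of problems found in a svn:mergeinfo property value.

    Hypothetical helper: an empty list means nothing was flagged by this
    simplified check, not that Subversion would accept the value.
    """
    problems = []
    for lineno, line in enumerate(value.splitlines(), start=1):
        if not line:
            continue
        # A path may itself contain ':', so split on the last one.
        path, sep, revs = line.rpartition(":")
        if not sep or not path:
            problems.append(f"line {lineno}: no 'path:revisions' separator")
            continue
        for element in revs.split(","):
            match = _ELEMENT.fullmatch(element)
            if match is None:
                problems.append(f"line {lineno}: bad revision element {element!r}")
                continue
            start, end = match.group(1), match.group(2)
            if end is not None and int(start) >= int(end):
                problems.append(f"line {lineno}: range {element!r} is not ascending")
    return problems


if __name__ == "__main__":
    for sample in ("/trunk:1-3,5,7-9*", "/trunk:3-1", "garbage"):
        print(repr(sample), mergeinfo_problems(sample))
```

A value where a single element is flagged is probably a typo worth fixing by
hand; a value where nothing parses is more likely garbage, and deleting it
may be the pragmatic choice.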