This is from an email conversation with C. Michael Pilato about the performance degradation of 'svn log -g' when revisions with many changed paths are involved.
BTW: as a workaround, would it help to split a large commit into several smaller ones, so that the mergeinfo examination can be skipped for revisions that don't contain the log target?

Regards,
Thomas

Email conversation follows:

Thomas Becker wrote:
> Is it true that 'svn log -g' will examine merge-info for all paths affected in a
> particular revision? From my layman's point of view it would suffice to examine
> just the path (and parent paths) for which the log was requested. Could you
> shed some light on this, or is there any workaround?

C. Michael Pilato wrote:
> 'svn log' is by its very nature recursive. When you request the logs for a
> directory, you are necessarily requesting them for all the children of the
> directory, too. 'svn log -g' is no different, extending its search for
> mergeinfo changes into any of the changed paths that are at or under the log
> target itself. It's a bummer, to be sure, especially when big commits are in
> the history.
>
> One thing I've considered in the past is the implementation of --depth=empty
> support for 'svn log', which in the default case (and when run against a
> directory target, of course) would show only revisions in which the
> directory's properties changed (svn:ignore, svn:mergeinfo, etc.). In the
> 'svn log -g' case, it would have the effect of only doing the mergeinfo
> examination and recursion based on mergeinfo changes on the target directory
> only, regardless of what might have changed "under" the directory.
>
> What do you think? Would this be useful?

Thomas Becker wrote:
> OK, I understand that a change in a path is also attributed to the parent of
> that path. For this scenario it would certainly be useful to be able to restrict
> the scope of the log command.
>
> In our case, however, I think it's a bit different: the performance of the log
> of a path is affected by the number of "siblings" in any revision involved in
> the log, e.g.
>
> r123455
> M /x/y/0000.txt
>
> r123456
> M /x/y/0000.txt
> ...
> M /x/y/9999.txt
>
> r123457
> M /x/y/0000.txt
>
> When requesting the log excluding revision 123456 (the one with many changed
> paths), it performs fast (e.g. 'svn log -g -r 0:123455 <url>' or
> 'svn log -g -r 123457:HEAD <url>', where <url> is
> file:///C:/Repos/XY/x/y/0000.txt). Whenever revision 123456 is involved,
> performance degrades for 'svn log -g'. This leads me to the conclusion that
> 'svn log -g' examines all paths in r123456, even the "siblings" of 0000.txt,
> which should have no impact on the collection of mergeinfo for 0000.txt. But
> maybe I'm totally wrong here?

C. Michael Pilato wrote:
> Ah! It is the case that the FS API for asking "Which paths were changed in
> this revision?" is exhaustive -- you can't ask for just the changed paths in
> and under some level. And generally speaking you have to iterate over those
> paths, ruling out the ones that don't apply to find the ones that do.
>
> Now ... it occurs to me that there might be some optimization possible here.
> I mean, if you're running log against a single file, and you haven't specified
> -v, then perhaps we can do a more direct mergeinfo comparison.

Thomas Becker wrote:
> This optimisation would be a great improvement for us, as I suspect it wasn't
> the last time that we applied a change to a lot of files in one revision (e.g.
> a change of file header comments or the svn:keywords property).
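To make the effect concrete, here is a minimal Python sketch of the behaviour described above. This is not Subversion's code; the function and data names are invented for illustration only. It models the exhaustive "which paths changed in this revision?" query and the per-path filtering that 'svn log -g' must then perform against the log target.

    # Hypothetical sketch, not Subversion's implementation.

    def paths_changed(revision):
        """Stand-in for the exhaustive FS query: returns *every* path changed
        in the revision; there is no way to ask for just one subtree."""
        if revision == 123456:
            # The problematic commit: 10,000 sibling files touched at once.
            return ["/x/y/%04d.txt" % i for i in range(10000)]
        return ["/x/y/0000.txt"]

    def is_at_or_under(path, target):
        """True if 'path' is the target itself or lies underneath it."""
        return path == target or path.startswith(target.rstrip("/") + "/")

    def mergeinfo_relevant_paths(revision, log_target):
        """Walk the full change list and rule out the paths that don't apply,
        keeping only those at or under the log target."""
        return [p for p in paths_changed(revision)
                if is_at_or_under(p, log_target)]

    # For r123456 this loops over all 10,000 entries just to keep one of them,
    # which is where the observed slowdown comes from.
    print(len(paths_changed(123456)))                         # 10000
    print(mergeinfo_relevant_paths(123456, "/x/y/0000.txt"))  # ['/x/y/0000.txt']

The sketch also shows why the workarounds discussed in the thread help: excluding r123456 via a revision range avoids the scan entirely, and splitting such a commit into smaller ones keeps each per-revision change list short.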