While investigating the newly opened issue #3830[1], I found myself very quickly running into wall after wall with the approach I'd hoped to take. The more I thought about and attempted to classify the problems I was seeing, the more something became apparent to me: the svn_repos_get_logsX() API claims to do one thing, but takes an approach that's simply incompatible with that claim.
Let's take a look at the first part of that function's docstring: /** * Invoke @a receiver with @a receiver_baton on each log message from * @a start to @a end in @a repos's filesystem. @a start may be greater * or less than @a end; this just controls whether the log messages are * processed in descending or ascending revision number order. * * If @a start or @a end is #SVN_INVALID_REVNUM, it defaults to youngest. * * If @a paths is non-NULL and has one or more elements, then only show * revisions in which at least one of @a paths was changed (i.e., if * file, text or props changed; if dir, props or entries changed or any node * changed below it). Each path is a <tt>const char *</tt> representing * an absolute path in the repository. If @a paths is NULL or empty, * show all revisions regardless of what paths were changed in those * revisions. [...] Reading this, you'd be led as a developer to imagine a fairly straightforward implementation: for rev in [start .. end]: if paths: for path in paths: if path_changed_in_rev(path, rev): send_log(rev) break else: send_log(rev) In other words, loop over the requested range, transmitting logs for any revision in the range in which any of the specified paths was modified. If no paths are specified, send all the revisions in the range. I mean, that's basically what the docstring says it will do, right? But that's not what happens, at least when paths are provided. Instead, the implementation tries to simultaneously walk the histories of each path@MAX(start,end), transmitting logs for each interesting revision within the requested range and in the union of all such interesting revisions for those lines of history. These paths may weave around into different locations (through copy history) over time, even to locations that are in the original set of paths, but the function just keeps churning out revision logs. Most of the time -- where copies/moves aren't present -- this results in information similar to what you'd expect based on the docstring. But there are key differences: 1. The API talks exclusively about paths, not "lines of history". Thus, if multiple lines of history happen to occupy the same path at different times in the course of the requested revision range, only the youngest of those will get reported on. 2. Because the implementation is really tied more to lines of history (which are necessarily precise) than to revisions with path filtering, the implementation is susceptible to failures due to requests for history locations at the range extremes that don't exist. (This is the crux of issue #3830.) I guess I'm just surprised that we've made it this far without really noticing this disparity before. We *do* have an old issue #928[2] in which there's a request for a mode for 'svn log' which behaves in the way you'd *expect*, based on the docstring, svn_repos_get_logsX() to behave today. I'm not precisely sure what I'm going to do, or suggest be done, about this situation. At a minimum, I think the docstring for svn_repos_get_logs4() should be revised to reveal reality. But beyond that, and specifically toward trying to honor the requests in the two mentioned issues, I dunno. I just thought I'd share. :-) -- C-Mike [1] http://subversion.tigris.org/issues/show_bug.cgi?id=3830 [2] http://subversion.tigris.org/issues/show_bug.cgi?id=928 -- C. Michael Pilato <cmpil...@collab.net> CollabNet <> www.collab.net <> Distributed Development On Demand
signature.asc
Description: OpenPGP digital signature