On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <jcor...@gmail.com> wrote: > On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote: >> Johan (and other interested parties), >> I've been following some of the commits to the >> diff-optimizations-branch with interest. While I've not reviewed them >> for technical merit, it appears that others have, and that there is >> good work going on in the branch. > > Thanks for your interest. > > It might be good to get a little bit more review on the whole, I > think. A lot of people have read parts of the code, but if I remember > correctly most (if not all) of them have said things like "I haven't > reviewed it in detail, but here are some suggestions, feedback, ...". > >> I'm wondering what the potential >> for merging back to trunk is. This is the TODO section from the >> BRANCH-README: >> >> TODO: >> >> * eliminate identical prefix [DONE] >> * eliminate identical suffix [DONE] >> * make diff3 and diff4 use datasources_open [DONE] >> (this may eliminate the need for datasource_open, and the flag >> datasource_opened in token.c#svn_diff__get_tokens) >> * implement (part of) increment_pointers and >> decrement_pointers with a macro [DONE] >> (offers another 10% speedup) >> * integrate (some of the) low-level optimizations for prefix/suffix >> scanning, proposed by stefan2 [2] [] >> * revv svn_diff.h#svn_diff_fns_t [] >> >> It looks like, for the most part, any destabilizing functionality is >> completed, and what remains are simply optimizations. This >> optimization work can probably be performed on trunk, and if so, we >> should merge the branch back and do the cleanup work there. The only >> bit on the current list of stuff is revving svn_diff_fns_t. Can that >> work be parallelized? > > Yes, you are correct. Most of the essential work is done. Further > optimizations can just as well be done on trunk. > > Revving svn_diff_fns_t: what do you mean with parallelizing it? I must > admit that I don't really know (yet) how to go about that. Very early > during the branch work, danielsh pointed out that I modified this > public struct (vtable for reading data from datasources), so it should > be revved. Is it listed/documented somewhere what should be done for > that (community guide perhaps)? > > (one slightly related note to this: I introduced the function > datasources_open, to open the multiple datasources at once (as opposed > to the original datasource_open, which only opens one datasource). But > maybe the new function should be renamed to datasource_open_all or > datasources_open_all or something, to make it stand out a little bit > more). > >> I'm not trying to rush the work, nor do I think it is essential for >> 1.7, but it feels like a pretty good performance increase, and one >> that we shouldn't have any problem shipping. (If there are currently >> API uncertainties, than it would be good to wait until 1.7.x branches, >> though.) > > Yes, I think it's quite ready to be merged, and could provide a very > significant performance increase (for diff, merge and blame). > > The API is stable now, I think, except maybe for the name of the > datasources_open function (see above). If we decide to go (for > optimizations reasons) for specialized prefix/suffix scanning > functions for 2, 3 or 4 datasources, I think it's best to keep the > generic datasources_open API (with an array of datasources), and only > split up between specialized versions in the implementation.
The API is now rev'd, and I caught the branch up with trunk in r1064459. So it looks like we're ready to merge! Johan, would you like to do the honors? -Hyrum