Re: Status of the branch diff-optimizations-bytes branch

Hyrum K Wright Thu, 27 Jan 2011 20:52:05 -0800

On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <[email protected]> wrote:
> On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <[email protected]> wrote:
>> Johan (and other interested parties),
>> I've been following some of the commits to the
>> diff-optimizations-branch with interest.  While I've not reviewed them
>> for technical merit, it appears that others have, and that there is
>> good work going on in the branch.
>
> Thanks for your interest.
>
> It might be good to get a little bit more review on the whole, I
> think. A lot of people have read parts of the code, but if I remember
> correctly most (if not all) of them have said things like "I haven't
> reviewed it in detail, but here are some suggestions, feedback, ...".
>
>> I'm wondering what the potential
>> for merging back to trunk is.  This is the TODO section from the
>> BRANCH-README:
>>
>> TODO:
>>
>>  * eliminate identical prefix                 [DONE]
>>  * eliminate identical suffix                 [DONE]
>>  * make diff3 and diff4 use datasources_open  [DONE]
>>     (this may eliminate the need for datasource_open, and the flag
>>      datasource_opened in token.c#svn_diff__get_tokens)
>>  * implement (part of) increment_pointers and
>>    decrement_pointers with a macro            [DONE]
>>     (offers another 10% speedup)
>>  * integrate (some of the) low-level optimizations for prefix/suffix
>>    scanning, proposed by stefan2 [2]          []
>>  * revv svn_diff.h#svn_diff_fns_t             []
>>
>> It looks like, for the most part, any destabilizing functionality is
>> completed, and what remains are simply optimizations.  This
>> optimization work can probably be performed on trunk, and if so, we
>> should merge the branch back and do the cleanup work there.  The only
>> bit on the current list of stuff is revving svn_diff_fns_t.  Can that
>> work be parallelized?
>
> Yes, you are correct. Most of the essential work is done. Further
> optimizations can just as well be done on trunk.
>
> Revving svn_diff_fns_t: what do you mean with parallelizing it? I must
> admit that I don't really know (yet) how to go about that. Very early
> during the branch work, danielsh pointed out that I modified this
> public struct (vtable for reading data from datasources), so it should
> be revved. Is it listed/documented somewhere what should be done for
> that (community guide perhaps)?
>
> (one slightly related note to this: I introduced the function
> datasources_open, to open the multiple datasources at once (as opposed
> to the original datasource_open, which only opens one datasource). But
> maybe the new function should be renamed to datasource_open_all or
> datasources_open_all or something, to make it stand out a little bit
> more).
>
>> I'm not trying to rush the work, nor do I think it is essential for
>> 1.7, but it feels like a pretty good performance increase, and one
>> that we shouldn't have any problem shipping.  (If there are currently
>> API uncertainties, than it would be good to wait until 1.7.x branches,
>> though.)
>
> Yes, I think it's quite ready to be merged, and could provide a very
> significant performance increase (for diff, merge and blame).
>
> The API is stable now, I think, except maybe for the name of the
> datasources_open function (see above). If we decide to go (for
> optimizations reasons) for specialized prefix/suffix scanning
> functions for 2, 3 or 4 datasources, I think it's best to keep the
> generic datasources_open API (with an array of datasources), and only
> split up between specialized versions in the implementation.


The API is now rev'd, and I caught the branch up with trunk in
r1064459.  So it looks like we're ready to merge!

Johan, would you like to do the honors?

-Hyrum

Re: Status of the branch diff-optimizations-bytes branch

Reply via email to