On Wed, Jun 1, 2011 at 5:56 PM, Morten Kloster <mor...@gmail.com> wrote:
> On Wed, Jun 1, 2011 at 1:35 AM, Johan Corveleyn <jcor...@gmail.com> wrote:
>> On Tue, May 31, 2011 at 12:44 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
>> ...
> []
>> I'll get into some more testing and reviewing tomorrow or the day
>> after (unless someone else beats me to it :-)).
>>
>> Cheers,
>> --
>> Johan
>>
>
> I had trouble getting any reliable time differences between the
> unpatched and patched versions. By putting the token counting
> in a x300 loop, I got an increase in running time of ~10% for my
> test file (a 40MB file - the output from all my merge.c diffs -
> against a copy where I have changed the first and last line, as
> you suggested)... that was with ignore eol, but not whitespace.
> So here it looks like the patch barely increases running time at
> all; almost all the time is spent reading the files and assigning
> tokens.
Hi Morten,

Sorry, it took me a little while longer than expected, but I finally
got around to it.

I did some more tests, and upon further investigation, I too can't
really find a significant performance decrease because of the token
counting. So that looks good.

I've reviewed it all, and it looks fine. I just have one small
question about the changes in lcs.c:

[[[
@@ -218,11 +228,17 @@ prepend_lcs(svn_diff__lcs_t *lcs, apr_off_t lines,
 svn_diff__lcs_t *
 svn_diff__lcs(svn_diff__position_t *position_list1, /* pointer to tail (ring) */
               svn_diff__position_t *position_list2, /* pointer to tail (ring) */
+              svn_diff__token_index_t *token_counts_list1, /* array of counts */
+              svn_diff__token_index_t *token_counts_list2, /* array of counts */
+              svn_diff__token_index_t num_tokens,
               apr_off_t prefix_lines,
               apr_off_t suffix_lines,
               apr_pool_t *pool)
 {
   apr_off_t length[2];
+  svn_diff__token_index_t *token_counts[2];
+  svn_diff__token_index_t unique_count[2];
+  svn_diff__token_index_t token_index;
...
@@ -281,16 +310,17 @@ svn_diff__lcs(svn_diff__position_t *position_list1
   sentinel_position[0].next = position_list1->next;
   position_list1->next = &sentinel_position[0];
   sentinel_position[0].offset = position_list1->offset + 1;
+  token_counts[0] = token_counts_list1;

   sentinel_position[1].next = position_list2->next;
   position_list2->next = &sentinel_position[1];
   sentinel_position[1].offset = position_list2->offset + 1;
+  token_counts[1] = token_counts_list2;
...
@@ -310,12 +340,12 @@ svn_diff__lcs(svn_diff__position_t *position_list1
       /* For k < 0, insertions are free */
       for (k = (d < 0 ? d : 0) - p; k < 0; k++)
         {
-          svn_diff__snake(fp + k, &lcs_freelist, pool);
+          svn_diff__snake(fp + k, token_counts, &lcs_freelist, pool);
         }

       /* for k > 0, deletions are free */
       for (k = (d > 0 ? d : 0) + p; k >= 0; k--)
         {
-          svn_diff__snake(fp + k, &lcs_freelist, pool);
+          svn_diff__snake(fp + k, token_counts, &lcs_freelist, pool);
         }

       p++;
]]]

Why do you copy the token_counts_list parameters into the array
token_counts[]? Is that to simplify the passing of data to __snake
(keeping the number of arguments to a minimum)? Can't you just pass
the token_counts_list parameters 'as is' to __snake as two extra
arguments, and lose the token_counts[] variable? But maybe you have a
reason for this, because of compiler optimization or inlining or
something like that?

Apart from that I have no more remarks, so I think I'll be able to
commit this soon.

Thanks for your work.

--
Johan