Hi Eric, > Here's a sequence of steps, comparable to what I tried to use to generate > 14b8a31ef on the master branch (cherry-picked from branch-1.6). The > example below uses hashes instead of symbolic names, in case you want to > reproduce this even when the branches have advanced. Notice that it > appears to get stuck in an infinite loop, consuming 100% CPU with no > apparent progress after a while. > > $ git clone git://git.sv.gnu.org/m4.git > $ cd m4 > $ git config > $ git reset --hard 6033d89 > $ git config merge.merge-changelog.driver > git-merge-changelog --split-merged-entry %O %A %B > $ timeout 600 git cherry-pick cd172d9
Indeed, with these instructions, I reproduce. The guessed "direction", that is, the meaning of the three files, is right. The two major problems in your case are: - There are small differences between ChangeLog entries on the branches. For example, on one branch you have a ChangeLog entry that starts with Thu Nov 5 12:37:13 1992 Francois Pinard (pinard at icule) and on the other branch it starts with 1992-11-05 François Pinard <pin...@iro.umontreal.ca> These small differences cause the fuzzy string matching to be activated, and it takes a lot of time. Two exactly equal entries could be detected by a quick hash table lookup. - The branches are very far away: > $ ls -l ChangeLog .merge_file_* > -rw------- 1 eblake None 172482 Jun 30 21:24 .merge_file_X15QyE > -rw------- 1 eblake None 172697 Jun 30 21:24 .merge_file_aJFAU3 > -rw------- 1 eblake None 466111 Jun 30 21:24 .merge_file_acCLpE > -rw-r--r-- 1 eblake None 466111 Jun 30 19:15 ChangeLog You are merging from a branch that has only 1/3 of the entire development history. git-merge-changelog spends time looking for each of the other 2/3 of the ChangeLog file whether it can find some match. The attached change applies Ralf's improvement (originally for gettext's msgmerge program). With this, the time spent for the "git cherry-pick" command goes down: 516 sec -> 81 sec Since this is obviously still not the speed that you desire, I continue to look... 2009-07-02 Bruno Haible <br...@clisp.org> Speed up approximate search for matching ChangeLog entries. * lib/git-merge-changelog.c (entry_fstrcmp): Add a lower_bound argument. Call fstrcmp_bounded instead of fstrcmp. (compute_mapping, try_split_merged_entry, main): Update callers. --- lib/git-merge-changelog.c.orig 2009-07-02 23:32:51.000000000 +0200 +++ lib/git-merge-changelog.c 2009-07-02 22:48:51.000000000 +0200 @@ -211,9 +211,12 @@ /* Perform a fuzzy comparison of two ChangeLog entries. Return a similarity measure of the two entries, a value between 0 and 1. - 0 stands for very distinct, 1 for identical. */ + 0 stands for very distinct, 1 for identical. + If the result is < LOWER_BOUND, an arbitrary other value < LOWER_BOUND can + be returned. */ static double -entry_fstrcmp (const struct entry *entry1, const struct entry *entry2) +entry_fstrcmp (const struct entry *entry1, const struct entry *entry2, + double lower_bound) { /* fstrcmp works only on NUL terminated strings. */ char *memory; @@ -233,7 +236,8 @@ p += entry2->length; *p++ = '\0'; } - similarity = fstrcmp (memory, memory + entry1->length + 1); + similarity = + fstrcmp_bounded (memory, memory + entry1->length + 1, lower_bound); freea (memory); return similarity; } @@ -410,7 +414,8 @@ for (j = n2 - 1; j >= 0; j--) if (index_mapping_reverse[j] < 0) { - double similarity = entry_fstrcmp (entry_i, file2->entries[j]); + double similarity = + entry_fstrcmp (entry_i, file2->entries[j], best_j_similarity); if (similarity > best_j_similarity) { best_j = j; @@ -429,7 +434,8 @@ if (index_mapping[ii] < 0) { double similarity = - entry_fstrcmp (file1->entries[ii], entry_j); + entry_fstrcmp (file1->entries[ii], entry_j, + best_i_similarity); if (similarity > best_i_similarity) { best_i = i; @@ -729,7 +735,8 @@ new_body.string = new_entry->string + split_offset; new_body.length = new_entry->length - split_offset; - similarity = entry_fstrcmp (&old_body, &new_body); + similarity = + entry_fstrcmp (&old_body, &new_body, best_similarity); if (similarity > best_similarity) { best_split_offset = split_offset; @@ -1219,7 +1226,8 @@ size_t i; for (i = edit->i1 + 1; i <= edit->i2; i++) if (entry_fstrcmp (ancestor_file.entries[i], - modified_file.entries[i + edit->j2 - edit->i2]) + modified_file.entries[i + edit->j2 - edit->i2], + FSTRCMP_THRESHOLD) < FSTRCMP_THRESHOLD) { simple_merged = false; @@ -1300,7 +1308,8 @@ simple = true; for (i = edit->i1; i <= edit->i2; i++) if (entry_fstrcmp (ancestor_file.entries[i], - modified_file.entries[i + edit->j2 - edit->i2]) + modified_file.entries[i + edit->j2 - edit->i2], + FSTRCMP_THRESHOLD) < FSTRCMP_THRESHOLD) { simple = false;