Skip searching for better indentation heuristics if we'd slide a hunk more
than its size. This is the easiest fix proposed in the analysis[1] in
response to a patch that mercurial took for xdiff to limit searching
by a constant. Using a performance test as:
#!python
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 1000001)
This patch reduces the execution of "git diff --no-index a b" from
0.70s to 0.31s. However limiting the sliding to the size of the diff hunk,
which was proposed as a solution (that I found easiest to implement for
now) is not optimal for cases like
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 2000000)
as then we'd still slide 1000000 times.
In addition to limiting the sliding to size of the hunk, also limit by a
constant. Choose 100 lines as the constant as that fits more than a screen,
which really means that the diff sliding is probably not providing a lot
of benefit anyway.
[1]
https://public-inbox.org/git/[email protected]/
Reported-by: Jun Wu <[email protected]>
Analysis-by: Michael Haggerty <[email protected]>
Signed-off-by: Stefan Beller <[email protected]>
---
> Plus, it should always give the right answer.
I was tempted to do just that, but I caved. The diff is correct,
and the hunk sliding is purely to appease the visual aspect of
humans looking at diffs. If your diff can slide more than a
monitor height, you're not interested in the best slided diff,
but something else is going on.
> There are cases where it will be
> more expensive than a fixed `MAX_BORING`, but I bet on average it will
> be faster.
So I did both, settling for performance as the utmost desire. ;-)
xdiff/xdiffi.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..91e98ee9869 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -591,6 +591,11 @@ static void measure_split(const xdfile_t *xdf, long split,
*/
#define INDENT_WEIGHT 60
+/*
+ * How far do we slide a hunk at most?
+ */
+#define INDENT_HEURISTIC_MAX_SLIDING 100
+
/*
* Compute a badness score for the hypothetical split whose measurements are
* stored in m. The weight factors were determined empirically using the tools
and
@@ -903,7 +908,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long
flags) {
long shift, best_shift = -1;
struct split_score best_score;
- for (shift = earliest_end; shift <= g.end; shift++) {
+ shift = earliest_end;
+ if (g.end - groupsize - 1 > shift)
+ shift = g.end - groupsize - 1;
+ if (g.end - INDENT_HEURISTIC_MAX_SLIDING > shift)
+ shift = g.end - INDENT_HEURISTIC_MAX_SLIDING;
+ for (; shift <= g.end; shift++) {
struct split_measurement m;
struct split_score score = {0, 0};
--
2.18.0.345.g5c9ce644c3-goog