> 693                 if (dynxdfs[1].size() >= window_size) {
> 694                     do_diff();
> (gdb)
> 695                     co_yield get_diff_for(l1, l2);
> 696                     // if l1 or l2 overflows this likely means that file 1 was exhausted while there were still matching values in file 2 for some reason. maybe they weren't passed through nreff?
> 697                     while (l1) {
> 698                         dynxdfs[0].consume();
> 699                         -- l1;
> 700                     }

so i'm in the first consume call, and the first thing i'm noticing is that, 
because only one line has been processed, the current document diff has 
deletions marked after the first line (changes, really: it has a nice general 
representation where deletions and additions are the same)

(gdb) p xdf->rchg[0]
$7 = 0 '\000'
(gdb) p xdf->rchg[1]
$8 = 1 '\001'
(gdb) p xdf->rchg[2]
$9 = 1 '\001'

this is the same as the first incorrect output that was generated, 1 equality 
and 2 deletions.
lemme look at the other document

(gdb) up
#1  0x0000555555611ee0 in 
AsymmetricStreamingXDiff::diff(_ZN24AsymmetricStreamingXDiff4diffEN4zinc9generatorISt17basic_string_viewIcSt11char_traitsIcEES5_NS0_17use_allocator_argEEE.Frame
 *) (frame_ptr=0x5120000001c0) at diff_xdiff.cpp:698
698                         dynxdfs[0].consume();
(gdb) p dynxdfs[1].xdf->nrec
$11 = 1
(gdb) p dynxdfs[1].rchg[0]
$12 = (char &) @0x502000000290: 0 '\000'

the second document has only one line and it's equal (although i'd have to 
check nreff and rindex and such to know for sure, it's likely)

(gdb) p l1
$13 = 1
(gdb) p l2
$14 = 1

and only 1 line has been processed from each document
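
to make the reading of those flags concrete, here is a minimal sketch of how two rchg-style arrays can be turned into a list of diff operations. ops_from_rchg is a hypothetical helper written for this note, not something in xdiff or diff_xdiff.cpp, and it assumes the usual convention that a nonzero flag means "this line is changed" and that unflagged lines pair up in order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of xdiff or diff_xdiff.cpp): walk two
// rchg-style flag arrays in lockstep and emit one character per step:
//   '=' line unflagged in both files (treated as equal)
//   '-' line flagged in file 1 (deletion)
//   '+' line flagged in file 2 (addition)
//   '?' unflagged line with no partner left in the other file
std::string ops_from_rchg(const std::vector<char>& rchg1,
                          const std::vector<char>& rchg2)
{
    std::string ops;
    std::size_t i1 = 0, i2 = 0;
    while (i1 < rchg1.size() || i2 < rchg2.size()) {
        if (i1 < rchg1.size() && rchg1[i1]) {
            ops += '-';
            ++i1;
        } else if (i2 < rchg2.size() && rchg2[i2]) {
            ops += '+';
            ++i2;
        } else if (i1 < rchg1.size() && i2 < rchg2.size()) {
            ops += '=';
            ++i1;
            ++i2;
        } else {
            // the flags are inconsistent: an "equal" line has nothing to pair with
            ops += '?';
            if (i1 < rchg1.size()) ++i1; else ++i2;
        }
    }
    // sanity check: both arrays were fully walked
    assert(i1 == rchg1.size() && i2 == rchg2.size());
    return ops;
}
```

with the values printed above (rchg {0, 1, 1} for the first document and {0} for the second), ops_from_rchg returns "=--", i.e. 1 equality and 2 deletions.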

so i kind of expect this to go well. i'm curious what variables like these look 
like before the next do_diff. i'll continue to do_diff's prologue.

(gdb) break do_diff
Note: breakpoints 1 and 8 also set at pc 0x55555563b3c3.
Breakpoint 9 at 0x55555563b3c3: file diff_xdiff.cpp, line 841.
(gdb) cont
Continuing.

Breakpoint 3, DynamicXDFile::consume (this=0x7ffff4e00c80, lines=1) at 
diff_xdiff.cpp:457
457             xdf->dstart += lines;
(gdb) cont
Continuing.
consuming rec ptr 0x502000000250
WINDOW RESIZE2=>4

Breakpoint 1, AsymmetricStreamingXDiff::do_diff (this=0x7ffff4e00a30) at 
diff_xdiff.cpp:841
841             auto xe = &this->xe;

ok
ummm check variables

i tried to call trace_state but it didn't work:
(gdb) p dynxdfs[0].trace_state("0"), dynxdfs[1].trace_state("1")
Cannot resolve method DynamicXDFile::trace_state to any overloaded instance
(gdb) p dynxdfs[0].trace_state(std::string_view("0",1)), 
dynxdfs[1].trace_state(std::string_view("1",1))
A syntax error in expression, near `"0",1)), 
dynxdfs[1].trace_state(std::string_view("1",1))'.
(gdb) p dynxdfs[0].trace_state(string_view("0",1)), 
dynxdfs[1].trace_state(string_view("1",1))
A syntax error in expression, near `("0",1)), 
dynxdfs[1].trace_state(string_view("1",1))'.

i found the bug though below :D
(gdb) p xe->xdf1.nreff
$15 = 2
(gdb) p xe->xdf2.nreff
$16 = 1
(gdb) p xe->xdf1.nrec
$17 = 2
(gdb) p xe->xdf2.nrec
$18 = 1
(gdb) p xe->xdf1.rec[0].ptr
There is no member named rec.
(gdb) p xe->xdf1.recs[0].ptr
$19 = 0x7ffff4e00a02 "b\nc\n"
(gdb) p xe->xdf2.recs[0].ptr
$20 = 0x502000000312 "b\n"
(gdb) p xe->xdf1.recs[0].ha
$21 = 1
(gdb) p xe->xdf1.recs[1].ha
$22 = 2

oooooooops nope i didn't at all! i thought i was checking xdf2 but instead i 
checked line 2 of xdf1 >(

(gdb) p xe->xdf2.recs[0].ha
$23 = 1

so line 1 of both files has the same hash index. and all the lines are in 
nreff ...

(gdb) p xe->xdf1.rindex[0]
$24 = 0
(gdb) p xe->xdf2.rindex[0]
$25 = 0

a new diff should count them as equal, unchanged >( shouldn't it??

(gdb) p xe->xdf1.rchg[0]
$26 = 1 '\001'
(gdb) p xe->xdf2.rchg[0]
$27 = 0 '\000'

it is notable that file 1 still has a change marked for the line, from before 
it was in the window.

this algorithm, from xdiff and git, wasn't designed to be used with a window -- 
usually when rchg is set it is because of a preprocessing phase that eliminates 
lines that can be processed quickly. however, which ones are removed is 
tracked with rindex and nreff and ha ... hey! i should check the ha array

(gdb) p xe->xdf1.ha[0]
$28 = 1
(gdb) p xe->xdf2.ha[0]
$29 = 1

ok, so the lines are tracked as having the same hash index in the preprocessed 
ha lists too.
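
the hand checks above boil down to one invariant, sketched below: for every k below nreff, ha[k] should equal the hash index of the record that rindex[k] points at. ToyXdf and preprocessed_ha_consistent are made-up names for illustration. The field names mirror what gdb printed, but the real xdfile_t layout is different, and the invariant itself is my assumption about how the preprocessed arrays relate to each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model of one side of the diff. Field names follow the ones
// printed in the gdb session; this is NOT the real xdfile_t layout.
struct ToyXdf {
    std::vector<unsigned long> rec_ha;  // per-line hash index, like recs[i].ha
    std::vector<long> rindex;           // line numbers that survived preprocessing
    std::vector<unsigned long> ha;      // hash indices of the surviving lines
    long nreff = 0;                     // number of surviving lines
};

// Assumed invariant: for each k < nreff, ha[k] == rec_ha[rindex[k]].
// Returns false on any out-of-range index or mismatch.
bool preprocessed_ha_consistent(const ToyXdf& xdf)
{
    if (xdf.nreff < 0)
        return false;
    const std::size_t n = static_cast<std::size_t>(xdf.nreff);
    if (n > xdf.rindex.size() || n > xdf.ha.size())
        return false;
    for (std::size_t k = 0; k < n; ++k) {
        const long line = xdf.rindex[k];
        if (line < 0 || static_cast<std::size_t>(line) >= xdf.rec_ha.size())
            return false;
        if (xdf.ha[k] != xdf.rec_ha[static_cast<std::size_t>(line)])
            return false;
    }
    return true;
}
```

a check like this could be compiled into the program and called from a breakpoint, which is less error-prone than printing each array element by hand.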

so i dunno why this is happening, and i'm wondering if it could be because of 
some check in the diff implementation that skips setting rchg if it already is. 
this codebase has had many people working on it over the years and different 
approaches to things have accumulated.
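
here is a toy version of that hypothesis, to show what it would look like if true. it assumes (unverified at this point) that the diff pass only ever writes 1 into rchg and relies on the array starting out zeroed. toy_mark_changed and toy_rerun are hypothetical names, not xdiff functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a diff pass that reports changes by setting flags to 1 and
// assumes the array started out zeroed. Hypothetical: not xdiff's real code.
void toy_mark_changed(std::vector<char>& rchg,
                      const std::vector<std::size_t>& changed_lines)
{
    for (std::size_t line : changed_lines) {
        assert(line < rchg.size());
        rchg[line] = 1;  // only ever sets a flag; never writes 0
    }
}

// Re-run the toy pass on a reused flag array, optionally clearing it first.
// Takes rchg by value so the caller's array is left untouched.
std::vector<char> toy_rerun(std::vector<char> rchg,
                            const std::vector<std::size_t>& changed_lines,
                            bool clear_first)
{
    if (clear_first)
        rchg.assign(rchg.size(), 0);
    toy_mark_changed(rchg, changed_lines);
    return rchg;
}
```

in this toy, rerunning on flags {1, 0} with no changed lines and clear_first false leaves the stale 1 in place, while clear_first true gives all zeros. if the real code behaves like this, the fix would be to reset the window's rchg entries before each do_diff.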

but now i get to step into the git xdiff code and see !!
