On Mon, Mar 25, 2013 at 11:09 PM, Johan Corveleyn <jcor...@gmail.com> wrote: > On Mon, Mar 25, 2013 at 9:31 PM, Philip Martin > <philip.mar...@wandisco.com> wrote: >> Philip Martin <philip.mar...@wandisco.com> writes: >> >>> $ valgrind -q .libs/lt-diff-diff3-test 15 >>> ==6097== Invalid read of size 1 >>> ==6097== at 0x503FD83: find_identical_suffix (diff_file.c:586) >>> ==6097== by 0x5040C45: datasources_open (diff_file.c:815) >>> ==6097== by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276) >>> ==6097== by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327) >>> ==6097== by 0x401F2F: three_way_merge (diff-diff3-test.c:191) >>> ==6097== by 0x4027B7: two_way_diff (diff-diff3-test.c:311) >>> ==6097== by 0x405DF8: test_token_compare (diff-diff3-test.c:2589) >>> ==6097== by 0x4E34C6A: do_test_num (svn_test_main.c:268) >>> ==6097== by 0x4E35686: main (svn_test_main.c:551) >>> ==6097== Address 0x138585af is 1 bytes before a block of size 131,072 >>> alloc'd >>> ==6097== at 0x4C28BED: malloc (vg_replace_malloc.c:263) >>> ==6097== by 0x572DDDB: pool_alloc (apr_pools.c:1463) >>> ==6097== by 0x572DF57: apr_palloc_debug (apr_pools.c:1504) >>> ==6097== by 0x503FACA: find_identical_suffix (diff_file.c:558) >>> ==6097== by 0x5040C45: datasources_open (diff_file.c:815) >>> ==6097== by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276) >>> ==6097== by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327) >>> ==6097== by 0x401F2F: three_way_merge (diff-diff3-test.c:191) >>> ==6097== by 0x4027B7: two_way_diff (diff-diff3-test.c:311) >>> ==6097== by 0x405DF8: test_token_compare (diff-diff3-test.c:2589) >>> ==6097== by 0x4E34C6A: do_test_num (svn_test_main.c:268) >>> ==6097== by 0x4E35686: main (svn_test_main.c:551) >> >> On entry to find_identical_suffix file[0] is >> >> (gdb) p file[0] >> $1 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0, >> size = 131072, chunk = 0, buffer = 0x127cf50 '\n' <repeats 200 times>..., >> curp = 0x129cf48 " @@@\n\300\312&\001", endp = 0x129cf50 >> "\300\312&\001", >> normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = >> -1, >> suffix_offset_in_chunk = 0} >> >> The file data ends at the end of the chunk. The code does >> >> if (file_for_suffix[i].chunk == file[i].chunk) >> { >> /* Prefix ended in last chunk, so we can reuse the prefix buffer */ >> file_for_suffix[i].buffer = file[i].buffer; >> } >> else >> { >> /* There is at least more than 1 chunk, >> so allocate full chunk size buffer */ >> file_for_suffix[i].buffer = apr_palloc(pool, CHUNK_SIZE); >> SVN_ERR(read_chunk(file_for_suffix[i].file, >> file_for_suffix[i].path, >> file_for_suffix[i].buffer, length[i], >> chunk_to_offset(file_for_suffix[i].chunk), >> pool)); >> } >> file_for_suffix[i].endp = file_for_suffix[i].buffer + length[i]; >> file_for_suffix[i].curp = file_for_suffix[i].endp - 1; >> >> and allocates a new chunk so that file_for_suffix[0] is >> >> (gdb) p file_for_suffix[0] >> $3 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0, >> size = 131072, chunk = 1, buffer = 0x12bd0d0 'A' <repeats 200 times>..., >> curp = 0x12bd0cf "", endp = 0x12bd0d0 'A' <repeats 200 times>..., >> normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 0, >> suffix_offset_in_chunk = 0} >> >> file_for_suffix[0=.endp is 1 byte before file_for_suffix[0].buffer and >> so dereferencing it is not valid. > > You mean file_for_suffix[0].curp, which is 1 byte before > file_for_suffix[0].buffer. Bummer, it's not supposed to do that. > >> How are the chunks supposed to work? When the file data ends at the end >> of a chunk is there supposed to be another valid chunk with no data? > > It's been a while, so I'm not sure. I *think* the intention is that > find_identical_suffix is not executed in this case, because in > datasources_open, find_identical_suffix is only executed if > (!reached_one_eof). And find_identical_prefix should have set > reached_one_eof in this case. I think ... But apparently it doesn't > ...
Hmmm, I should have looked closer. The case of file.size==131072 (which is CHUNK_SIZE, which is 1<<CHUNK_SHIFT, which is 17), or any chunk_size multiple, can happen with or without find_identical_prefix eating it all. The fact is that offset_to_chunk is defined as: #define offset_to_chunk(offset) ((offset) >> CHUNK_SHIFT) which means that offset_to_chunk(131072) is 1. And it just so happens that in multiple places in diff_file.c the "last chunk" is determined as offset_to_chunk(file.size). I think this ("last chunk being offset_to_chunk(file.size)") predates my involvement of working on the prefix/suffix scanning. So I think it's safe to say that: yes, it seems that there is supposed to be another chunk with no data. Perhaps someone else with longer involvement with the origins of the code of diff_file.c can confirm that this is indeed the intention. If that's indeed the case (there can be a last chunk with no data), I suppose it's best to fix find_identical_suffix to check for that. Sorry to not be any more help ... a bit overworked lately. -- Johan