Re: Diff suffix scanning invalid read

Johan Corveleyn Mon, 25 Mar 2013 15:40:55 -0700

On Mon, Mar 25, 2013 at 11:09 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
> On Mon, Mar 25, 2013 at 9:31 PM, Philip Martin
> <philip.mar...@wandisco.com> wrote:
>> Philip Martin <philip.mar...@wandisco.com> writes:
>>
>>> $ valgrind -q .libs/lt-diff-diff3-test 15
>>> ==6097== Invalid read of size 1
>>> ==6097==    at 0x503FD83: find_identical_suffix (diff_file.c:586)
>>> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
>>> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>>> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>>> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>>> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>>> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>>> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>>> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
>>> ==6097==  Address 0x138585af is 1 bytes before a block of size 131,072 
>>> alloc'd
>>> ==6097==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
>>> ==6097==    by 0x572DDDB: pool_alloc (apr_pools.c:1463)
>>> ==6097==    by 0x572DF57: apr_palloc_debug (apr_pools.c:1504)
>>> ==6097==    by 0x503FACA: find_identical_suffix (diff_file.c:558)
>>> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
>>> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>>> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>>> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>>> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>>> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>>> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>>> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
>>
>> On entry to find_identical_suffix file[0] is
>>
>> (gdb) p file[0]
>> $1 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>>   size = 131072, chunk = 0, buffer = 0x127cf50 '\n' <repeats 200 times>...,
>>   curp = 0x129cf48 "    @@@\n\300\312&\001", endp = 0x129cf50 
>> "\300\312&\001",
>>   normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 
>> -1,
>>   suffix_offset_in_chunk = 0}
>>
>> The file data ends at the end of the chunk.  The code does
>>
>>       if (file_for_suffix[i].chunk == file[i].chunk)
>>         {
>>           /* Prefix ended in last chunk, so we can reuse the prefix buffer */
>>           file_for_suffix[i].buffer = file[i].buffer;
>>         }
>>       else
>>         {
>>           /* There is at least more than 1 chunk,
>>              so allocate full chunk size buffer */
>>           file_for_suffix[i].buffer = apr_palloc(pool, CHUNK_SIZE);
>>           SVN_ERR(read_chunk(file_for_suffix[i].file, 
>> file_for_suffix[i].path,
>>                              file_for_suffix[i].buffer, length[i],
>>                              chunk_to_offset(file_for_suffix[i].chunk),
>>                              pool));
>>         }
>>       file_for_suffix[i].endp = file_for_suffix[i].buffer + length[i];
>>       file_for_suffix[i].curp = file_for_suffix[i].endp - 1;
>>
>> and allocates a new chunk so that file_for_suffix[0] is
>>
>> (gdb) p file_for_suffix[0]
>> $3 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>>   size = 131072, chunk = 1, buffer = 0x12bd0d0 'A' <repeats 200 times>...,
>>   curp = 0x12bd0cf "", endp = 0x12bd0d0 'A' <repeats 200 times>...,
>>   normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 0,
>>   suffix_offset_in_chunk = 0}
>>
>> file_for_suffix[0=.endp is 1 byte before file_for_suffix[0].buffer and
>> so dereferencing it is not valid.
>
> You mean file_for_suffix[0].curp, which is 1 byte before
> file_for_suffix[0].buffer. Bummer, it's not supposed to do that.
>
>> How are the chunks supposed to work?  When the file data ends at the end
>> of a chunk is there supposed to be another valid chunk with no data?
>
> It's been a while, so I'm not sure. I *think* the intention is that
> find_identical_suffix is not executed in this case, because in
> datasources_open, find_identical_suffix is only executed if
> (!reached_one_eof). And find_identical_prefix should have set
> reached_one_eof in this case. I think ... But apparently it doesn't
> ...


Hmmm, I should have looked closer. The case of file.size==131072
(which is CHUNK_SIZE, which is 1<<CHUNK_SHIFT, which is 17), or any
chunk_size multiple, can happen with or without find_identical_prefix
eating it all. The fact is that offset_to_chunk is defined as:

#define offset_to_chunk(offset) ((offset) >> CHUNK_SHIFT)

which means that offset_to_chunk(131072) is 1. And it just so happens
that in multiple places in diff_file.c the "last chunk" is determined
as offset_to_chunk(file.size).

I think this ("last chunk being offset_to_chunk(file.size)") predates
my involvement of working on the prefix/suffix scanning. So I think
it's safe to say that: yes, it seems that there is supposed to be
another chunk with no data. Perhaps someone else with longer
involvement with the origins of the code of diff_file.c can confirm
that this is indeed the intention.

If that's indeed the case (there can be a last chunk with no data), I
suppose it's best to fix find_identical_suffix to check for that.

Sorry to not be any more help ... a bit overworked lately.
-- 
Johan

Re: Diff suffix scanning invalid read

Reply via email to