TL;DR: I'm debugging this issue, and finding multiple problems. This is a summary of where I have got to so far.

http://subversion.apache.org/issue/4840

I am investigating an assertion failure. I received a stack trace without parameter values, from which I quote the relevant part:

#4  svn_sort__array_insert (array=?, new_element=?, insert_index=?)
    at subversion/libsvn_subr/sorts.c:310
    assert(0 <= insert_index && insert_index <= array->nelts);

#5  svn_rangelist_merge2 (rangelist=?, chg=?,)
    at subversion/libsvn_subr/mergeinfo.c:1231
    svn_sort__array_insert(rangelist, &range_copy, i++);

#6  svn_mergeinfo_merge2 (mergeinfo=?, changes=?,)
    at subversion/libsvn_subr/mergeinfo.c:1838
#7  combine_forked_mergeinfo_props (...)
    at subversion/libsvn_wc/props.c:138
#8  apply_single_mergeinfo_prop_change (...)
    at subversion/libsvn_wc/props.c:1087

This was reported in Subversion 1.10.4. I have reproduced with 1.10.6 and trunk@1871000.

There seem to be multiple problems contributing to this.


* Ill-defined canonical form for a rangelist.

We have (the same in 1.10 and trunk):

  - svn_rangelist__is_canonical()
"all ranges in RANGELIST are in ascending order and do not overlap and are not adjacent" and ranges are non-empty.

  - svn_rangelist__canonicalize()
"sort the ranges, and combine adjacent or overlapping ranges into single ranges. If overlapping ranges have different inheritability, return an error."

- The inputs to svn_rangelist_merge2() "must be sorted according to svn_sort_compare_ranges" but need not be "compacted"; the exact specification is unclear.

A rangelist returned by 'canonicalize' is not necessarily 'canonical' as defined by svn_rangelist__is_canonical(). (We have a similar situation with other functional areas in Subversion such as path canonicalization.)

By code inspection, it looks like if any of the individual input ranges to svn_rangelist__canonicalize() are non-canonical (reversed or empty) then it gives undefined results (an error return, a plausible result or a totally bogus result, depending on the inputs).

By random-input testing svn_rangelist_merge2() with general non-canonical inputs, I can reproduce this particular case and some other failure modes (in 1.10 and trunk).

By random-input testing svn_rangelist_merge2() with non-canonical inputs that obey the doc string's "sorted" requirement, I can reproduce one failure mode: non-canonical output from canonical inputs (in 1.10 and trunk).

To reproduce the particular reported failure mode, I had to compile with some assertions disabled as they would be in a release build.

The function svn_rangelist_merge2() ends with a debug-mode-only assertion that output is canonical, which strongly implies it does not really need to support non-canonical inputs (that obey a lesser "sorted" criterion).

Conclusions so far:
- The input requirements for svn_rangelist_merge2() (and other functions) should be simple and clear, and enforced in code. - We need test coverage for svn_rangelist_merge2() over the full range of inputs that meet those requirements. - Non-canonical output from canonical inputs is an error that needs to be fixed. - Apparently, svn_rangelist_merge2() only needs to support canonical inputs, so should be defined and enforced as such. - As the reported failure mode only occurs with inputs violating the requirements, presumably the fault is in the caller (svn_mergeinfo_merge2) or its callers, so I will have to pursue my search there.


* Rangelist "merge" is a set-union operation.

Both the naming and the implementation of svn_rangelist_merge2() are obscure. The desired functionality is a set-union operation, or rather two set-unions:

let set RRB be the set of revisions that apply to a base path, which includes both the 'non-inheritable' and 'inheritable' ranges; and let set RRC be the set of revisions that are inheritable to child paths, which means only the 'inheritable' ranges; then

    rangelist_merge(input1, input2):
      output.RRB = union(input1.RRB, input2.RRB)
      output.RRC = union(input1.RRC, input2.RRC)

assuming no "reversed" ranges are to be supported.

The current implementation is 275 lines long and contains 22 "if" statements. No wonder it is broken.

An implementation based on set unions would surely be simple and reliable.


* The function range_to_string(), when given a non-canonical representation of an empty range such as .start == .end == 10, yields the totally wrong output "10-11". This can be very confusing when seen in error messages and debugging. I propose to fix this to issue a clear debugging output, like "[empty-range@10]", in this case.


* Poor and inconsistent error handling in svn_sort__array_insert and _delete.

svn_sort__array_insert() aborts on out-of-range inputs. svn_sort__array_delete() silently ignores out-of-range inputs, doing nothing.

I propose to change them both to report a (catchable) error on out-of-range inputs.

(Error checking is generally in keeping with our overall design principles. Silently allowing certain kinds of bad input is an alternative and sometimes appropriate design option for library functions, in cases where the expected behaviour can be specified clearly and simply. For example, we could specify that 'delete' shall delete array elements with indexes in the intersection of the existing indexes and the specified indexes, and ignore any other specified indexes. However that's not what _delete is currently doing: it is ignoring the whole request if any part of it is out of range.)


* Our testing is clearly inadequate.

It shows me that we need to be using random-input testing on functions like this (functions in the category of mergeinfo arithmetic).

I propose to add some, starting with svn_rangelist_merge2().


=== Next steps ===

I will send patches demonstrating the bugs soon, not right now.

(Right now I have everything mixed up in a single WC for debugging.)

Then I will send patches for some of the other issues mentioned.

Then I will investigate further up the call stack in a similar way.


- Julian

Reply via email to