Tim Peters added the comment:

Comparison can certainly trigger a recursion error if the sequences contain no 
(or few) matching non-junk elements, but contain many "almost matching" 
elements.  If the sequences have lengths M and N, recursion can go as deep as 
2*min(M, N) then.

Now in the test case, we have two lists of integers.  Difflib has no idea what 
"almost match" might mean for integers.  But difflib isn't passed two lists of 
integers.  Instead unittest appears to be converting the input lists to giant 
strings, then splitting the giant strings on whitespace (or just linefeeds?), 
and then feeding the resulting lists of substrings to difflib.  That doesn't 
make much sense to me, but so it goes.

There are no matches in the two lists of strings, so difflib starts looking for 
"close matches", and there are a lot of these.

At first it decides "[1," and "[100," aren't close enough, but " 10," and " 
101," are close enough.  That's used as a synch point, and then there's 
recursion to match the sublists before and after the synch point.  Then " 12," 
and " 102," are close enough, so that pair is used as the next synch point, and 
another layer of 2-sided recursion.  Etc.

Whether someone wants to rip the recursion out of _fancy_replace and 
_fancy_helper is up to them.  I wouldn't bother, if this unittest-created 
problem is the only reported instance.  Comparing strings seems a poor idea 
from the start (there's no guarantee in general, e.g., that A != B implies 
str(A) != str(B) or repr(A) != repr(B)), and difflib isn't good in any case at 
comparing sequences with few matching elements (e.g., remove the recursion and 
it will still take time at best cubic in the common length of the sequences - 
would it really help to change "a failing unittest bombs with RecursionError" 
to "a failing unittest seems to take forever"?).

I'd suggest instead that unittest, say, locate the first pair of non-equal 
elements itself, and display that along with a few elements of context on 
either side.  Or something ;-)  Something worst-case linear-time, and using != 
directly on sequence elements (not on strings derived from the sequence 
elements).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21253>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to