"Gabriel Genellina" <[EMAIL PROTECTED]> wrote: >En Wed, 10 Dec 2008 15:14:20 -0200, eliben <[EMAIL PROTECTED]> escribió: > >> What ? This can't be. >> >> 1. Go to http://try-python.mired.org/ >> 2. Type >> import difflib >> 3. Type >> difflib.SequenceMatcher(None, [4] + [5] * 200, [5] * 200).ratio() >> >> Don't you get 0 as the answer ? > >Ah, but that isn't the same expression you posted originally: > >SequenceMatcher(None, [4] + [10] * 500 + [5], [10] * 500 + [5]).ratio() > >Using *that* expression I got near 1.0 always. But leaving out the [5] at >the end, it's true, it gives the wrong answer. >... >I've updated the tracker item.
Your assessment that it is the same problem as #1528074 is correct. It's the "popularity" optimization. The key here is that the second sequence consists of more than 200 identical items. For example, all of the following give the same bad result: difflib.SequenceMatcher(None, [4] + [5] * 200, [5] * 200).ratio() difflib.SequenceMatcher(None, [4] + [5] , [5] * 200).ratio() difflib.SequenceMatcher(None, [4] , [5] * 200).ratio() If you print get_matching_blocks(), you'll see that there are none, because the "b" sequence is optimized completely away. The #1528074 calls it "working by designed" and suggests updating the doc. However, I would argue that it's worth checking for this. -- Tim Roberts, [EMAIL PROTECTED] Providenza & Boekelheide, Inc. -- http://mail.python.org/mailman/listinfo/python-list