[issue4889] difflib

Gabriel Genellina Mon, 12 Jan 2009 22:38:12 -0800

Gabriel Genellina <gagsl-...@yahoo.com.ar> added the comment:

You (as a human) most likely parse these lines:


hostname vaijain123
hostname CAVANC1001CR1

as "two words, the first one is the same, the second word changed".
But difflib compares them character by character and sees them more or 
less as: "about 20 characters each, only 10 of them are the same, the 
rest are different". Only about half of each line matches, so it makes 
sense to show the changes as a complete replacement:

>>> import difflib
>>> d = difflib.ndiff(["hostname vaijain123\n"],
...                   ["hostname CAVANC1001CR1\n"])
>>> print(''.join(d))
- hostname vaijain123
+ hostname CAVANC1001CR1

It has nothing to do with upper or lower case letters ("A" and "a" are 
completely different things for difflib). If the names were shorter, it 
might consider a match:

>>> d = difflib.ndiff(["hostname vai\n"], ["hostname CAV\n"])
>>> print(''.join(d))
- hostname vai
?          ^^^
+ hostname CAV
?          ^^^
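
The remark about case is easy to check. A minimal sketch follows; the 
helper ratio_ignoring_case is a hypothetical name of mine, and 
lowercasing both sides is something you would do yourself before 
calling difflib, not an option difflib offers:

```python
import difflib

# For SequenceMatcher, "a" and "A" have nothing in common.
print(difflib.SequenceMatcher(None, "a", "A").ratio())  # 0.0


def ratio_ignoring_case(a, b):
    """Similarity ratio after lowercasing both strings (illustrative helper)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Strings that differ only in case become identical once lowercased.
print(ratio_ignoring_case("hostname vai", "hostname VAI"))  # 1.0
```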

Note how the ratio changes:

>>> difflib.SequenceMatcher(None, "hostname vaijain123",
...                         "hostname CAVANC1001CR1").ratio()
0.48780487804878048
>>> difflib.SequenceMatcher(None, "hostname vai", "hostname CAV").ratio()
0.75
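
To see where those numbers come from: ratio() is documented as 
2.0*M/T, where M is the number of matched characters and T is the 
total length of both strings. A small sketch that recomputes it from 
the matching blocks (explain_ratio is a name I made up for 
illustration):

```python
import difflib


def explain_ratio(a, b):
    """Return (matched, total, ratio) for two strings, with ratio = 2.0*M/T."""
    sm = difflib.SequenceMatcher(None, a, b)
    # The list ends with a dummy block of size 0, which adds nothing to the sum.
    matched = sum(block.size for block in sm.get_matching_blocks())
    total = len(a) + len(b)
    return matched, total, 2.0 * matched / total


print(explain_ratio("hostname vaijain123", "hostname CAVANC1001CR1"))
print(explain_ratio("hostname vai", "hostname CAV"))
```

For the long names this gives 10 matched characters out of 41, and 
for the short ones 9 out of 24, which is exactly 0.75.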

The ratio must be 0.75 or higher for Differ (the class behind ndiff) 
to consider two lines "close enough" to show intra-line differences.

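If you want to predict which behaviour you will get for a given pair 
of lines, something along these lines should work. It is only a 
sketch: CUTOFF just repeats the 0.75 quoted above (it is not a public 
difflib setting), and both function names are mine:

```python
import difflib

CUTOFF = 0.75  # assumption: the threshold quoted above, hard-coded here


def close_enough(old_line, new_line):
    """True if the similarity ratio of the two lines reaches the cutoff."""
    ratio = difflib.SequenceMatcher(None, old_line, new_line).ratio()
    return ratio >= CUTOFF


def shows_intraline_hints(old_line, new_line):
    """True if ndiff emits '?' guide lines for this pair of lines."""
    diff = difflib.ndiff([old_line], [new_line])
    return any(line.startswith("? ") for line in diff)


for old, new in [("hostname vaijain123\n", "hostname CAVANC1001CR1\n"),
                 ("hostname vai\n", "hostname CAV\n")]:
    print(close_enough(old, new), shows_intraline_hints(old, new))
```

For the two examples in this message both functions agree: False for 
the long names, True for the short ones.
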
----------
nosy: +gagenellina

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4889>
_______________________________________