On 19 Dec, 11:53, Neilen Marais <[EMAIL PROTECTED]> wrote:
> Hi
>
> I'm trying to compare some text to find differences other than whitespace.
> I seem to be misunderstanding something, since I can't even get a basic
> example to work:
>
> In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
>
> In [105]: list(d.compare([' a'], ['a']))
> Out[105]: ['- a', '+ a']
>
> Surely if whitespace characters are being ignored those two strings should
> be marked as identical? What am I doing wrong?
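For reference, the similarity check behind the quoted result can be sketched with difflib.SequenceMatcher. Differ only runs its charjunk-aware intraline comparison on line pairs whose similarity ratio clears a threshold. The CUTOFF of 0.75 below mirrors CPython's internal value and is an assumption, not a documented Differ parameter; the single leading space in ' a' is also assumed.

```python
import difflib

# Compare the two lines from the question, using the same
# charjunk predicate that was passed to Differ.
sm = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, " a", "a")
ratio = sm.ratio()  # 2 * matches / total length = 2 * 1 / 3
print(round(ratio, 3))

# Assumed threshold: CPython's Differ only pairs lines for an
# intraline diff when their ratio is at least 0.75.
CUTOFF = 0.75
print(ratio >= CUTOFF)  # False, so Differ emits a plain '-'/'+' pair
```

Because the ratio stays below the threshold, the lines are never paired, and charjunk is never consulted.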
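The docs for Differ are a bit terse and misleading. compare() does a
two-level matching. First, it matches at the *line* level, considering
only the linejunk parameter. Then, for each pair of similar lines found
in the first stage, it does an intraline match, considering only the
charjunk parameter. In your example ' a' and 'a' are too dissimilar to be
paired as "similar" lines in the first stage, so the intraline pass (and
with it charjunk) never comes into play, and you get a plain
delete/insert.

Also note that junk != ignored: the algorithm tries to "find the longest
contiguous matching subsequence that contains no ``junk'' elements".
Junk characters are not discarded; they just can't be used to seed a
match, so they may still be reported as differences.

Using a slightly longer text gets closer to what you want, I think:

    import difflib

    d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
    for delta in d.compare(['   a larger line'], ['a longer line']):
        print(delta)

    -    a larger line
    ? ---   ^^

    + a longer line
    ?    ^^

Here the two lines are similar enough to be paired, so the "?" lines mark
the three leading spaces as deleted (-) and "ar"/"on" as changed (^).

-- 
Gabriel Genellina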
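If the real goal in the question is to treat whitespace-only differences as no difference at all, one option is to normalize whitespace before diffing. This is a different technique from difflib's junk handling and only a sketch; the helper normalize_ws and the sample lines are hypothetical.

```python
import difflib

def normalize_ws(line):
    # Hypothetical helper: collapse runs of whitespace and trim the
    # ends, so lines differing only in spacing compare equal.
    return " ".join(line.split())

old = ["   a larger line", "unchanged"]
new = ["a  larger   line", "unchanged"]

# Diff the normalized lines; difflib itself never "ignores" junk.
d = difflib.Differ()
result = list(d.compare([normalize_ws(s) for s in old],
                        [normalize_ws(s) for s in new]))
for delta in result:
    print(delta)
# Every line is reported as common (two-space "  " prefix).
```

The trade-off is that the diff output shows the normalized text, not the original lines; keep the originals alongside if you need to report them.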