On Jan 5, 9:46 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richar...@gmail.com> wrote:
>
> > I have been using the difflib library to find where 2 large HTML
> > documents differ. The Differ().compare() method does this, but it is
> > very slow - at least 100x slower than the unix diff command.
>
> Differ compares sequences of lines *and* lines as sequences of characters
> to provide intra-line differences. The diff command only processes lines.
> If you aren't interested in intra-line differences, use a SequenceMatcher
> instead. Or, invoke the diff command using subprocess.Popen +
> communicate.
>
> --
> Gabriel Genellina

Thank you very much, Gabriel! Passing a list of the document lines makes the speed comparable to the diff command.

Richard
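
For anyone reading this thread later, here is a minimal sketch of the line-level comparison suggested above. It assumes the list of lines is handed to difflib.SequenceMatcher; the thread does not show the actual code, so the helper name changed_line_ranges and the sample documents are made up for illustration.

```python
import difflib


def changed_line_ranges(old_text, new_text):
    """Return the opcodes for every block of lines that differs.

    Hypothetical helper, not code from the thread. Each item is
    (tag, i1, i2, j1, j2): old_lines[i1:i2] maps to new_lines[j1:j2].
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Compare whole lines only; no character-level (intra-line) pass is done.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Made-up sample documents.
old = "<html>\n<p>one</p>\n<p>two</p>\n</html>\n"
new = "<html>\n<p>one</p>\n<p>2</p>\n<p>three</p>\n</html>\n"

old_lines = old.splitlines()
new_lines = new.splitlines()
for tag, i1, i2, j1, j2 in changed_line_ranges(old, new):
    print(tag, old_lines[i1:i2], new_lines[j1:j2])
```

With these two sample strings the only differing block is a "replace" of old line index 2 by new line indexes 2 and 3. If the intra-line detail is needed for just a few blocks, one option is to run Differ only on the line ranges reported here rather than on the whole documents.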