[Daniel]
> You could try HTMLTidy (http://www.egenix.com/files/python/mxTidy.html)
> as a first step to get well formed HTML.

But Tidy fails on huge numbers of real-world HTML pages.  Simple things
like misspelled tags make it fail:

>>> from mx.Tidy import tidy
>>> results = tidy("<html><body><pree>Hello world!</pre></body></html>")
>>> print results[3]
line 1 column 7 - Warning: inserting missing 'title' element
line 1 column 13 - Error: <pree> is not recognized!
line 1 column 13 - Warning: discarding unexpected <pree>
line 1 column 31 - Warning: discarding unexpected </pre>
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

Is there a Python HTML tidier which will do as good a job as a browser?

-- 
Richie
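
For comparison, here is a minimal sketch of what "not failing" on the same input could look like. It is an assumption on my part, not something from this thread: it uses the html.parser module from the Python 3 standard library, which passes unrecognised tags such as <pree> through to its handlers instead of reporting an error. The names TextCollector and lenient_parse are made up for illustration. Note that this only tokenises the markup; it does not repair or tidy anything, so it is not an answer to the question above.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Record tag names and text as they are seen (hypothetical helper).

    Unknown tags such as <pree> are passed through like any other tag.
    """

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def lenient_parse(markup):
    """Feed the whole string to the parser and return the collector."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser


# Same misspelled-tag input as in the Tidy session above.
parsed = lenient_parse("<html><body><pree>Hello world!</pre></body></html>")
print(parsed.start_tags)
print(parsed.end_tags)
print("".join(parsed.text))
```

The mismatched <pree>/</pre> pair shows up in the two tag lists, so any repair step (guessing that "pree" meant "pre", say) would still have to be written on top of this.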