jkn wrote:
> IIUC, the original poster is asking about 'cleaning up' in the sense
> of removing the swathes of unnecessary and/or redundant 'cruft' that
> Word puts in there, rather than making valid HTML out of invalid HTML.
> Again, IIUC, HTMLtidy does not do this.

From that very page I linked to:

"""
Tidy can now perform wonders on HTML saved from Microsoft Word 2000!
Word bulks out HTML files with stuff for round-tripping presentation
between HTML and Word. If you are more concerned about using HTML on
the Web, check out Tidy's "Word-2000" config option! Of course Tidy
does a good job on Word'97 files as well!
"""

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
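
For anyone who wants to try the "Word-2000" option from Python, here is a minimal sketch rather than a tested recipe. It assumes the tidy command-line program is installed and on your PATH, and that your build accepts the config option as `--word-2000 yes`. The function names and file names are made up for illustration.

```python
import shutil
import subprocess


def build_tidy_command(input_path, output_path):
    """Build the argument list for a tidy run with Word 2000 cleanup on.

    Hypothetical helper: -q keeps tidy quiet, -o names the output file,
    and --word-2000 yes is the config option mentioned in the quote.
    """
    return ["tidy", "-q", "--word-2000", "yes", "-o", output_path, input_path]


def clean_word_html(input_path, output_path):
    """Run tidy on a Word-exported HTML file and return tidy's exit status."""
    # Assumption: the executable is called "tidy" and can be found on PATH.
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        build_tidy_command(input_path, output_path),
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 for success, 1 for warnings, 2 for errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Something like `clean_word_html("report.htm", "report_clean.htm")` would then write the cleaned file next to the original. How much of Word's markup actually gets stripped depends on your Tidy version, so compare the output with the input before relying on it.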