jd wrote:
> I am looking for python code (working or sample code) that can take an
> html document created by Microsoft Word and clean it up (if you've
> never had to look at a Word-generated html document, consider yourself
> lucky ;-) Alternatively, if you know of a non-python solution, I'd
> like to hear about it.
>
> Thanks...
>
> -- jeff

There is a Microsoft add-on for Word called 'HTML filter' that helps to reduce the mess. You can get it here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

Run it first, then apply the other 'cleaning' methods suggested in this thread.

Claudio
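
As a rough illustration of what a follow-up cleaning pass in Python might look like, here is a minimal sketch. It is not one of the methods from this thread: the function name clean_word_html and the particular patterns are assumptions chosen for the example, and regular expressions are a blunt tool for HTML, so a real document may need a proper parser.

```python
import re

# Hypothetical helper (not from this thread): removes a few common
# Word leftovers from an HTML string using only the standard library.
def clean_word_html(html):
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Drop Office namespace tags such as <o:p> and </o:p>; text between
    # them is kept because only the tags themselves are matched
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class attributes whose value starts with "Mso" (MsoNormal, ...)
    html = re.sub(
        r"""\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""",
        "", html, flags=re.IGNORECASE)
    # Drop whole style attributes that contain an mso- property
    # (this also discards any non-mso properties in the same attribute)
    html = re.sub(
        r"""\s+style\s*=\s*(?:"[^"]*mso-[^"]*"|'[^']*mso-[^']*')""",
        "", html, flags=re.IGNORECASE)
    return html
```

It would be called on the text of the file after the filter has run, for example clean_word_html(open("report.htm").read()), where report.htm is a placeholder file name.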