Tidy can now perform wonders on HTML saved from Microsoft Word 2000!
Word bulks out HTML files with stuff for round-tripping presentation
between HTML and Word. If you are more concerned about using HTML on the
Web, check out Tidy's "Word-2000"
<http://www.w3.org/People/Raggett/tidy/#word2000> config option! Of
course Tidy does a good job on Word'97 files as well!
-- source: http://www.w3.org/People/Raggett/tidy/
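
As a rough sketch of how the Word-2000 option might be driven from Python, the snippet below shells out to the tidy command-line tool with "--word-2000 yes". It assumes a tidy binary is on the PATH. The function names and file names are made up for illustration, and adding "--clean yes" is my own choice rather than something the Tidy page prescribes.

```python
import subprocess


def build_tidy_command(src, dest):
    """Return the argument list for a tidy run with Word 2000 cleanup enabled."""
    return [
        "tidy",
        "-quiet",               # suppress non-essential output
        "--word-2000", "yes",   # strip Word 2000 round-tripping markup
        "--clean", "yes",       # assumption: also replace presentational tags with CSS
        "-output", dest,        # write the cleaned document here
        src,
    ]


def clean_word_html(src, dest):
    """Run tidy on src, writing the result to dest; return tidy's exit status."""
    result = subprocess.run(build_tidy_command(src, dest))
    # tidy exits with 0 (no problems), 1 (warnings) or 2 (errors)
    if result.returncode >= 2:
        raise RuntimeError("tidy reported errors for %s" % src)
    return result.returncode
```

Calling clean_word_html("report.html", "report-clean.html") would then write the cleaned file next to the original. Whether that removes enough of the cruft is something to check on a real Word export.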
jkn wrote:
IIUC, the original poster is asking about 'cleaning up' in the sense
of removing the swathes of unnecessary and/or redundant 'cruft' that
Word puts in there, rather than making valid HTML out of invalid HTML.
Again, IIUC, HTMLtidy does not do this.
If Beautiful Soup does, then I'm interested!
jon N
--
Shane Geiger
IT Director
National Council on Economic Education
[EMAIL PROTECTED] | 402-438-8958 | http://www.ncee.net
Leading the Campaign for Economic and Financial Literacy