IIUC, the original poster is asking about 'cleaning up' in the sense of removing the swathes of unnecessary and/or redundant 'cruft' that Word puts in there, rather than making valid HTML out of invalid HTML. Again, IIUC, HTMLtidy does not do this.
If Beautiful Soup does, then I'm interested!

jon N