Charles Hartman wrote:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain it,
> or they forget. So it would be nice to be able to handle gracefully the
> stuff that MS Word (or any word-processor) puts into a file. Inserting a
> 0-127 filter is easy but not very friendly. Typically, the w.p. file
> loads OK (into a wx.StyledTextCtrl, a.k.a. Scintilla editing pane), and
> is mostly readable. Just a few characters will be wrong: "smart"
> quotation marks and the like.
>
> Is there some well-known way to filter or translate this w.p. garbage? I
> don't know whether encodings are relevant; I don't know what encoding an
> MSW file uses. I don't see how to use s.translate() because I don't know
> how to predict what the incoming format will be.
>
> Any hints welcome.
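
For the "mostly readable, a few characters wrong" case, encodings probably are relevant. If the input is text pasted or saved from Word on Windows (not a binary .doc file), the bytes are often in Windows-1252, where the "smart" punctuation sits in the 0x80-0x9F range. A minimal sketch, assuming cp1252 input; the names PUNCT_MAP and to_plain_text are made up for illustration, and the mapping table is only a starting point:

```python
# Assumption: the bytes are Windows-1252 (cp1252) text, not a binary .doc file.
# PUNCT_MAP and to_plain_text are hypothetical names used for this sketch.

# Map typographic characters (by Unicode code point) to plain ASCII stand-ins.
PUNCT_MAP = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark / apostrophe
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # no-break space
}


def to_plain_text(raw, encoding="cp1252"):
    """Decode raw bytes and replace common 'smart' punctuation with ASCII."""
    # Bytes that are undefined in the chosen encoding become U+FFFD
    # instead of raising an exception.
    text = raw.decode(encoding, errors="replace")
    # str.translate accepts a dict of code point -> replacement string.
    return text.translate(PUNCT_MAP)


if __name__ == "__main__":
    sample = b"\x93Hi\x94 it\x92s\x85"
    print(to_plain_text(sample))
```

This does not help with a real .doc file, which is a binary format and needs a converter to get the text out first.
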
Antiword? See http://www.winfield.demon.nl/

OpenOffice driven via PyUNO interface?

Tim C
-- 
http://mail.python.org/mailman/listinfo/python-list