I'm working on text-handling programs that want plain-text files as input. It's fine to tell users to feed the programs plain text only, but not all users know what this means, even after you explain it, or they forget. So it would be nice to be able to gracefully handle the stuff that MS Word (or any word processor) puts into a file. Inserting a 0-127 filter is easy but not very friendly. Typically, the w.p. file loads OK (into a wx.StyledTextCtrl, a.k.a. Scintilla editing pane) and is mostly readable. Just a few characters will be wrong: "smart" quotation marks and the like.
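
To be concrete, the 0-127 filter I have in mind is roughly the sketch below. The function name ascii_only and the "?" placeholder are just illustrative choices of mine, and it assumes the file has already been read into a string:

```python
def ascii_only(text, replacement="?"):
    """Replace every character outside the 0-127 range with a placeholder."""
    return "".join(ch if ord(ch) < 128 else replacement for ch in text)


# "Smart" quotes and an en dash, as a word processor might produce them.
sample = "\u201cHello\u201d \u2013 it\u2019s fine"
print(ascii_only(sample))
```

It works, but every curly quote or dash turns into the same placeholder, which is the unfriendly part.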

Is there some well-known way to filter or translate this w.p. garbage? I don't know whether encodings are relevant; I don't know what encoding an MSW file uses. I don't see how to use s.translate() because I don't know how to predict what the incoming format will be.
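
Here is the kind of thing I imagine doing with translate(), as a sketch only. It guesses that the incoming bytes are Windows-1252 (cp1252), which is exactly the assumption I don't know how to justify. The table SMART_TO_PLAIN and the function plainify are hypothetical names, and the table covers only a handful of characters:

```python
# Hypothetical table: a few "smart" punctuation code points and plain stand-ins.
SMART_TO_PLAIN = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark / apostrophe
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # no-break space
}


def plainify(raw_bytes, encoding="cp1252"):
    """Decode bytes using a guessed encoding, then swap in plain characters.

    The cp1252 default is an assumption, not something known about the input.
    Bytes that cannot be decoded become U+FFFD rather than raising an error.
    """
    text = raw_bytes.decode(encoding, errors="replace")
    return text.translate(SMART_TO_PLAIN)


print(plainify(b"\x93Hello\x94, it\x92s fine"))
```

If the guess about the encoding is wrong, the table never matches and the wrong characters come through anyway, which is why I'm asking.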

Any hints welcome.

Charles Hartman

--
http://mail.python.org/mailman/listinfo/python-list
