Charles Hartman <[EMAIL PROTECTED]> writes:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain
> it, or they forget. So it would be nice to be able to handle
> gracefully the stuff that MS Word (or any word-processor) puts into a
> file. Inserting a 0-127 filter is easy but not very
> friendly. Typically, the w.p. file loads OK (into a wx.StyledTextCtrl
> a.k.a Scintilla editing pane), and mostly be readable. Just a few
> characters will be wrong: "smart" quotation marks and the like.
>
> Is there some well-known way to filter or translate this w.p. garbage?
> I don't know whether encodings are relevant;
Bingo. You need to figure out the encoding before you can do
intelligent translation of the non-ASCII characters in the text.

> I don't know what encoding an MSW file uses.

Different WPs will use different encodings, especially once you start
working in a cross-platform environment. I don't know that there is a
good solution to this problem. It certainly hasn't been solved on the
web.

        <mike
--
Mike Meyer <[EMAIL PROTECTED]>  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
--
http://mail.python.org/mailman/listinfo/python-list
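
For what it's worth, here is a rough sketch of the "figure out the encoding first, then translate" idea from the reply above. It is only a guess-and-fall-back heuristic, not a general solution. The assumptions are mine: the input is text saved or pasted from a word processor (not a binary .doc file), the likely encodings are UTF-8 and Windows-1252, and the names decode_guess and to_plain_text are made up for illustration.

```python
# Candidate encodings, tried in order. This list is an assumption;
# real input may need other candidates.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")

# Map common "smart" punctuation to plain ASCII stand-ins.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}
TRANSLATION_TABLE = str.maketrans(SMART_PUNCTUATION)


def decode_guess(raw):
    """Return (text, encoding_name) for the first candidate that decodes."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 assigns a character to every byte, so this cannot fail,
    # although the result may not be what the author typed.
    return raw.decode("latin-1"), "latin-1"


def to_plain_text(raw):
    """Decode raw bytes, then replace smart punctuation with ASCII."""
    text, _encoding = decode_guess(raw)
    return text.translate(TRANSLATION_TABLE)
```

Typical use would be something like to_plain_text(open(path, "rb").read()). Characters outside the table are left alone rather than dropped, which seems friendlier than a 0-127 filter, but a wrong encoding guess will still produce wrong characters.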