On Sun, May 31, 2009 at 7:08 PM, Ken Tozier <kentoz...@comcast.net> wrote:
> Hi
>
> I wrote an app that converts Word files into a simpler format by first
> converting from .doc to html using scripting and Word's "Save as Web page"
> command followed by using NSXMLDocument to extract the parts I need. I'm
> finding that there are no good options when it comes to choosing a character
> encoding for the saved html (this is set in Word) because it uses some
> custom tags to embed special characters like bullets and that UTF-8 chokes
> on.
>
> My basic process is to
> - Use Applescript to tell Word to convert from .doc to html and save as
> utf-8
> - Read the resultant file into an NSString with NSUTF8StringEncoding
>
> I've tried saving the html from Word as NSLatin1Encoding but many important
> characters like double-quotes, apostrophes, dashes etc are translated to cap
> "O's" with various diacritical marks.
>
> Not really sure how to proceed as there doesn't seem to be a single encoding
> useable by NSString that will both translate the quotes and allow me to
> access Word's "special" characters. Anyone have any ideas how I can read the
> html and treat it as a mostly normal character string without resorting to a
> custom binary character translation class?
UTF-8 shouldn't choke on anything. It is a universal character encoding.
It's vaguely possible that Word uses some custom characters that aren't
even in Unicode, but if it does, those characters won't be in any *other*
encoding either, so they wouldn't work regardless.

Can you elaborate on just what choosing UTF-8 produces and how it fails?

In any case, this is probably more of a Word question than a Cocoa
question, and I imagine you'd get better answers somewhere where people
are knowledgeable about Word.

Mike
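
To make the encoding point concrete, here is a rough sketch in Python (not Cocoa code, just a quick way to poke at the bytes). It shows that UTF-8 round-trips curly quotes, dashes, and bullets without loss, while Latin-1 cannot represent them at all. The last part is only a guess at the cause of the "cap O's": it assumes Word actually wrote the file as Mac OS Roman and the bytes were then read as Latin-1. Nothing in the original post confirms that, and the sample string is made up for illustration.

```python
# Made-up sample text containing the kinds of characters mentioned in the
# post: curly double quotes, an en dash, a curly apostrophe, and a bullet.
text = "\u201cquoted\u201d \u2013 it\u2019s a \u2022 bullet"

# UTF-8 can encode every Unicode character, so the round trip is lossless.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

# Latin-1 (ISO 8859-1) has no curly quotes, dashes, or bullets.
try:
    text.encode("latin-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

# ASSUMPTION (not confirmed by the post): the file was really saved as
# Mac OS Roman and then decoded as Latin-1. The punctuation bytes then
# land on accented capitals such as O-grave and O-acute.
mac_bytes = text.encode("mac_roman")
misread = mac_bytes.decode("latin-1")

print(utf8_round_trip == text)  # UTF-8 preserved everything
print(latin1_can_encode)        # Latin-1 could not encode the text
print(misread)                  # what the mismatched decode looks like
```

If the bytes on disk turn out to match this pattern, the likely fix is to read the file with the encoding Word really used (or have Word save as UTF-8 and confirm the file is valid UTF-8) rather than writing a custom translation class.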