On Sun, May 31, 2009 at 7:08 PM, Ken Tozier <kentoz...@comcast.net> wrote:
> Hi
>
> I wrote an app that converts Word files into a simpler format by first
> converting from .doc to html using scripting and Word's "Save as Web page"
> command followed by using NSXMLDocument to extract the parts I need. I'm
> finding that there are no good options when it comes to choosing a character
> encoding for the saved html (this is set in Word) because it uses some
> custom tags to embed special characters like bullets and that UTF-8 chokes
> on.
>
> My basic process is to
> - Use Applescript to tell Word to convert from .doc to html and save as
> utf-8
> - Read the resultant file into an NSString with NSUTF8StringEncoding
>
> I've tried saving the html from Word as NSLatin1Encoding but many important
> characters like double-quotes, apostrophes, dashes etc are translated to cap
> "O's" with various diacritical marks.
>
> Not really sure how to proceed as there doesn't seem to be a single encoding
> useable by NSString that will both translate the quotes and allow me to
> access Word's "special" characters. Anyone have any ideas how I can read the
> html and treat it as a mostly normal character string without resorting to a
> custom binary character translation class?

UTF-8 shouldn't choke on anything. It is a universal character
encoding that can represent every character in Unicode. It's vaguely
possible that Word uses some custom characters that aren't even in
Unicode, but if it does, those characters won't be in any *other*
encoding either, so they wouldn't work regardless.
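
To see that UTF-8 handles quotes, dashes and bullets, and what goes wrong when bytes are decoded with a different encoding than the one they were written in, here is a small sketch. It assumes Swift, where String(data:encoding:) plays the role of NSString's -initWithData:encoding:; the sample string is made up for illustration. It shows the general failure mode only and does not reproduce the exact "capital O" symptom quoted above, which depends on which encoding Word actually wrote and which one the file was read with.

```swift
import Foundation

// Illustrative text (an assumption): curly quotes, an apostrophe, an em dash, a bullet.
let sample = "\u{201C}Quoted\u{201D} text \u{2014} it\u{2019}s a \u{2022} bullet"

// The bytes a file saved as UTF-8 would contain for that text.
let utf8Bytes = sample.data(using: .utf8)!

// Decoding with the encoding the bytes were written in recovers the text exactly.
let roundTrip = String(data: utf8Bytes, encoding: .utf8)

// Decoding the same bytes as ISO Latin-1 still "succeeds", but Latin-1 maps
// every byte to one character, so each multi-byte UTF-8 sequence becomes
// several unrelated characters (some of them invisible control codes).
let misread = String(data: utf8Bytes, encoding: .isoLatin1)

print(roundTrip == sample)  // true
print(misread == sample)    // false
print(misread ?? "(nil)")   // the garbled text
```

The point is that neither encoding is "broken"; garbage appears only when the reading side and the writing side disagree.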

Can you elaborate on just what choosing UTF-8 produces and how it fails?
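
One way to answer that precisely is to find the first byte at which the saved file stops being valid UTF-8. A minimal sketch, assuming the file's bytes are already loaded into a Data (for example with Data(contentsOf:)); firstInvalidUTF8Offset is a hypothetical helper, not a Foundation API, built on the standard library's UTF8 decoder:

```swift
import Foundation

/// Hypothetical helper (not part of Foundation): returns the byte offset of the
/// first ill-formed UTF-8 sequence in `data`, or nil if every byte decodes cleanly.
func firstInvalidUTF8Offset(in data: Data) -> Int? {
    var decoder = UTF8()
    var iterator = data.makeIterator()
    var offset = 0  // bytes consumed by successfully decoded scalars so far

    var result = decoder.decode(&iterator)
    while case .scalarValue(let scalar) = result {
        offset += UTF8.width(scalar)  // 1 to 4 bytes per scalar
        result = decoder.decode(&iterator)
    }
    // The loop stops on either .emptyInput (clean end) or .error (bad bytes here).
    if case .error = result {
        return offset
    }
    return nil
}

// In the app this would be the saved HTML file, e.g. try Data(contentsOf: fileURL).
let good = Data("plain \u{2022} bullet".utf8)
let bad = Data([0x48, 0x69, 0x20, 0x93, 0x51, 0x94])  // "Hi " then a stray 0x93

print(firstInvalidUTF8Offset(in: good) == nil)  // true
if let offset = firstInvalidUTF8Offset(in: bad) {
    print("invalid UTF-8 at byte \(offset)")     // invalid UTF-8 at byte 3
}
```

If the reported offset lands on bytes such as 0x93 or 0x94 (curly double quotes in Windows-1252), that would hint the file is not actually UTF-8 despite the setting chosen in Word. Treat that as a lead to check, not a diagnosis.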

In any case, this is probably more of a Word question than a Cocoa
question, and I imagine you'd get better answers somewhere where
people are knowledgeable about Word.

Mike
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)