On Thu, Jan 28, 2010 at 6:16 PM, Keith Blount <keithblo...@yahoo.com> wrote:
> I am using the NSXML classes to generate and parse my own XML files. 
> Sometimes these files store strings of text that has been brought in from 
> other applications (for instance, there might be a plain text representation 
> of some text the user has pasted in from Word).

For what it's worth, another common cause of problems with stuff
pasted from Word (at least on the web) is text containing characters
from the Windows-1252 character set whose bytes are not valid UTF-8.
The usual culprits are the bytes 0x80-0x9F (curly quotes, dashes and
the like), which is the range where Windows-1252 differs from
ISO-Latin-1.
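
To make that concrete, here is a rough sketch (in Python, purely for
brevity) of one way to cope with such input: try UTF-8 first and fall
back to Windows-1252 only if that fails. The function name is made up
for illustration, and treating every non-UTF-8 input as Windows-1252
is an assumption that will not hold for all sources.

```python
def decode_pasted_bytes(raw: bytes) -> str:
    """Decode pasted bytes as UTF-8, falling back to Windows-1252.

    Hypothetical helper: the fallback encoding is a guess about where
    the bytes came from, not something the data itself tells us.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Lone bytes in 0x80-0x9F land here. Windows-1252 leaves a few
        # byte values undefined, so replace those instead of raising.
        return raw.decode("cp1252", errors="replace")
```

For example, the Windows-1252 bytes 0x93 and 0x94 come back as the
left and right curly double quotes (U+201C and U+201D), while input
that is already valid UTF-8 passes through unchanged.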

So whatever solution you come up with to deal with the control
characters in 0x00-0x1F that XML doesn't allow (everything in that
range except tab, LF and CR), you probably also want to account for
bytes in the 0x80-0xFF range that don't form valid UTF-8 sequences.
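
A minimal sketch of the control-character side, again in Python and
with made-up names: once the text has been decoded to a string, drop
everything that falls outside the "Char" production of XML 1.0.
Silently deleting the offending characters is an assumption here; you
may prefer to replace or escape them instead.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text: str) -> str:
    """Remove characters that an XML 1.0 document cannot contain.

    Hypothetical helper; it works on decoded text, not raw bytes.
    """
    return _XML_ILLEGAL.sub("", text)
```

Run on decoded text, this removes things like NUL and vertical tab
but leaves tabs, newlines and ordinary non-ASCII characters alone.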

http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
http://en.wikipedia.org/wiki/Windows-1252

Sixten
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)