On Thu, Jan 28, 2010 at 6:16 PM, Keith Blount <keithblo...@yahoo.com> wrote:
> I am using the NSXML classes to generate and parse my own XML files.
> Sometimes these files store strings of text that has been brought in from
> other applications (for instance, there might be a plain text representation
> of some text the user has pasted in from Word).

For what it's worth, another common cause of problems with text pasted from Word (at least on the web) is Windows-1252 bytes ending up in data that is supposed to be UTF-8. The usual culprits are 0x80-0x9F, which is the range where Windows-1252 differs from ISO-Latin-1 (curly quotes, dashes, the ellipsis and so on live there).

So whatever solution you come up with to deal with the control characters in 0x00-0x1F that XML specifically doesn't allow (everything in that range except tab, LF and CR), you probably want to also account for bytes in 0x80-0xFF. None of those is valid UTF-8 on its own; they are only legal as part of a well-formed multi-byte sequence.

http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
http://en.wikipedia.org/wiki/Windows-1252

Sixten
_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
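
A minimal sketch of the kind of clean-up described above, in Python purely for illustration rather than as NSXML code. The function names are hypothetical, and the fallback assumes that any input that is not well-formed UTF-8 is Windows-1252, which will not hold for every source. The filter covers only the 0x00-0x1F control characters plus U+FFFE and U+FFFF, not every code point XML 1.0 excludes.

```python
import re

# Code points XML 1.0 rejects in this sketch: the C0 controls other than
# tab (0x09), LF (0x0A) and CR (0x0D), plus U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def decode_pasted_bytes(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, falling back to Windows-1252.

    Hypothetical helper: the fallback encoding is an assumption
    about where the text came from.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
        # errors="replace" turns those into U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")


def strip_xml_illegal(text: str) -> str:
    """Remove the characters matched by _XML_ILLEGAL."""
    return _XML_ILLEGAL.sub("", text)


def sanitize_for_xml(raw: bytes) -> str:
    """Decode, then drop characters an XML 1.0 parser would reject."""
    return strip_xml_illegal(decode_pasted_bytes(raw))
```

For example, sanitize_for_xml(b"\x93quoted\x94") decodes the Windows-1252 curly quotes to U+201C and U+201D instead of failing as malformed UTF-8. Whether to drop the control characters, as here, or replace them with something visible is a design choice that depends on how the text is used later.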