Re: How to convert UInt8 array to NSString

Jean-Daniel Dupas Wed, 07 May 2008 12:37:06 -0700

What makes you think this method assumes an exact encoding? This method is not the same as +[NSString stringWithContentsOfFile:encoding:error:].

The method +stringWithContentsOfFile:usedEncoding:error: returns the sniffed encoding by reference through its second argument. At least that's what the documentation says: “This method attempts to determine the encoding of the file at path.” This method was introduced in Tiger, which may be why you have never seen it before.



On 7 May 08, at 21:27, Gary L. Wade wrote:

No, that's not the same thing. The method you suggest assumes an exact encoding; the sniffer functions from TextEncodingConverter look at the data to see if it follows the patterns appropriate for a suggested set of encodings and let you know which one would be the best match. Typically, such sniffers are best for differentiating DBCS-based characters where there's a sequence like you'd find in Shift-JIS and the like. Let me know when you find the "Cocoa" way to do this.

More modern and more Cocoa way? You mean something like this:
+[NSString stringWithContentsOfFile:usedEncoding:error:] ;-)

«Discussion
 This method attempts to determine the encoding of the file at path.»

On 7 May 08, at 19:33, Gary L. Wade wrote:

If you're interested in determining the best encoding match for
text, look at the TextEncodingConverter.h header, which has
functions related to encoding sniffing.  There may be more modern
techniques available, but I had used that almost a decade ago in a
formerly major web browser.  It's not perfect, of course, but it
might be the best solution for your problem.


On May 6, 2008, at 9:22 PM, Jens Alfke wrote:


On 6 May '08, at 10:45 AM, Aki Inoue wrote:

Actually, I don't recommend using CP1252 as the generic fallback
encoding like this.

The encoding does have gaps, and the handling of those invalid gaps
varies between conversion engines.  CF/NSString treat the invalid
bytes strictly and return nil on encountering them.


I wasn't aware it had gaps — I've never run into them. Where are
they?


<http://en.wikipedia.org/wiki/Windows-1252>

5 characters in the 0x80..0x9F range.
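
To make the gaps concrete, here is a small sketch that probes every byte in 0x80..0x9F with a strict decoder. It assumes Python's "cp1252" codec as a stand-in for a strict conversion engine; it is not CF/NSString, whose handling may differ in detail, and the function name cp1252_gaps is only illustrative.

```python
# Probe the 0x80..0x9F range and collect the byte values that a strict
# CP1252 decoder refuses to convert.
# Assumption: Python's "cp1252" codec stands in for a strict engine here.
def cp1252_gaps():
    gaps = []
    for value in range(0x80, 0xA0):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # No character is assigned to this byte in CP1252.
            gaps.append(value)
    return gaps


if __name__ == "__main__":
    print([hex(value) for value in cp1252_gaps()])
```

Any input containing one of the reported bytes makes a strict CP1252 conversion fail as a whole, which is the behaviour described above for CF/NSString returning nil.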

So, our recommendation now is to try UTF-8 first; then, try some
other encoding deduced from the context (user's localization,
intended source/destination of the data, etc).  If all of those fail,
try MacRoman as the ultimate fallback (the encoding has no
gaps, so it never fails).
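
The order recommended here (UTF-8, then context-derived encodings, then MacRoman) can be sketched as a small loop. This is only an illustration under assumptions: it uses Python's codecs rather than CF/NSString, and the names decode_with_fallback and contextual_encodings are hypothetical, not part of any Cocoa API.

```python
# Try UTF-8 first, then any encodings deduced from context, and finally
# MacRoman, which assigns a character to every byte value.
# Assumption: Python's strict codecs stand in for CF/NSString conversion.
def decode_with_fallback(data, contextual_encodings=()):
    candidates = ["utf-8", *contextual_encodings]
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # Strict decoding failed; move on to the next candidate.
            continue
    # Last resort: MacRoman has no gaps, so this decode cannot fail.
    return data.decode("mac_roman"), "mac_roman"


if __name__ == "__main__":
    text, used = decode_with_fallback(b"\x93hi\x94", ["cp1252"])
    print(used, text)
```

The function returns the encoding it ended up using alongside the text, so the caller can tell a genuine match from the last-resort fallback. For web content, passing "cp1252" as the contextual encoding covers most non-UTF-8 pages before MacRoman is reached.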

In the contexts I've been dealing with, data fetched over HTTP from
random websites, there hasn't been anything deducible from the
context (assuming the HTTP Content-Type already failed.) In that
situation MacRoman is not at all a good fallback as almost no Web
content uses it; CP-1252 or ISO-Latin-1 are the most likely
fallbacks after UTF-8.



I will agree with this if it's web content you're dealing with.
Although, just fall back to windows-1252.  Lots of site content was
authored with that encoding and mistakenly marked as ISO_8859-1.  But
that's a topic for another forum.

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/devlists%40shadowlab.org

This email sent to [EMAIL PROTECTED]

