Thank you. I think I'm almost there, though I'm getting incorrect values. I tried both %c and %C. These yield, respectively, î and  , which are incorrect.

2009-08-11 13:24:58.879 ParseTest[566:10b] The num is U+4E01
2009-08-11 13:24:58.889 ParseTest[566:10b] codeItself is 4E01
2009-08-11 13:24:58.891 ParseTest[566:10b] charAsString is 19969
2009-08-11 13:24:58.892 ParseTest[566:10b] strc is î
2009-08-11 13:24:58.893 ParseTest[566:10b] strC is 

UnicodeRecordParsingStrat *urps = [[UnicodeRecordParsingStrat alloc] init];
theUniWord = [urps parseUnicodeWord: unicodeLine]; // yields @"U+4E01"
codeItself = [urps theCharacterFromCode: theUniWord]; // yields @"4E01"
NSString *ox = @"0x";
NSString *hexString;
hexString = [ox stringByAppendingString: codeItself]; // yields @"0x4E01"
NSScanner *scanner = [NSScanner scannerWithString: hexString];
NSString *charAsString;
unsigned value;
if ([scanner scanHexInt:&value]) {
        charAsString = [NSString stringWithFormat: @"%u", value]; // yields 19969
        NSLog(@"charAsString is %@\n", charAsString);
        NSString *strc = [NSString stringWithFormat: @"%c", &value];
        NSLog(@"strc is %@\n", strc);
        NSString *strC = [NSString stringWithFormat: @"%C", &value];
        NSLog(@"strC is %@\n", strC);
} else {
        NSLog( @"Hex reading failed." );
}

Seems like a lot of code for a simple conversion.....

On Aug 11, 2009, at 11:12 AM, Alastair Houghton wrote:

On 11 Aug 2009, at 15:40, Daniel Child wrote:

Unihan.txt provides text files showing characters in the format U+XXXX. If I scan these in, naturally I can obtain the NSString representation XXXX. But I need to convert this text to genuine unichars OR NSStrings (the actual characters represented).

Two questions:

1. I didn't see any relevant conversion methods under NSString or NSNumber. Are there Cocoa functions to perform this easily?

Use NSScanner's -scanHexInt: (or similar) to scan the hexadecimal part, then store the result in a unichar and use either +stringWithFormat:'s %C format code or NSString's -initWithCharacters:length: / +stringWithCharacters:length: methods.

Or you can just scan the hex part yourself manually; it isn't hard.
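
For illustration, a minimal sketch of scanning the hex part by hand, written in plain C (which also compiles inside an Objective-C file). The function name parse_code_point is hypothetical, not part of any framework, and the six-digit limit is an assumption based on the largest Unicode code point, U+10FFFF.

```c
#include <stdio.h>

/* Hypothetical helper: parses "U+XXXX" (or bare "XXXX") into a number.
   Returns 1 on success, 0 if the text is not 1 to 6 hexadecimal digits. */
static int parse_code_point(const char *text, unsigned *out)
{
    unsigned value = 0;
    int digits = 0;

    /* Skip an optional "U+" prefix. */
    if ((text[0] == 'U' || text[0] == 'u') && text[1] == '+')
        text += 2;

    for (; *text != '\0'; ++text, ++digits) {
        unsigned d;
        if (*text >= '0' && *text <= '9')
            d = (unsigned)(*text - '0');
        else if (*text >= 'A' && *text <= 'F')
            d = (unsigned)(*text - 'A' + 10);
        else if (*text >= 'a' && *text <= 'f')
            d = (unsigned)(*text - 'a' + 10);
        else
            return 0;               /* not a hex digit */
        value = value * 16 + d;     /* shift in the next nibble */
    }

    /* Assumption: at most 6 digits, enough for U+10FFFF. */
    if (digits == 0 || digits > 6)
        return 0;

    *out = value;
    return 1;
}
```

Called with "U+4E01", this stores 19969 (0x4E01) in *out. For code points that fit in 16 bits, the result could then be assigned to a unichar and passed to +stringWithCharacters:length:.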

You could even turn it into UTF-8 and use strtoul() if you were feeling mildly masochistic.
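
A rough sketch of the strtoul() route, in plain C. It assumes the hex digits are already available as a C string (for example from -UTF8String), and it goes one step further than the suggestion above by encoding the parsed value as UTF-8, so the bytes could be handed to +stringWithUTF8String:. The name hex_to_utf8 is hypothetical.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper: converts the hex digits of a code point (e.g. "4E01")
   to UTF-8. buf must hold at least 5 bytes. Returns the number of bytes
   written (not counting the terminating NUL), or 0 on failure. */
static size_t hex_to_utf8(const char *hex, char buf[5])
{
    char *end;
    unsigned long cp = strtoul(hex, &end, 16);
    size_t n;

    /* Reject empty input and trailing junk. */
    if (end == hex || *end != '\0')
        return 0;
    /* Reject values outside Unicode and the surrogate range. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        buf[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {       /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {                         /* 4 bytes: 11110xxx then three 10xxxxxx */
        buf[0] = (char)(0xF0 | (cp >> 18));
        buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    buf[n] = '\0';
    return n;
}
```

For "4E01" this writes the three bytes E4 B8 81. Unlike a unichar, this path is not limited to 16-bit code points, which may matter for Unihan entries beyond U+FFFF.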

2. I am assuming I have to convert to the format '0xXXXX', but is it also possible to work with U+XXXX directly in Cocoa? I got error messages for all of the following formats:

 unichar uch = 0x0041;
 NSString *str = [NSString stringWithCharacters:&uch length:1];
 NSLog (@"str is \"%@\".", str);

 /* Output:

    str is "A" */

Kind regards,

Alastair.

--
http://alastairs-place.net



