Thank you. I think I'm almost there, though I'm getting incorrect
values. I tried both %c and %C. These yield, respectively, î and ,
which are incorrect.
2009-08-11 13:24:58.879 ParseTest[566:10b] The num is U+4E01
2009-08-11 13:24:58.889 ParseTest[566:10b] codeItself is 4E01
2009-08-11 13:24:58.891 ParseTest[566:10b] charAsString is 19969
2009-08-11 13:24:58.892 ParseTest[566:10b] strc is î
2009-08-11 13:24:58.893 ParseTest[566:10b] strC is
UnicodeRecordParsingStrat *urps = [[UnicodeRecordParsingStrat alloc] init];
theUniWord = [urps parseUnicodeWord: unicodeLine]; // yields @"U+4E01"
codeItself = [urps theCharacterFromCode: theUniWord]; // yields @"4E01"
NSString *ox = @"0x";
NSString *hexString;
hexString = [ox stringByAppendingString: codeItself]; // yields @"0x4E01"
NSScanner *scanner = [NSScanner scannerWithString: hexString];
NSString *charAsString;
unsigned value;
if ([scanner scanHexInt:&value]){
    charAsString = [NSString stringWithFormat: @"%u", value]; // yields 19969
    NSLog(@"charAsString is %@\n", charAsString);
    NSString *strc = [NSString stringWithFormat: @"%c", &value];
    NSLog(@"strc is %@\n", strc);
    NSString *strC = [NSString stringWithFormat: @"%C", &value];
    NSLog(@"strC is %@\n", strC);
} else {
    NSLog( @"Hex reading failed." );
}
Seems like a lot of code for a simple conversion.....
On Aug 11, 2009, at 11:12 AM, Alastair Houghton wrote:
On 11 Aug 2009, at 15:40, Daniel Child wrote:
Unihan.txt provides text files showing characters in the format U+XXXX.
If I scan these in, naturally I can obtain the NSString representation
XXXX. But I need to convert this text to genuine unichars OR NSStrings
(the actual characters represented).
Two questions:
1. I didn't see any relevant conversion methods under NSString or
NSNumber. Are there Cocoa functions to perform this easily?
Use NSScanner's -scanHexInt (or similar) to scan the hexadecimal part,
then stick that in a unichar and either use -stringWithFormat's %C
format code, or NSString's
-initWithCharacters:length:/+stringWithCharacters:length: methods.
Or you can just scan the hex part yourself manually; it isn't hard.
You could even turn it into UTF-8 and use strtoul() if you were
feeling mildly masochistic.
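For what it's worth, the strtoul() route might look roughly like the sketch below. It is plain C (so it also compiles as Objective-C), and the helper names parse_codepoint and encode_utf8_bmp are made up for illustration; they are not Cocoa or libc functions. It assumes the input is exactly "U+" followed by hex digits and only handles code points in the Basic Multilingual Plane.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a string of the form "U+XXXX" into a code point.
   Returns 0 on success, -1 if the string is malformed. */
static int parse_codepoint(const char *s, unsigned long *out)
{
    char *end;

    if (strncmp(s, "U+", 2) != 0)
        return -1;
    s += 2;

    /* Require a hex digit first, so strtoul() cannot skip whitespace or a sign. */
    if (!isxdigit((unsigned char)*s))
        return -1;

    *out = strtoul(s, &end, 16);
    return (*end == '\0') ? 0 : -1;
}

/* Hypothetical helper: encode a BMP code point as NUL-terminated UTF-8.
   Returns the number of bytes written (excluding the NUL), or 0 if the
   value is a surrogate or lies outside the BMP. */
static size_t encode_utf8_bmp(unsigned long cp, char out[4])
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        out[1] = '\0';
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        out[2] = '\0';
        return 2;
    }
    if (cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF)) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        out[3] = '\0';
        return 3;
    }
    return 0;
}
```

With "U+4E01" as input, parse_codepoint yields 0x4E01 (19969 decimal) and encode_utf8_bmp produces the three bytes E4 B8 81. The resulting buffer could then be handed to something like +stringWithUTF8String:, or the parsed value could simply be narrowed to a unichar and passed by value to the methods mentioned above.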
2. I am assuming I have to convert to the format '0xXXXX', but is
it also possible to work with U+XXXX directly in cocoa? I got error
messages for all of the following formats:
unichar uch = 0x0041;
NSString *str = [NSString stringWithCharacters:&uch length:1];
NSLog (@"str is \"%@\".", str);
/* Output:
str is "A" */
Kind regards,
Alastair.
--
http://alastairs-place.net