Thank you, Ken, for your valuable tips.

On Jun 17, 2011, at 10:40 AM, Ken Thomases wrote:
> On Jun 17, 2011, at 2:46 AM, Andreas Grosam wrote:
>
>> Given an NSString as input source, what is the fastest method to "feed" the
>> parser?
>>
>> Also worth mentioning is the possible fact about hidden autoreleased memory
>> objects, for instance when retrieving c-strings from the NSString object or
>> when converting the NSString's internal encoding to some specified external
>> form.
>
> First, you're probably prematurely optimizing. The chance that accessing the
> string contents will be a significant portion of the time taken by parsing is
> small.

If possible, I would prefer to avoid any conversion that NSString performs as a result of accessing its content in any way. The parser can parse any Unicode encoding form, so if possible I would just take the NSString's content "as is", provided it is encoded in a Unicode form and, of course, provided I can figure out which encoding it actually is. Given the parser's speed, any encoding conversion made by NSString is a significant performance penalty.

A priori, I cannot determine which UTF encoding form the original source uses, but in almost all cases it is UTF-8.

> That said, I guess you should try CFStringGetCharactersPtr() and
> CFStringGetCStringPtr() first. If either returns non-NULL, then that's about
> as fast as you're going to get.

Aha! I read the docs, and that's probably what I need. If I understood the functions correctly, CFStringGetCharactersPtr() returns non-NULL if the internal representation is UTF-16 (machine endianness, I guess). If CFStringGetCStringPtr(theString, kCFStringEncodingUTF8) returns non-NULL, the internal representation is UTF-8.

Now, there is the possibility that the original source from which the NSString was initialized is provided as UTF-32, but this is rather unlikely. In that case, I guess, NSString will convert internally to UTF-16 or some other form. In this case I possibly need to convert to some UTF encoding form before feeding the parser.

> After that, maybe use CFStringInitInlineBuffer() and
> CFStringGetCharacterFromInlineBuffer(), although that doesn't fit with your
> iterator interface. You could wrap an iterator around them easily, though.

Yes, I can wrap any kind of iterator around the pointers. The parser just requires the semantics of an Input Iterator. In fact, the parser internally applies an iterator adapter anyway to support byte-swapping, if needed. (The iterator interface exists in order to support streams. I haven't thought about using NSStreams so far, but hopefully this will work as well.)

I don't fully understand the purpose of the "inline buffer" facility, though. Is accessing an NSString's content through an inline buffer faster than accessing it through a raw pointer whose content has a known encoding?

> After that, it probably doesn't matter much. You'll be doing some
> combination of allocation and encoding conversion no matter what. So, go
> with the most convenient method, which will probably be back in Objective-C.

OK, thank you very much for the tips!

Andreas

> Regards,
> Ken

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
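
For illustration, the byte-swapping iterator adapter mentioned above could look roughly like the following minimal C++ sketch. It is not the parser's actual adapter: the names ByteSwapIterator, swapped_copy and example_usage are hypothetical, and the sketch assumes the underlying iterator yields 16-bit code units (for example, UTF-16 in the opposite byte order from the host).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical adapter (an assumption, not taken from the thread): wraps any
// input iterator over 16-bit code units and swaps the two bytes of each unit
// when it is dereferenced. It only provides Input Iterator semantics.
template <typename Iter>
class ByteSwapIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = std::uint16_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const std::uint16_t*;
    using reference         = std::uint16_t;  // returned by value

    explicit ByteSwapIterator(Iter it) : it_(it) {}

    // Read the current unit and exchange its high and low bytes.
    std::uint16_t operator*() const {
        const std::uint16_t u = static_cast<std::uint16_t>(*it_);
        return static_cast<std::uint16_t>((u << 8) | (u >> 8));
    }

    ByteSwapIterator& operator++() { ++it_; return *this; }
    ByteSwapIterator operator++(int) {
        ByteSwapIterator tmp(*this);
        ++it_;
        return tmp;
    }

    bool operator==(const ByteSwapIterator& other) const { return it_ == other.it_; }
    bool operator!=(const ByteSwapIterator& other) const { return it_ != other.it_; }

private:
    Iter it_;
};

// Hypothetical helper: collects the byte-swapped units of [first, last).
template <typename Iter>
std::vector<std::uint16_t> swapped_copy(Iter first, Iter last) {
    return std::vector<std::uint16_t>(ByteSwapIterator<Iter>(first),
                                      ByteSwapIterator<Iter>(last));
}

// Usage sketch: the units of "Hi" stored in the opposite byte order
// come back as U+0048 U+0069.
inline void example_usage() {
    const std::uint16_t swapped[] = {0x4800, 0x6900};
    const std::vector<std::uint16_t> out = swapped_copy(swapped, swapped + 2);
    assert(out.size() == 2);
    assert(out[0] == 0x0048);
    assert(out[1] == 0x0069);
}
```

Because the adapter only relies on increment, dereference and comparison of the wrapped iterator, the same sketch would work over a raw pointer (such as one obtained from the string's internal buffer) or over a stream-backed iterator.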