Thank you, Ken, for your valuable tips.

On Jun 17, 2011, at 10:40 AM, Ken Thomases wrote:

> On Jun 17, 2011, at 2:46 AM, Andreas Grosam wrote:
> 
>> Given an NSString as input source, what is the fastest method to "feed" the 
>> parser?
>> 
>> Also worth mentioning is the possibility of hidden autoreleased objects,
>> for instance when retrieving C strings from the NSString object or when
>> converting the NSString's internal encoding to some specified external
>> form.
> 
> First, you're probably prematurely optimizing.  The chance that accessing the 
> string contents will be a significant portion of the time taken by parsing is 
> small.

If possible, I would prefer to avoid any conversions that NSString performs as a
result of accessing its content. The parser is capable of parsing any Unicode
encoding form, so, if possible, I would simply take the NSString's content
"as is", provided it is stored in a Unicode encoding form and, of course, that
I am able to figure out which encoding that actually is.

Given the parser's speed, any encoding conversion performed by NSString would
be a significant performance penalty.

A priori, I cannot determine which UTF encoding form the original source is
encoded in, but in almost all cases it is UTF-8.


> 
> That said, I guess you should try CFStringGetCharactersPtr() and 
> CFStringGetCStringPtr() first.  If either returns non-NULL, then that's about 
> as fast as you're going to get.  
Aha! I read the docs and that's probably what I need. 

If I understood the functions correctly, CFStringGetCharactersPtr() returns
non-NULL only if the internal representation is UTF-16 (in host byte order,
presumably). If CFStringGetCStringPtr(theString, kCFStringEncodingUTF8) returns
non-NULL, the internal buffer can be read directly as UTF-8.

Now, it is possible that the original source from which the NSString has been
initialized is provided as UTF-32, but this is rather unlikely. In that case,
I guess, NSString will convert it internally to UTF-16 or something else. Then
I may need to convert the content to some UTF encoding form before feeding the
parser.


> After that, maybe use CFStringInitInlineBuffer() and 
> CFStringGetCharacterFromInlineBuffer(), although that doesn't fit with your 
> iterator interface.  You could wrap an iterator around them easily, though.
Yes, I can wrap any kind of iterator around the pointers; the parser only
requires Input Iterator semantics. In fact, the parser internally applies an
iterator adapter anyway to support byte swapping when needed.
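Such a byte-swapping adapter can be very small. The following is only a minimal C++ sketch, not the parser's actual adapter (which the thread does not show): it assumes 16-bit code units read through any underlying input iterator and swaps the two bytes of each unit on dereference. The names ByteSwap16Iterator and byte_swapped are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical input-iterator adapter: reads 16-bit code units from an
// underlying iterator and swaps the two bytes of each unit on dereference.
template <class It>
class ByteSwap16Iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = std::uint16_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = std::uint16_t;  // returned by value

    explicit ByteSwap16Iterator(It it) : it_(it) {}

    // Swap high and low byte, e.g. 0x4800 -> 0x0048.
    std::uint16_t operator*() const {
        const std::uint16_t v = static_cast<std::uint16_t>(*it_);
        return static_cast<std::uint16_t>((v >> 8) | (v << 8));
    }

    ByteSwap16Iterator& operator++() { ++it_; return *this; }
    ByteSwap16Iterator operator++(int) { ByteSwap16Iterator tmp = *this; ++it_; return tmp; }

    friend bool operator==(const ByteSwap16Iterator& a, const ByteSwap16Iterator& b) {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const ByteSwap16Iterator& a, const ByteSwap16Iterator& b) {
        return !(a == b);
    }

private:
    It it_;
};

// Convenience helper that deduces the underlying iterator type.
template <class It>
ByteSwap16Iterator<It> byte_swapped(It it) { return ByteSwap16Iterator<It>(it); }
```

For example, a buffer holding "Hi" in the opposite byte order, {0x4800, 0x6900}, reads back through the range byte_swapped(p), byte_swapped(p + 2) as {0x0048, 0x0069}. Because the swap happens on dereference, the adapter stays a plain input iterator and needs no extra buffer.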

(The iterator interface exists to support streams. I haven't considered using
NSStream so far, but hopefully that will work as well.)

I don't fully understand the purpose of the "inline buffer" facility, though.
Is accessing an NSString's content through an inline buffer faster than
reading it through a raw pointer whose encoding is known?
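As far as the inline implementation in CFString.h shows, CFStringGetCharacterFromInlineBuffer() simply indexes a direct character or C-string pointer when CFStringInitInlineBuffer() obtained one, and otherwise refills a small local buffer in blocks via CFStringGetCharacters(). Its benefit therefore seems to be avoiding one function call per character when no direct pointer exists; presumably it is not faster than reading a raw pointer you already have. The C++ sketch below models only that block-refill path as an input iterator. It is not the CF implementation: the fetch callback merely stands in for CFStringGetCharacters(), and the 64-unit block size and the names FetchFn and BlockBufferedIterator are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical stand-in for CFStringGetCharacters(): copies `count` UTF-16
// code units starting at index `start` into `out`.
using FetchFn = std::function<void(std::size_t start, std::size_t count, char16_t* out)>;

// Input iterator that serves code units from a small local buffer and calls
// the fetch function once per block rather than once per character.
class BlockBufferedIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char16_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = char16_t;

    static constexpr std::size_t kBlockSize = 64;  // assumed block size

    BlockBufferedIterator() = default;  // end-of-sequence iterator

    BlockBufferedIterator(FetchFn fetch, std::size_t length)
        : fetch_(std::move(fetch)), length_(length) { refill(); }

    char16_t operator*() const { return buf_[pos_ - blockStart_]; }

    BlockBufferedIterator& operator++() {
        ++pos_;
        // Current block exhausted but input remains: fetch the next block.
        if (pos_ < length_ && pos_ - blockStart_ == blockLen_) refill();
        return *this;
    }
    BlockBufferedIterator operator++(int) { BlockBufferedIterator tmp = *this; ++*this; return tmp; }

    // Every exhausted iterator compares equal to the end iterator.
    friend bool operator==(const BlockBufferedIterator& a, const BlockBufferedIterator& b) {
        return a.atEnd() == b.atEnd() && (a.atEnd() || a.pos_ == b.pos_);
    }
    friend bool operator!=(const BlockBufferedIterator& a, const BlockBufferedIterator& b) {
        return !(a == b);
    }

private:
    bool atEnd() const { return pos_ >= length_; }

    // Copy the next block (at most kBlockSize units) into the local buffer.
    void refill() {
        blockStart_ = pos_;
        const std::size_t remaining = length_ - pos_;
        blockLen_ = remaining < kBlockSize ? remaining : kBlockSize;
        if (blockLen_ > 0) fetch_(blockStart_, blockLen_, buf_);
    }

    FetchFn fetch_;
    std::size_t length_ = 0;
    std::size_t pos_ = 0;
    std::size_t blockStart_ = 0;
    std::size_t blockLen_ = 0;
    char16_t buf_[kBlockSize] = {};
};
```

Reading 150 code units this way calls the fetch function three times (blocks of 64, 64 and 22) instead of 150 times. When a direct pointer is available, wrapping that pointer in an iterator avoids even these copies.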

>  After that, it probably doesn't matter much.  You'll be doing some 
> combination of allocation and encoding conversion no matter what.  So, go 
> with the most convenient method, which will probably be back in Objective-C.
> 

OK, thank you very much for the tips!

Andreas


> Regards,
> Ken
> 

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)