On May 13, 2008, at 17:00, Jens Alfke wrote:


On 12 May '08, at 11:38 PM, Daniel Vollmer wrote:

I'm parsing a rather large text file (usually >20MB), and in doing so I'm iterating over its lines with [String getParagraphStart:end:contentsEnd:forRange:]. I've found a rather noticeable speed-up in the parsing operation if I create the string in question from an NSData object (created via initWithContentsOfMappedFile:) using [String initWithData:encoding:].

It sounds like you're creating a single NSString containing the entire contents of the file, then?

Yes. Is that something I shouldn't do? I mean, I feel a tiny bit silly creating such huge strings, but I didn't find a nice alternative (e.g. something like Ruby's per-line iterators on file objects).

2) Are substrings created from the original string (e.g. substringWithRange etc.) still backed properly after the original string and the NSData object are released?

Yes. Even if the NSString is still using the NSData's contents for its buffer, it retained them, so releasing the NSData won't make it go away until the string is done with it.

But now that means that the strings are "endangered" by in-place file modification for the lifetime of the objects I create during parsing, not just during the initial parsing itself, correct? Also, it feels a bit silly to keep a retain on the 20MB NSData object while I only hold references to about 5KB of string bytes from various places in the file. Usually all this "behind-the-scenes" storage retaining doesn't matter much, but I'd quite like to make sure I drop most of the 20MB once I'm done parsing. This question of course also applies if I'm not mapping the file and instead create the string from it directly.
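To make the concern concrete: one way to be sure the small pieces do not pin the big buffer is to copy their bytes into storage of their own. The following is only a sketch in plain C (which also compiles as Objective-C), not a statement about what NSString does internally; copy_range is a hypothetical helper, and the byte offsets are assumed to come from whatever line scanning is already in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a Cocoa API: returns a malloc'd, NUL-terminated
   copy of the bytes buf[start..end). The copy owns its storage, so the large
   buffer (or the file mapping behind it) can be released afterwards.
   Returns NULL if allocation fails; the caller frees the result. */
static char *copy_range(const char *buf, size_t start, size_t end)
{
    assert(buf != NULL && start <= end);

    size_t n = end - start;
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, buf + start, n);
    out[n] = '\0';
    return out;
}
```

With copies like this, the roughly 5KB of interesting bytes live independently of the 20MB buffer, and later in-place changes to the file can no longer affect them.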


FWIW, my current iteration looks like this (String being the big 20MB one):

NSUInteger length = [String length];
NSUInteger paraStart = 0, paraEnd = 0, contentsEnd = 0;

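while (paraEnd < length)
{
        // Find the paragraph (line) that begins where the previous one ended.
        [String getParagraphStart:&paraStart end:&paraEnd contentsEnd:&contentsEnd forRange:NSMakeRange(paraEnd, 0)];
        // contentsEnd excludes the line terminator, so line holds only the text.
        NSString *line = [String substringWithRange:NSMakeRange(paraStart, contentsEnd - paraStart)];
        // do lots of menial parsing of line
}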
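For comparison, the kind of per-line iterator I was looking for could be approximated directly on the mapped bytes, without building one huge string first. This is a minimal sketch in plain C (it also compiles as Objective-C), assuming an ASCII-compatible encoding such as UTF-8 and only "\n" or "\r\n" line endings; next_line is a hypothetical helper, not part of Cocoa, and its start/contentsEnd outputs simply mirror the names used in the loop above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the line that begins at *pos in buf[0..len).
   On success, buf[*start..*contentsEnd) is the line without its terminator
   and *pos is advanced past the terminator. Returns 1 if a line was found,
   0 once the input is exhausted. Handles "\n" and "\r\n" only. */
static int next_line(const char *buf, size_t len, size_t *pos,
                     size_t *start, size_t *contentsEnd)
{
    assert(buf != NULL && pos != NULL && start != NULL && contentsEnd != NULL);

    if (*pos >= len)
        return 0;

    *start = *pos;
    const char *nl = memchr(buf + *pos, '\n', len - *pos);
    size_t end = nl ? (size_t)(nl - buf) : len;  /* index of '\n', or end of input */

    *contentsEnd = end;
    if (end > *start && buf[end - 1] == '\r')    /* drop the CR of a CRLF pair */
        (*contentsEnd)--;

    *pos = nl ? end + 1 : len;                   /* continue after the '\n' */
    return 1;
}
```

A caller would start with pos = 0 and loop with while (next_line(bytes, length, &pos, &start, &contentsEnd)), handing each [start, contentsEnd) range to the parser. Whether this is actually faster than the NSString route is something I would have to measure.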

If I leave the mmapped reading in, it sounds sensible to check whether the file is on the same drive as the app. So thanks for that suggestion.


Thanks for any further insight,
        Daniel.