Greg,

Thanks for the INSTANT answer! I added the autorelease pool inside the read 
loop and ran the program on the largest data file I have, 3.46 GB. The program 
ran perfectly in just under nine minutes and never built up any virtual memory.

In hindsight, I am embarrassed I did not come to the answer myself, as I have 
a fairly good understanding of all the supported memory management models. ARC 
tends to make one stop worrying, which tends to make one stop thinking. No 
excuses, though. I was too dim to see it.

Thanks again. You nailed it for me.

Tom Wetmore

On Jul 26, 2012, at 11:29 PM, Greg Parker wrote:

> On Jul 26, 2012, at 8:20 PM, Thomas Wetmore <t...@verizon.net> wrote:
>> I need to process very large files, and below is the program I use to do the 
>> work. I have run this program on data files ranging from very small up to 
>> over 3 GB in length. Much of my testing has been done with files in the 200 
>> to 300 MB range, and the program works fine at that size.
>> 
>> However, when I move up to files in the 2 to 4 GB range, behavior changes. 
>> The program starts consuming great amounts of virtual memory, around 14 GB, 
>> and takes more than half an hour to run. After the functional part of the 
>> program is over, it takes another half hour to give back much of the virtual 
>> memory, and once the program fully quits, the operating system thrashes for 
>> another 10 minutes or so before the last of the virtual memory is returned 
>> and the hard drive finally calms down.
>> 
>> I've never processed such massive files before, and I am surprised by the 
>> behavior. As you will see, I'm using memory-mapped NSData, and once I start 
>> processing I simply proceed through the data from beginning to end, 
>> separating it into newline-separated lines and processing the lines. That 
>> processing is simple: breaking each line into vertical-bar-separated fields 
>> and putting some of those field values into dictionaries.
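>> 
>> For illustration, the setup and per-line work are roughly as follows (a 
>> simplified sketch, not my actual code; the dictionary, field layout, and 
>> variable names here are made up):
>> 
>>     #import <Foundation/Foundation.h>
>> 
>>     static NSMutableDictionary *recordsByKey;  // created once at startup
>> 
>>     static void processLine(NSString *line)
>>     {
>>         // Break the line into vertical-bar separated fields.
>>         NSArray *fields = [line componentsSeparatedByString: @"|"];
>>         if ([fields count] < 2) return;
>> 
>>         // Keep selected field values, keyed by the first field.
>>         [recordsByKey setObject: [fields objectAtIndex: 1]
>>                          forKey: [fields objectAtIndex: 0]];
>>     }
>> 
>>     // Reading side: map the file rather than loading it into memory.
>>     NSError *error = nil;
>>     NSData *data = [NSData dataWithContentsOfFile: path
>>                                           options: NSDataReadingMappedIfSafe
>>                                             error: &error];
>>     const char *bytes = (const char *)[data bytes];
>>     NSUInteger length = [data length];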
>> 
>> If I am simply reading through memory-mapped data like this, why does the 
>> program use about six times as much virtual memory as the file itself? Why 
>> does the virtual memory accumulate in the first place, since I never return 
>> to memory pages I have already read through? And why does it take three 
>> quarters of an hour for the system to calm down again after the processing 
>> has finished?
> 
> You should use the Allocations instrument to see what is hogging your memory. 
> 
> My guess is that the memory-mapped NSData is fine, but that the NSString and 
> other code inside processLine() are allocating objects and not freeing them.
> 
> One simple possibility is that you are creating lots of autoreleased objects 
> but never draining an autorelease pool, so they don't get deallocated until 
> you are all done. Try this:
> 
>       while (YES) {
>           @autoreleasepool {
>               if (start >= length) break;
>               // Scan forward to the next newline (or the end of the data).
>               while (end < length && bytes[end] != '\n') {
>                   end++;
>               }
>               line = [[NSString alloc] initWithBytes: bytes + start
>                                               length: end - start
>                                             encoding: NSUTF8StringEncoding];
>               processLine(line);
>               start = end + 1;
>               end = start;
>           }   // the pool drains here, each time through the loop
>       }
> 
> (Also, if you are not using ARC then that NSString is leaking, which will 
> also cost lots of memory.)
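> 
> If you are under manual retain/release, the minimal fix is an explicit 
> release after the call (a sketch matching the loop above):
> 
>       line = [[NSString alloc] initWithBytes: bytes + start
>                                       length: end - start
>                                     encoding: NSUTF8StringEncoding];
>       processLine(line);
>       [line release];   // balance the alloc/init under MRC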
> 
> 
> -- 
> Greg Parker     gpar...@apple.com     Runtime Wrangler
> 
> 

