Greg,

Thanks for the INSTANT answer! I added the autorelease pool inside the read loop and ran the program on the largest data file I have, 3.46 GB. The program ran perfectly in just under nine minutes and never built up any virtual memory.
In hindsight I am embarrassed I did not come to the answer myself, as I have a fairly good understanding of all the supported memory management models. ARC tends to make one stop worrying, which tends to make one stop thinking. No excuses, though. I was too dim to see it.

Thanks again. You nailed it for me.

Tom Wetmore

On Jul 26, 2012, at 11:29 PM, Greg Parker wrote:

> On Jul 26, 2012, at 8:20 PM, Thomas Wetmore <t...@verizon.net> wrote:
>> I need to process very large files, and below is the program I use to do the work. I have run this program on data files from very small up to over 3 GB in length. Much of my testing has been done with files in the 200 to 300 MB range, and the program works fine at that size.
>>
>> However, when I move up to files in the 2 to 4 GB range, the behavior changes. The program starts consuming great amounts of virtual memory, around 14 GB, and takes more than half an hour to run. After the functional part of the program is over, it takes another half hour to give back much of the virtual memory, and once the program fully quits, the operating system thrashes for another ten minutes or so before the final amount of virtual memory is returned and the hard drive finally calms down.
>>
>> I've never processed such massive files before, and I am surprised by the behavior. As you will see, I'm using memory-mapped NSData, and once I start processing the data I simply proceed through it from beginning to end, separating the data into newline-separated lines and processing the lines. That processing is simple: just breaking each line into vertical-bar-separated fields and putting some of those field values into dictionaries.
>>
>> If I am simply reading through memory-mapped data like this, why does the program use about six times as much virtual memory as the file itself needs? Why does the virtual memory accumulate in the first place, since I never return to memory pages I have already read through? And why does it take three quarters of an hour for the system to calm down again after the processing has finished?
>
> You should use the Allocations instrument to see what is hogging your memory.
>
> My guess is that the memory-mapped NSData is fine, but that your NSString and other code inside processLine() is allocating objects and not freeing them.
>
> One simple possibility is that you are creating lots of autoreleased objects but not cleaning up any autorelease pools, so they don't get deallocated until you are all done. Try this:
>
>     while (YES) {
>         @autoreleasepool {
>             if (start >= length) break;
>             while (end < length && bytes[end] != '\n') {
>                 end++;
>             }
>             line = [[NSString alloc] initWithBytes: bytes + start
>                                             length: end - start
>                                           encoding: NSUTF8StringEncoding];
>             processLine(line);
>             start = end + 1;
>             end = start;
>         }
>     }
>
> (Also, if you are not using ARC then that NSString is leaking, which will also cost lots of memory.)
>
> --
> Greg Parker     gpar...@apple.com     Runtime Wrangler
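For reference, the per-line autorelease pool pattern Greg describes can be sketched as a small self-contained program. The file name, the UTF-8 encoding, and the body of processLine() here are illustrative assumptions, not Tom's actual code:

```objc
// Sketch only: "huge.dat", UTF-8, and the processLine body are assumptions.
#import <Foundation/Foundation.h>

static void processLine(NSString *line) {
    // Stand-in for the real work: split the line on vertical bars.
    NSArray *fields = [line componentsSeparatedByString:@"|"];
    (void)fields;
}

int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSError *error = nil;
        // Memory-map the file so the kernel pages it in on demand
        // instead of reading multiple gigabytes into RAM up front.
        NSData *data = [NSData dataWithContentsOfFile:@"huge.dat"
                                              options:NSDataReadingMappedIfSafe
                                                error:&error];
        if (!data) {
            NSLog(@"read failed: %@", error);
            return 1;
        }

        const char *bytes = data.bytes;
        NSUInteger length = data.length;
        NSUInteger start = 0, end = 0;

        while (YES) {
            // A fresh pool per line bounds the lifetime of autoreleased
            // temporaries, so memory stays flat across the whole file.
            @autoreleasepool {
                if (start >= length) break;
                while (end < length && bytes[end] != '\n') {
                    end++;
                }
                NSString *line =
                    [[NSString alloc] initWithBytes:bytes + start
                                             length:end - start
                                           encoding:NSUTF8StringEncoding];
                processLine(line);
                start = end + 1;
                end = start;
            }
        }
    }
    return 0;
}
```

Under ARC the NSString is released when it goes out of scope; under manual retain/release it would need an explicit release inside the loop, as Greg notes.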