I need to process very large files, and below is the program I use to do the 
work. I have run this program on data files ranging from very small up to over 
3 GB in length. Much of my testing has been done with files in the 200 to 
300 MB range, and the program works fine at that size.

However, when I move up to files in the 2 to 4 GB range, the behavior changes. 
The program starts consuming a great amount of virtual memory, around 14 GB, 
and takes more than half an hour to run. After the functional part of the 
program is over, it takes another half hour for the program to give back much 
of the virtual memory. Once the program does fully quit, the operating system 
spends another 10 minutes or so thrashing before the last of the virtual 
memory is returned and the hard drive finally calms down.

I've never processed such massive files before, but I am surprised by the 
behavior. As you will see, I'm using memory-mapped NSData, and once I start 
processing I simply proceed through the data from beginning to end, splitting 
it into newline-separated lines and processing each line. That processing is 
simple: it breaks each line into vertical-bar separated fields and puts some 
of those field values into dictionaries.

If I am simply reading through memory-mapped data like this, I have three 
questions. Why does the program use about six times as much virtual memory as 
the file itself needs? Why does the virtual memory accumulate in the first 
place, since I never return to memory pages I have already read through? And 
why does it take three quarters of an hour for the system to calm down again 
after the processing has finished?

I hope someone with experience dealing with very large files might see 
something pretty silly in this code and have a pointer or two to share.

Thanks,

Tom Wetmore,
Chief Bottle Washer, DeadEnds Software
------------------------------------------------


#import <Foundation/Foundation.h>

static void processLine (NSString*);

int main(int argc, const char * argv[])
{
    @autoreleasepool {

        NSError* error;
        NSString* path = @"/Volumes/Iomega HDD/Data/data";
        NSData* data = [NSData dataWithContentsOfFile: path
                                              options: NSDataReadingMappedAlways | NSDataReadingUncached
                                                error: &error];
        NSUInteger length = [data length];
        const Byte* bytes = [data bytes];

        NSUInteger start = 0;
        NSUInteger end = 0;
        NSString* line;
        while (YES) {
            if (start >= length) break;
            while (end < length && bytes[end] != '\n') {
                end++;
            }
            line = [[NSString alloc] initWithBytes: bytes + start
                                            length: end - start
                                          encoding: NSUTF8StringEncoding];
            processLine(line);
            start = end + 1;
            end = start;
        }
    }
    return 0;
}

void processLine (NSString* line)
{
    // ... break line into 74 vertical-bar separated fields ... and do simple
    // things
}
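
For illustration only, the field-splitting step could look something like the 
sketch below. It is not the actual processLine, whose body is elided above. It 
is written in plain C, which also compiles inside an Objective-C file, and it 
works directly on the bytes of a line rather than on an NSString. The names 
Field and split_fields are hypothetical, and it assumes that fields are 
separated by a single '|' byte with no quoting or escaping.

```c
#include <stddef.h>

/* Hypothetical type: one field, as a pointer into the line plus a length. */
typedef struct {
    const char *start;
    size_t length;
} Field;

/*
 * Hypothetical helper: split line[0..len) on '|' and record each field
 * in fields[], without copying any bytes. Stores at most max_fields
 * entries and returns the number stored.
 */
static size_t split_fields(const char *line, size_t len,
                           Field *fields, size_t max_fields)
{
    size_t count = 0;
    size_t field_start = 0;

    for (size_t i = 0; i <= len && count < max_fields; i++) {
        /* A field ends at each '|' and at the end of the line. */
        if (i == len || line[i] == '|') {
            fields[count].start = line + field_start;
            fields[count].length = i - field_start;
            count++;
            field_start = i + 1;
        }
    }
    return count;
}
```

With the loop in main, a call of this shape would be 
split_fields((const char *)bytes + start, end - start, fields, 74), where 
fields is an array of 74 Field values. Each field points into the mapped data, 
so nothing is copied until a value is actually stored.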

