I'm looking to write a utility to do some processing on email messages stored in mbox format. Some mbox files can be quite large, hundreds of megs or perhaps gigs in size. Obviously, reading in the whole file at once isn't feasible. The most obvious method is to set $/ to the string "\n\nFrom " ($/ takes a literal string, not a regex; messages in mbox format are separated by a blank line and begin with a From line) and to read in email messages one at a time. It seems to me that this would be quite slow. Another possibility that springs to mind is to read in chunks of 64k or so and then split those chunks into individual messages. This will complicate the program logic, however, as a chunk boundary will almost always fall in the middle of a message and split it in two. I'd then either have to back up the file offset to point to the beginning of that message, or store the first half of the message, read in a new chunk, take the second half of the message off the front of the new chunk, combine the two halves, and then process the result.
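To make the two options concrete, here is a rough sketch of both, not a tested production parser. The sub names each_message_by_separator and each_message_by_chunks are made up for illustration, and the code assumes that body lines beginning with "From " have already been escaped (for example as ">From "), so that "\n\nFrom " only ever marks a message boundary. The chunked version keeps the incomplete tail of each chunk in a buffer and prepends it to the next chunk, which is the "store the first half" idea rather than seeking backwards.

```perl
use strict;
use warnings;

# Approach 1: let Perl find the boundaries by setting $/ to a literal string.
sub each_message_by_separator {
    my ($fh, $callback) = @_;
    local $/ = "\n\nFrom ";          # plain string match, not a regex
    my $first = 1;
    while (defined(my $record = <$fh>)) {
        # The record ends with the separator; the "From " belongs to the
        # next message, so strip it and keep a single trailing newline.
        $record =~ s/\n\nFrom \z/\n/;
        # Every record after the first lost its leading "From " to $/.
        $record = "From $record" unless $first;
        $first = 0;
        $callback->($record);
    }
}

# Approach 2: read fixed-size chunks and carry the partial message over.
sub each_message_by_chunks {
    my ($fh, $callback, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;       # assumed default of 64k
    my $buffer = '';
    while (read($fh, my $chunk, $chunk_size)) {
        $buffer .= $chunk;
        # Hand off every complete message currently in the buffer.
        while ((my $pos = index($buffer, "\n\nFrom ")) >= 0) {
            $callback->(substr($buffer, 0, $pos + 1, ''));
            substr($buffer, 0, 1, '');   # drop the blank separator line
        }
        # Whatever remains is an incomplete message; keep it for next time.
    }
    $callback->($buffer) if length $buffer;   # last message in the file
}

# Small demonstration on an in-memory "file" with a tiny chunk size,
# so that messages and separators are split across chunks.
my $mbox = "From a\@example.com Mon Jan  1 00:00:00 2001\nSubject: one\n\nfirst body\n\n"
         . "From b\@example.com Tue Jan  2 00:00:00 2001\nSubject: two\n\nsecond body\n";
open my $fh, '<', \$mbox or die "Cannot open in-memory file: $!";
each_message_by_chunks($fh, sub { print length($_[0]), " bytes\n" }, 16);
close $fh;
```

Both subs pass each message to a callback, so only one message (plus at most one chunk) is held in memory at a time. The chunked version rescans the buffered partial message after every read, which is fine for ordinary mail but could be tightened by remembering where the last search stopped.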
I'm aware that there are a number of modules which deal with mail and mbox handling, but so far none of them seem to make doing what I'm trying to do easy. Reinventing the wheel isn't always a waste of time - it's sometimes a very good way to learn how wheels are constructed and how to use your tools to construct wheels. This gives you insight and practice when you have to use those same tools to construct non-wheels. :) Any thoughts or pointers to discussions on how to handle large files in Perl would be welcome.