Andrew Bernard <andrew.bern...@mailbox.org> writes:

> Well you can dynamically increase CPU or RAM or both on Digitalocean
> that I use. You can do it on a temporary basis - but I'm not sure if
> you get charged for a month or on a strict time basis, it's hard to
> find out!. It's not a matter of needing a separate system. My only
> issue is that I am very financially constrained and I can't afford the
> experiment.
>
> But the bigger fish to fry is the issue with the irregularities in the
> mbox archives. I need to study this in depth before trying a load. I
> did have the same problem with similar erratic mbox archives quite
> some years ago but I can't easily recall the solution. Probably just a
> more refined regex to pick up the 'From:' delimiters.

There isn't really much finesse involved.  Messages start at the
pattern "^From ".  Any "From " inside of a message that would end up at
the start of a line is changed to ">From ", so the pattern "^From "
should be foolproof regarding splitting into messages.  I don't remember
what happens to "^>From " but consider it most likely that any
"^>*From " inside of a message gets one ">" prepended when put into an
mbox file, and one taken out again when displayed/processed.

-- 
David Kastrup
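
The splitting rule described above can be sketched in Python roughly as
follows.  This is only an illustration, not a tested loader for the
archives under discussion.  The function names (split_mbox, quote_line,
unquote_line) are made up for this example.  The quoting of "^>*From "
lines is an assumption: it matches the convention usually called mboxrd,
and the archives in question may have been written with a different one
(for instance, quoting only plain "From " lines), so it is worth
checking a sample before relying on it.

```python
import re

# Assumption: mboxrd-style quoting, where any body line matching
# "^>*From " gets one ">" prepended when written into the mbox file.
_NEEDS_QUOTING = re.compile(r">*From ")
_IS_QUOTED = re.compile(r">+From ")


def split_mbox(text):
    """Split mbox text into messages at lines starting with 'From '.

    Anything before the first 'From ' line is ignored.
    """
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # A new separator line closes the previous message, if any.
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


def quote_line(line):
    """Prepend one '>' to a body line that matches '^>*From '."""
    return ">" + line if _NEEDS_QUOTING.match(line) else line


def unquote_line(line):
    """Remove one '>' from a body line that matches '^>+From '."""
    return line[1:] if _IS_QUOTED.match(line) else line
```

Note that unquote_line should only be applied to the body lines of a
message after splitting, never to the "From " separator line itself.
If the archives contain unquoted "From " lines inside message bodies,
no refinement of the pattern can tell them apart from real separators
with certainty; that would be a defect in how the files were written
rather than something the splitting rule can fix.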