"Jeremy M. Dolan" <[EMAIL PROTECTED]> writes:
...
>While setting up procmail, I ran into trouble feeding it large
>mbox's that were written by mutt. I first suspected it was a problem
>with procmail/mutt/pine's handling of From_ lines in MIME boundaries.
>the included file is a smaller test mbox, without any multipart
>messages. However the attached mbox is quite isolated and has no MIME
>multiparts.
..
>Reading this file produces strange results:
>
>grep "^From " file - shows 4 messages
>mutt -f file - shows the first message, contains the whole file
>pine -f file - shows all 4 messages
>procmail < file - sees the whole file as 1 message, like mutt
>hexdump -c file - shows it as normal mbox format, \nFrom, no \r's
...
Okay, the _real_ problem is that the messages you're receiving contain
bogus Content-Length: header fields. One of the variations on mbox
format says that the number of bytes specified by the Content-Length:
header field should be taken verbatim and not scanned for From_ message
separators. At sites where this variation is used, the Content-Length
field should be set to the correct value by the Local Delivery Agent.
procmail will do so, unless it is invoked with the -Y flag, which tells
it to ignore the Content-Length: field and always escape embedded From_
lines. The vast majority of sites invoke procmail with -Y (check your
.forward file or the sendmail.cf, or wherever procmail is invoked), so CL:
fields in incoming messages will _not_ be updated to reflect the actual
message size -- you told procmail to ignore them, and boy, does it ever.

So, let's consider the first message in the mailbox. Its body is ~1450
bytes long, while its CL: field contains the value 11267. Mutt comes
along, decides to pay attention to the CL: field, and sees the body of the
first message as encompassing the next three messages. At some point
in the third it resumes scanning for a From_ line and correctly splits off
the following message. Pine, on the other hand, ignores the CL: field,
splits on the From_ lines, and when it rewrites the mailbox it drops
the CL: field, thus letting mutt correctly split the messages later on.
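
To make the difference between the two readings concrete, here is a rough
Python sketch. It is not how mutt or pine are actually implemented; the
function names and the sample mailbox are made up for illustration, and the
From_ test is simplified (a real reader would also check for the preceding
blank line). One splitter looks only at From_ lines, the other trusts
Content-Length: and resumes scanning only after skipping that many bytes.

```python
def declared_length(headers: bytes):
    """Return the Content-Length value from a header block, or None."""
    for line in headers.splitlines():
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            try:
                return int(value.strip().decode("ascii"))
            except ValueError:
                return None
    return None


def split_from_only(data: bytes) -> list:
    """Split on every line starting with 'From ' (Content-Length ignored)."""
    messages, current = [], []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def split_honoring_cl(data: bytes) -> list:
    """Skip Content-Length bytes of body before looking for the next From_."""
    messages, pos = [], 0
    while pos < len(data):
        hdr_end = data.find(b"\n\n", pos)      # blank line ends the header
        if hdr_end == -1:
            messages.append(data[pos:])
            break
        body_start = hdr_end + 2
        cl = declared_length(data[pos:body_start]) or 0
        resume = min(body_start + max(cl, 0), len(data))
        nxt = data.find(b"\nFrom ", resume - 1)  # resume scanning here
        if nxt == -1:
            messages.append(data[pos:])
            break
        messages.append(data[pos:nxt + 1])
        pos = nxt + 1
    return messages


def make_msg(n: int, cl=None) -> bytes:
    """Build one hypothetical mbox message, optionally with a CL: field."""
    hdr = f"From user@example.com Mon Jan  1 00:00:0{n} 2001\nSubject: message {n}\n"
    if cl is not None:
        hdr += f"Content-Length: {cl}\n"
    return (hdr + "\n" + f"body {n}\n" + "\n").encode("ascii")


# Four short messages; the first carries a bogus Content-Length of 11267.
mbox = make_msg(1, cl=11267) + make_msg(2) + make_msg(3) + make_msg(4)

print(len(split_from_only(mbox)))    # 4
print(len(split_honoring_cl(mbox)))  # 1
```

With the bogus field removed from the first message, both functions agree
on four messages, which is the effect pine's rewrite has.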

So, you need to either a) tell mutt to always ignore CL: fields and only
split on From_ lines, or b) filter out CL: fields in your .procmailrc.
I don't know how to do (a), but (b) is as simple as:

    # Filter the header through formail, deleting any Content-Length: field
    :0 fhw
    * ^Content-Length:
    | formail -I Content-Length:

You may wonder at this point why procmail considered the file as a single
message when you executed "procmail < file". Well, procmail _always_
considers its input to be a single message. If you want to split an
mbox-format mailbox into multiple messages you need to use formail's -s
flag to invoke procmail once for each message. Note that like procmail,
formail will by default pay attention to CL: fields, so you should include
the -Y flag on formail's command line when splitting mailboxes that may
contain bogus CL: fields:

    formail -Y -s procmail <file

Note that a quick way to strip the CL: field from every message in
a mailbox is to give formail the "delete this header" argument while
splitting:

    formail -Y -I Content-Length: -s <file >file.new

(If you don't specify a program to invoke on each message when splitting,
formail will just send the message to its stdout.)
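
If formail isn't at hand, roughly the same cleanup can be sketched with
Python's standard mailbox module, which, as far as I know, finds message
boundaries from From_ lines alone and does not consult Content-Length:.
The function name and file names below are hypothetical; treat this as a
sketch rather than a drop-in replacement for the formail command.

```python
import mailbox


def strip_content_length(src_path: str, dst_path: str) -> int:
    """Copy an mbox to dst_path without Content-Length: fields.

    Returns the number of messages copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)          # created if it does not exist
    count = 0
    try:
        for msg in src:                   # iterating yields each message
            del msg["Content-Length"]     # no error if the field is absent
            dst.add(msg)
            count += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return count


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python strip_cl.py file file.new
    if len(sys.argv) == 3:
        print(strip_content_length(sys.argv[1], sys.argv[2]))
```

As with the formail command, the result goes to a new file so the original
mailbox is left untouched until you have checked the output.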
Does that all make sense?
Philip Guenther