On 14/05/25, Kurt Hackenberg (k...@panix.com) wrote:
> On Wed, May 14, 2025 at 02:01:11PM +0100, Rory Campbell-Lange wrote:
> 
> > I believe that mutt uses mboxcl2 format for writing new mailboxes. I'd
> > be grateful to know if that is correct.
> 
> I think that's right, with a small addition: since version 1.9.5 (April 2018),
> Mutt has also written Lines: along with Content-Length:.
> 
> I haven't read Mutt's code that writes mbox; I've only looked at its output,
> with a little testing.
> 
> For completeness: mboxcl2 means that Mutt adds the header Content-Length:
> and does *not* do ">From " escaping of message lines. And these days Mutt
> also throws in the header Lines:.
> 
> (The value of Content-Length: is the number of bytes in the body of the
> message; the value of Lines: is the number of lines in the body of the
> message. Mutt gets both those lengths right, by what I think they should be.
> (Not all software does.))
> 
> Both headers are non-standard -- they're not RFC 822, they were invented to
> work around mbox's deficiencies. Effectively those headers are part of mbox
> file format.

This is all really helpful information, which I didn't know. Thank you very
much for providing it.
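
A minimal Go sketch of how those two header values could be computed for a
message body, following the description above (Content-Length: is the number
of bytes in the body, Lines: the number of lines). The function name
bodyLengths is hypothetical, and counting a final unterminated line as a line
is an assumption; I have not checked either against Mutt's own code.

```go
package main

import (
	"bytes"
	"fmt"
)

// bodyLengths returns the values one would expect to see in Content-Length:
// (bytes in the body) and Lines: (lines in the body).
// Assumption: a last line with no trailing newline still counts as a line.
func bodyLengths(body []byte) (contentLength, lines int) {
	contentLength = len(body)
	lines = bytes.Count(body, []byte("\n"))
	if len(body) > 0 && !bytes.HasSuffix(body, []byte("\n")) {
		lines++
	}
	return contentLength, lines
}

func main() {
	// In mboxcl2 the "From " line below is stored unescaped.
	body := []byte("Hello\nFrom here on, nothing is escaped.\n")
	cl, n := bodyLengths(body)
	fmt.Printf("Content-Length: %d\nLines: %d\n", cl, n)
}
```

Whether the blank line separating one message from the next counts towards
the body is worth checking against real mailboxes before relying on this.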

> You probably know that Mutt can write new mailboxes in any of the four file
> formats that it knows. You can use Mutt to convert among those formats.

I didn't know that. I see on the man page mutt can save as mbox, MMDF, MH or
Maildir using the -m flag. I couldn't see any docs about saving in the
different mbox formats, which I understand from
https://docs.aspose.com/email/net/email-storage-formats/ to be as follows:

    * MBOXO:
        The original format where "From " lines in the email body are quoted
        with a > character.

    * MBOXRD:
        A variant of MBOXO that further extends the quoting method of "From "
        lines.

    * MBOXCL:
        Introduced by the "Classic" MBOX variant where each "From " line is
        quoted with an ffrom string.

    * MBOXCL2:
        A variation of MBOXCL where "From " lines are doubled to distinguish
        them.
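
A rough Go sketch of the quoting rules as they are commonly described for the
first two variants: mboxo prefixes ">" to body lines starting with "From ",
while mboxrd also prefixes ">" to lines that are already quoted (">From ",
">>From ", ...), which makes the quoting reversible. This is my understanding
rather than anything taken from Mutt's source, and the function names are
hypothetical. For MBOXCL2 I would go by the description earlier in this
thread (Content-Length: and no ">From " quoting) rather than the list above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rdQuoted matches lines that mboxrd quotes: zero or more ">" then "From ".
var rdQuoted = regexp.MustCompile(`^>*From `)

// quoteMboxo prefixes ">" only to body lines that start with "From ".
func quoteMboxo(line string) string {
	if strings.HasPrefix(line, "From ") {
		return ">" + line
	}
	return line
}

// quoteMboxrd prefixes ">" to "From " lines and to already-quoted ones.
func quoteMboxrd(line string) string {
	if rdQuoted.MatchString(line) {
		return ">" + line
	}
	return line
}

// unquoteMboxrd removes exactly one ">" from lines of the form ">...From ".
func unquoteMboxrd(line string) string {
	if strings.HasPrefix(line, ">") && rdQuoted.MatchString(line[1:]) {
		return line[1:]
	}
	return line
}

func main() {
	for _, l := range []string{"From me to you", ">From a reply", "Not a postmark"} {
		fmt.Printf("%q -> mboxo %q, mboxrd %q\n", l, quoteMboxo(l), quoteMboxrd(l))
	}
}
```

With mboxo a body line that was originally ">From ..." cannot be told apart
from a quoted "From ..." line when reading the mailbox back, which is the
ambiguity mboxrd removes.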

> > It would be helpful also to know how long that has been the case since
> > I've got some 20 year old mutt mboxes I'm keen to process with a golang
> > program.
> 
> I don't know, haven't used Mutt for that long.

From the source in git it seems that Thomas Roessler committed the first
man pages for mutt in 1998 and the mbox file format in 2000. There is no
mention of the MBOXCL format in the mbox man page until Urs Janßen's commit of
April 2004, just over 20 years ago. I assume I've been using that format by
default since a bit after that (given Debian's slowish release cycle).

I also realise that while I may need to be aware of the mbox format when
parsing emails (or massaging the output of golang's net/mail package), I
should simply be looking to separate emails in mboxes by what the mbox man
page in docs calls "the postmark line". Curiously, none of the mbox parsers
I've been using take that approach.

I'm going to start with the postmark line (I can't really read C but is_from in
from.c makes interesting reading.) I'll then work to decode the different mbox
types, starting with MBOXCL/MBOXCL2, as needed. I'm not going to attempt to
take on the MMDF and MH formats, but I've already got a basic Maildir format parser
going, which is thankfully pretty easy.
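
As a starting point, the postmark approach might look something like the Go
sketch below. It is only a sketch under assumptions: the names splitMbox,
contentLength and atBoundary are hypothetical, a postmark is recognised
merely by a leading "From " (is_from in from.c is far stricter), the mailbox
is assumed to start with a postmark line, and folded headers are ignored.
Because mboxcl2 does not quote "From " in bodies, the sketch trusts
Content-Length: when it lands exactly on a message boundary and otherwise
falls back to looking for a blank line followed by "From ".

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"strings"
)

// contentLength returns the Content-Length: value found in a raw header
// block, or -1 if the header is absent or malformed.
func contentLength(header []byte) int {
	for _, line := range strings.Split(string(header), "\n") {
		name, value, ok := strings.Cut(line, ":")
		if ok && strings.EqualFold(name, "Content-Length") {
			if n, err := strconv.Atoi(strings.TrimSpace(value)); err == nil && n >= 0 {
				return n
			}
		}
	}
	return -1
}

// atBoundary reports whether rest is end of file, or blank lines followed
// by something that looks like a postmark line.
func atBoundary(rest []byte) bool {
	rest = bytes.TrimLeft(rest, "\n")
	return len(rest) == 0 || bytes.HasPrefix(rest, []byte("From "))
}

// splitMbox returns each message in data, postmark line included.
func splitMbox(data []byte) [][]byte {
	var msgs [][]byte
	data = bytes.TrimLeft(data, "\n")
	for len(data) > 0 {
		end := -1
		// Prefer Content-Length when it ends exactly at a boundary.
		if h := bytes.Index(data, []byte("\n\n")); h >= 0 {
			if n := contentLength(data[:h]); n >= 0 {
				if c := h + 2 + n; c <= len(data) && atBoundary(data[c:]) {
					end = c
				}
			}
		}
		// Otherwise split at the next blank line followed by "From ".
		if end < 0 {
			if i := bytes.Index(data, []byte("\n\nFrom ")); i >= 0 {
				end = i + 1 // keep the newline ending the last body line
			} else {
				end = len(data)
			}
		}
		msgs = append(msgs, data[:end])
		data = bytes.TrimLeft(data[end:], "\n")
	}
	return msgs
}

func main() {
	mbox := []byte("From a@example.com Wed May 14 14:01:11 2025\n" +
		"Content-Length: 20\n\nHi\n\nFrom the start.\n\n" +
		"From b@example.com Wed May 14 15:00:00 2025\n\nSecond\n")
	fmt.Println("messages found:", len(splitMbox(mbox)))
}
```

Each returned slice still begins with its postmark line, so that line would
need stripping before handing the rest to net/mail's ReadMessage.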

Thanks again for the very useful pointers,
Rory
