On Fri, May 16, 2025 at 05:26:25PM -0400, Kurt Hackenberg wrote:
> Nope, sorry. RFC 4155 has a problem. Its default format, the only one it
> defines, defines the From_ line rigidly, forbids ">From " escaping, and does
> not use a length. It says messages should be found by recognizing the whole
> From_ line, with exact syntax.
> 
> That fails when the message body includes such a From_ line, as it might
> when people use email to discuss mbox format, as here. Like this:
> 
> From nobody@nowhere.invalid Thu Jan  1 00:00:00 1970
> 
> An RFC 4155 reader would take that line above as the beginning of a new
> message, and would fail to read the rest of this message.
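
(To make that failure concrete, here is a minimal Python sketch of a reader
that finds messages purely by recognizing the From_ line, with no ">From "
escaping and no length.  The regex is only my approximation of the RFC 4155
line syntax, and split_mbox(), SAMPLE and the kurt@example.invalid address
are hypothetical names made up for illustration, not taken from any real
tool.)

```python
import re

# Approximation of the RFC 4155 separator: "From ", an address, a space,
# and an asctime-style timestamp.  This regex is an illustrative assumption,
# not a faithful transcription of the grammar in the RFC.
FROM_LINE = re.compile(
    r"^From \S+@\S+ "
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d:\d\d \d{4}$"
)


def split_mbox(text):
    """Start a new message at every line that looks like a From_ line."""
    messages = []
    current = None
    for line in text.splitlines():
        if FROM_LINE.match(line):
            # No escaping and no length to consult: any match is a boundary.
            current = []
            messages.append(current)
        elif current is not None:
            current.append(line)
    return ["\n".join(m) for m in messages]


# One real message (hypothetical sender) whose body quotes a From_ line.
SAMPLE = "\n".join([
    "From kurt@example.invalid Fri May 16 21:26:25 2025",
    "Subject: mbox",
    "",
    "Like this:",
    "",
    "From nobody@nowhere.invalid Thu Jan  1 00:00:00 1970",
    "",
    "rest of the message",
    "",
])

print(len(split_mbox(SAMPLE)))  # prints 2, although only one message was stored
```

The second "message" begins at the quoted line, so the rest of the real
message's body ends up detached from its headers.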

I agree that this is a problem.  But I don't agree with the (elided for
brevity) suggestion that we should collectively ignore the RFC, because I
don't think that serves the future well.  There are, and will be, projects
that will rely on the documentation (which at this point, for better or worse,
is RFC 4155, an archived web page mentioned elsewhere in this thread [1], and
some scattered notes) long after mutt is gone.  (Why?  Because there
are enormous repositories of email in mbox format, along with equally
enormous repositories of Usenet news articles that have been saved
in mbox format.  I'm directly aware of one project tackling a corpus
of ~400M messages, and indirectly aware of others doing similar things.
And more repositories are being created all the time: e.g. this mailing
list is run with Mailman, whose primary message store is in mbox format.)

I think a better approach is to figure out what needs to be changed/
fixed/added to RFC 4155 so that it covers the variants that have arisen,
and to create a superseding RFC that updates it.  This won't fix all
the problems that have arisen as a result of choices (or mistakes)
made along the way, but it should at least document those problems
so that folks have a fighting chance of dealing with them.  I think
it'd be good to have a single definitive-as-possible reference stored
somewhere that's likely to be around for a while, and well, an IETF standard
is about as "permanent" as we're likely to get.

And yes, I realize that I'm about to be volunteered for this.  This
is not my first day on the job. ;)

And then someone(s) will need to look at formail(1), grepmail(1), and
other mail tools to try to figure out what works/breaks with what.
I'm already staring at grepmail for other reasons, so I'll make a note
to circle back to this issue.

---rsk

[1] It's this page:

        "mbox" is a family of several mutually incompatible mailbox formats.
        
https://web.archive.org/web/20160423115957/http://homepage.ntlworld.com./jonathan.deboynepollard/FGA/mail-mbox-formats.html
