Victor Duchovni:
> > > > How would one decide that a (message-id) header is not mangled?
> > > > This would require parsing the string, counting the "address"
> > > > tokens, and if there is only one "address" token, use that as the
> > > > logged message ID, otherwise log the entire original string.
> > >
> > > Real-life examples include:
> > >
> > > Message-Id: News_03/11/2008 16:11:15_PR Newswire Brasil<[EMAIL PROTECTED]>
> > > Message-ID: <42M0XSEC17ENNJN27.1103.753798 @lowbehold.com>
> > > Message-ID: <2008-11-07 10:43:57 TheSystem@>
> > > Message-ID: <[EMAIL PROTECTED] & Cloppenburg Website>
> > > Message-Id: <[EMAIL PROTECTED] >
> > >
> > > So the "address" token parser would have to be fairly "liberal".
> >
> > I'm not sure if cosmetic concerns about Message-ID logging alone
> > would justify the implementation of another RFC822 parser.
>
> The concerns are not entirely cosmetic, as some folks are contemplating
> pulling logs into structured databases, and indexing on message-id,
> queue-id, and so on. Do we want the log parsers to parse the raw header
> value, or should we try to "help" by trimming comments, leaving just
> the "real" message-id?

Even with comments removed, your logfile processor would still need
to use heuristics for dealing with malformed Message-ID strings.
Compared to such heuristics, stripping off the (text) seems trivial.

I definitely don't want yet another RFC822 parser just for the
purpose of Message-ID logging. So, it would have to be done with
the existing RFC822 scanner/unparser, which does not preserve
whitespace that isn't supposed to be there.

	Wietse
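
To illustrate the log-processor side of the point above, here is a rough
sketch of what such a heuristic might look like. It is not Postfix code and
does not use the Postfix RFC822 scanner; the function names
(strip_comments, loggable_message_id) are made up for this example. It
assumes the rule proposed earlier in the thread: strip (comments), and if
exactly one <...> token remains, use that, otherwise keep the whole
original string.

```python
import re

# Hypothetical helper names; a sketch of a log-side heuristic only.
_ANGLE = re.compile(r"<[^<>]*>")


def strip_comments(value: str) -> str:
    """Drop RFC 822 style (comments), honoring nesting, quoted strings
    and backslash escapes. Unbalanced ')' outside a comment is kept."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string" (outside comments)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Quoted-pair: keep both characters unless inside a comment.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif depth > 0 and ch == ")":
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out).strip()


def loggable_message_id(raw: str) -> str:
    """Return the single <...> token if there is exactly one after
    comment stripping; otherwise return the original string."""
    tokens = _ANGLE.findall(strip_comments(raw))
    if len(tokens) == 1:
        return tokens[0]
    return raw.strip()
```

Note that this deliberately leaves malformed content inside the angle
brackets alone (embedded spaces, a missing domain, and so on), so a value
like "<2008-11-07 10:43:57 TheSystem@>" comes back unchanged. Deciding what
to do with those is exactly the heuristic part that comment stripping does
not solve.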