Victor Duchovni:
> > > > How would one decide that a (message-id) header is not mangled?
> > > > This would require parsing the string, counting the "address"
> > > > tokens, and if there is only one "address" token, use that as the
> > > > logged message ID, otherwise log the entire original string.
> > > 
> > > Real-life examples include:
> > > 
> > > Message-Id: News_03/11/2008 16:11:15_PR Newswire Brasil<[EMAIL PROTECTED]>
> > > Message-ID: <42M0XSEC17ENNJN27.1103.753798 @lowbehold.com>
> > > Message-ID: <2008-11-07 10:43:57 TheSystem@>
> > > Message-ID: <[EMAIL PROTECTED] & Cloppenburg Website>
> > > Message-Id: <[EMAIL PROTECTED] >
> > > 
> > > So the "address" token parser would have to be fairly "liberal".
> > 
> > I'm not sure if cosmetic concerns about Message-ID logging alone
> > would justify the implementation of another RFC822 parser.
> 
> The concerns are not entirely cosmetic, as some folks are contemplating
> pulling logs into structured databases, and indexing on message-id,
> queue-id, and so on. Do we want the log parsers to parse the raw header
> value, or should we try to "help" by trimming comments, leaving just
> the "real" message-id?

Even with comments removed, your logfile processor would still need
to use heuristics for dealing with malformed Message-ID strings.
Compared to such heuristics, stripping off the (text) seems trivial.

I definitely don't want yet another RFC822 parser just for the
purpose of Message-ID logging.  So, it would have to be done with
the existing RFC822 scanner/unparser, which does not preserve
whitespace that isn't supposed to be there.

        Wietse
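
For readers following the thread, here is a rough sketch of the
heuristic discussed above: strip the (text) comments, count the
<...> tokens, and log the single token if there is exactly one,
otherwise log the original string. It is written in Python purely
for illustration and is not Postfix code. The names strip_comments,
ANGLE_TOKEN and loggable_message_id are hypothetical, and the
"liberal" token pattern (anything between one pair of angle brackets,
whitespace included) is an assumption, not something specified in
this thread.

```python
import re

# Assumption: a "liberal" address token is anything between one pair of
# angle brackets, including embedded whitespace or a missing domain.
ANGLE_TOKEN = re.compile(r"<[^<>]*>")


def strip_comments(value: str) -> str:
    """Drop RFC 822 style (comments), honoring nesting, quotes and backslashes."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string" (outside any comment)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # A backslash escapes the next character; keep the pair
            # only when we are not inside a comment.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def loggable_message_id(raw: str) -> str:
    """Return the lone <...> token if there is exactly one, else the raw value."""
    tokens = ANGLE_TOKEN.findall(strip_comments(raw))
    if len(tokens) == 1:
        return tokens[0]
    # Zero or several candidate tokens: fall back to the original string.
    return raw
```

Note that this keeps whitespace inside the angle brackets as-is, so
a value such as "<id @example.com>" is logged unchanged; a log
processor indexing on message-id would still need its own rules for
such malformed values.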
