Victor Duchovni:
> > > > How would one decide that a (message-id) header is not mangled?
> > > > This would require parsing the string, counting the "address"
> > > > tokens, and if there is only one "address" token, use that as the
> > > > logged message ID, otherwise log the entire original string.
> > >
> > > Real-life examples include:
> > >
> > > Message-Id: News_03/11/2008 16:11:15_PR Newswire Brasil<[EMAIL PROTECTED]>
> > > Message-ID: <42M0XSEC17ENNJN27.1103.753798 @lowbehold.com>
> > > Message-ID: <2008-11-07 10:43:57 TheSystem@>
> > > Message-ID: <[EMAIL PROTECTED] & Cloppenburg Website>
> > > Message-Id: <[EMAIL PROTECTED] >
> > >
> > > So the "address" token parser would have to be fairly "liberal".
> >
> > I'm not sure if cosmetic concerns about Message-ID logging alone
> > would justify the implementation of another RFC822 parser.
>
> The concerns are not entirely cosmetic, as some folks are contemplating
> pulling logs into structured databases, and indexing on message-id,
> queue-id, and so on. Do we want the log parsers to parse the raw header
> value, or should we try to "help" by trimming comments, leaving just
> the "real" message-id?

Even with comments removed, your logfile processor would still need
to use heuristics for dealing with malformed Message-ID strings.
Compared to such heuristics, stripping off the (text) seems trivial.

I definitely don't want yet another RFC822 parser just for the
purpose of Message-ID logging. So, it would have to be done with
the existing RFC822 scanner/unparser, which does not preserve
whitespace that isn't supposed to be there.

	Wietse
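
For illustration only, the log-processor side of the heuristic discussed in this thread (strip the (text) comments, count the "address" tokens, use the token if there is exactly one, otherwise keep the whole string) might be sketched in Python as below. This is not Postfix code and does not use the Postfix RFC822 scanner. The function names and the deliberately "liberal" definition of an address token (anything between one "<" and the next ">") are assumptions made for this sketch. It also ignores quoted strings, so parentheses inside quotes or inside the angle brackets would be treated as comments.

```python
import re

# Hypothetical helpers for a log processor; names are invented for this sketch.
_COMMENT = re.compile(r"\([^()]*\)")   # one innermost (text) comment
_ANGLE_TOKEN = re.compile(r"<[^<>]*>")  # liberal "address" token: <anything>


def strip_comments(value: str) -> str:
    """Replace parenthesized comments with a space, innermost first."""
    previous = None
    while previous != value:
        previous = value
        value = _COMMENT.sub(" ", value)
    return value.strip()


def loggable_message_id(raw: str) -> str:
    """Return the single <...> token if exactly one exists, else the raw value."""
    tokens = _ANGLE_TOKEN.findall(strip_comments(raw))
    if len(tokens) == 1:
        return tokens[0]
    return raw.strip()
```

With this sketch, a value such as "<abc@example.com> (added by relay)" would be reduced to "<abc@example.com>", while a value with zero or several bracketed tokens would be kept unchanged. Whitespace inside the brackets is kept as found, which is exactly what a scanner/unparser that normalizes whitespace would not do.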