On 05/25/2010 12:14 AM, Adam Katz wrote:
My original rule:
header SINGLE_HEADER_2K ALL:raw =~ /^(?=.{2048,3071}$)/m
Karsten Bräckelmann noted:
It does not match a single header, let alone a *specific*
header as the one mentioned, but ALL headers. It effectively
checks the entire headers' size.
Karsten then corrected himself:
Err, nope -- the size between the beginning and end of a line.
Yup, my test was a single-line header. Fixed.
header SINGLE_HEADER_2K ALL:raw =~
/(?-xim:(?=(?:^|\n)[^\s\n]+:(?:.(?!\n\S)){2048,3071}.(?:\n\S|$)))/s
Perhaps a regexp efficiency expert should clean it up ... the large
match in the middle using "(?:.(?!\n\S)){2048,3071}" to keep within a
single header might not be so hot on the PCRE parser; that's a LOT of
looking ahead. Maybe "(?!.{0,2048}\n\S).{2048}" and then use meta
rules to exclude larger hits?
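To make the line-versus-header distinction concrete, here is a minimal Python sketch. It is not a SpamAssassin rule: it swaps the lookahead-heavy regex for a plain "unfold, then measure" pass, which is only useful as a cross-check of what the rule is meant to catch. The helper names (split_headers, long_headers) and the sample headers are made up for illustration, and Python's re module stands in for Perl's regex engine. The length counted here is everything after the colon, folds included, which may differ by a character or two from what the regex above counts.

```python
import re

# The original rule's pattern: one physical line of 2048-3071 characters.
LINE_RULE = re.compile(r"^(?=.{2048,3071}$)", re.M)

def split_headers(raw):
    """Group physical lines into logical headers (folded lines start with whitespace)."""
    headers = []
    for line in raw.splitlines():
        if headers and line[:1] in (" ", "\t"):
            headers[-1] += "\n" + line  # continuation of the previous header
        else:
            headers.append(line)
    return headers

def long_headers(raw, low=2048, high=3071):
    """Names of headers whose value (folds included) is low..high characters long."""
    hits = []
    for header in split_headers(raw):
        name, _, value = header.partition(":")
        if low <= len(value) <= high:
            hits.append(name)
    return hits

# A header of roughly 2.5K characters, folded into 35 short lines.
folded = "X-Test: " + "\n ".join(["a" * 70] * 35)
raw = "Subject: hi\n" + folded + "\nFrom: a@example.com\n"

print(LINE_RULE.search(raw))  # None: no single line is 2K long
print(long_headers(raw))      # ['X-Test']
```

The line-anchored pattern misses the folded header entirely, which is the gap the rewritten rule is trying to close.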
Being the one credited with suggesting it, I would rather just look
at the X-YMail-OSG header. I can EASILY get my MTA to block (at the
gateway) any email with a random header > xxxxx in size.
If X-YMail-OSG is > 1024 bytes, it's just about guaranteed to be
spam.
Yes, I just wanted to see what examining /any/ header for that kind of
thing would look like. I've added tests specific to that so we don't
get bogged down waiting for results.
header MS_XYMOSG_1K X-YMail-OSG =~ /^(?=.{1024,2047}$)/s
header MS_XYMOSG_2K X-YMail-OSG =~ /^(?=.{2048,3071}$)/s
header MS_XYMOSG_3K X-YMail-OSG =~ /^(?=.{3072,4095}$)/s
header MS_XYMOSG_4K X-YMail-OSG =~ /^(?=.{4096,5119}$)/s
header MS_XYMOSG_5K X-YMail-OSG =~ /^(?=.{5120})/s
(I fully expect these to all fold into one or two rules, but it's nice
to see where things sit beforehand.)
Committed revision 947854.
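As a sanity check on the size buckets, the following Python sketch runs the same five patterns against synthetic values and reports which rule names hit. It assumes the open-ended 5K rule starts at 5120 characters, so that the buckets do not overlap. It uses Python's re module rather than Perl, and does not model how SpamAssassin prepares the header value before matching. The hits() helper is hypothetical.

```python
import re

# Patterns copied from the rules above; the 5K threshold of 5120 is an assumption.
RULES = {
    "MS_XYMOSG_1K": r"^(?=.{1024,2047}$)",
    "MS_XYMOSG_2K": r"^(?=.{2048,3071}$)",
    "MS_XYMOSG_3K": r"^(?=.{3072,4095}$)",
    "MS_XYMOSG_4K": r"^(?=.{4096,5119}$)",
    "MS_XYMOSG_5K": r"^(?=.{5120})",
}
# re.S plays the role of the /s modifier: "." also matches newlines.
COMPILED = {name: re.compile(pattern, re.S) for name, pattern in RULES.items()}

def hits(value):
    """Return the names of all bucket rules that match the given header value."""
    return [name for name, rx in COMPILED.items() if rx.search(value)]

for size in (1000, 1024, 4500, 5119, 5120, 6000):
    print(size, hits("x" * size))
```

With disjoint ranges, each value should land in at most one bucket, which makes the per-bucket results easy to compare before the rules get folded together.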
I've just noticed false positives on the SINGLE_HEADER_2K rule that hits
against *any* single header containing 2K-3K characters.
In my case it appears to hit against the "To:" header when a user is
sending a mail to a "distribution" list of many users (around 90
addresses in this case). I'd rather not post up the example message to
pastebin as it obviously contains 90 valid email addresses :-)
The score also seems to have jumped up to 4.399 in the latest rule
update - large for something that can FP IMHO.
Can we re-evaluate how useful this is, or maybe exclude To: and CC: headers?
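For what the To:/Cc: exclusion might look like, here is a rough Python sketch. It is not the committed rule and not tested against SpamAssassin; it only illustrates putting a negative lookahead on the header name in front of a "one header, 2K-3K long" pattern. The long_header_rule() helper, the X-Junk header and the example.com addresses are all invented for the illustration, and a fold (newline plus space or tab) is counted as a single unit, which is a simplification.

```python
import re

def long_header_rule(exclude=()):
    """Build a pattern for one header of 2048-3071 units, skipping excluded names."""
    # One negative lookahead per excluded header name, e.g. (?!To:)(?!Cc:)
    skip = "".join(rf"(?!{re.escape(name)}:)" for name in exclude)
    return re.compile(
        rf"^{skip}([^\s:]+):"            # header name at the start of a line
        r"(?:[^\n]|\n[ \t]){2048,3071}"  # value chars; a fold counts as one unit
        r"(?=\n(?![ \t])|\Z)",           # the value must end where the header ends
        re.M | re.I,
    )

# A folded To: header with 90 made-up recipients, plus one long junk header.
recipients = ",\n ".join(f"user{i:02d}@lists.example.com" for i in range(90))
raw = (
    "From: a@example.com\n"
    "To: " + recipients + "\n"
    "X-Junk: " + "z" * 2500 + "\n"
    "Subject: hi\n"
)

print(long_header_rule().findall(raw))              # ['To', 'X-Junk']
print(long_header_rule(("To", "Cc")).findall(raw))  # ['X-Junk']
```

The unrestricted pattern flags the distribution-list To: header, while the variant with the exclusion only flags the junk header.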