On Thu, Mar 21, 2002 at 10:58:35AM -0500, Greg Ward wrote:
> I think "message id with no dot after the @" is worth detecting, but
> with a low positive score -- that sort of thing occurs depressingly
> often in real email too.

What I'm arguing is: don't replace the regexp of INVALID_MSGID with one
that isn't actually checking for invalid message-ids.

Adding a new test that looks for ".+@.+\..+" would be fine by me, since it's
a new test that looks for something different.
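To make the difference between the two checks concrete, here is a minimal sketch in Python. It is not SpamAssassin code: the function name `classify_msgid` and the three labels are made up for illustration, and it assumes the input is the text between the angle brackets of the Message-Id header.

```python
import re

# The two patterns quoted in this thread, applied to the whole id.
HAS_AT_FORM = re.compile(r".+@.+")          # something@something
HAS_DOTTED_RHS = re.compile(r".+@.+\..+")   # something@something.something


def classify_msgid(msgid: str) -> str:
    """Return a hypothetical label for the text inside <...>."""
    if not HAS_AT_FORM.fullmatch(msgid):
        return "invalid"   # fails ".+@.+": this is what an invalid-id test should catch
    if not HAS_DOTTED_RHS.fullmatch(msgid):
        return "no-dot"    # well-formed, but no dot after the @
    return "dotted"


for sample in ("abc123@mail.example.com", "abc123@localhost", "Thu, 21 Mar 2002"):
    print(sample, "->", classify_msgid(sample))
```

The point of keeping two labels is that "no-dot" and "invalid" can then carry different scores instead of being folded into one test.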


Just for some stats BTW:

I ran a quick script against my current monthly spam archive:

$ perl -nle 'BEGIN{$/="\n\nFrom "} next unless /^Message-Id/mi;
/Message-Id:.*?<(.+)>/i;$_=$1; s/.+?@//; print; if (/\./){$with++}elsif
(/\S/){$without++}else{$blank++}; $total++; END{print
"$with/$without/$blank/$total/",($without/$total),"%/",($blank/$total),"%\n"}'
spammers


This month so far comes back with:

743/104/1/848/0.122641509433962%/0.00117924528301887%

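(with a dot/without a dot/blank RHS/total/fraction without/fraction blank;
the last two are fractions of the total despite the "%" sign, i.e. about
12.3% and 0.12%)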
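For anyone who finds the one-liner hard to read, here is a rough Python sketch of the same tally. It is an approximation, not a port: the function name `tally_msgid_rhs` and the counter keys are my own, and a Message-Id line with no <...> part is counted as blank here, which is an assumption about the intended behaviour.

```python
import re
from collections import Counter


def tally_msgid_rhs(mbox_text: str) -> Counter:
    """Count Message-Id right-hand sides: with a dot, without a dot, or blank."""
    counts = Counter()
    # Same record separator as the one-liner: a blank line followed by "From ".
    for record in mbox_text.split("\n\nFrom "):
        # First Message-Id line anywhere in the record, case-insensitive.
        header = re.search(r"^Message-Id:(.*)$", record, re.I | re.M)
        if header is None:
            continue  # no Message-Id line at all: not counted
        inner = re.search(r"<(.+)>", header.group(1))
        msgid = inner.group(1) if inner else ""
        # Drop everything up to and including the first "@" to get the RHS.
        rhs = re.sub(r".+?@", "", msgid, count=1)
        if "." in rhs:
            counts["with_dot"] += 1
        elif re.search(r"\S", rhs):
            counts["without_dot"] += 1
        else:
            counts["blank"] += 1
        counts["total"] += 1
    return counts
```

Calling it on the contents of an mbox file, e.g. `tally_msgid_rhs(open("spammers", errors="replace").read())`, returns the four counts; the percentages can then be computed from them.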


Now, my spam archive has almost no blanks, since I filter things that
aren't ".+@.+" at the SMTP level.  (I'm surprised I have any -- I think
it's the one with random 8-bit chars in it.)  So I did a quick grep on
the mail logs for the past month:

$ grep CheckMessageId maillog* | wc -l
135

Assuming these all count as "blanks" (some are completely invalid
message-ids that look like dates, etc.), that makes the new stats:

743/104/136/983/0.105798575788403%/0.138351983723296%


So of the 983 known and very likely spam messages, about 10.6% don't have a
dot on the right-hand side, and 13.8% fail ".+@.+" outright.
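The combined figures can be rechecked with a few lines of Python. The counts are the ones given in this message; the variable names are just for illustration.

```python
with_dot = 743
without_dot = 104
blank = 1 + 135  # 1 from the archive plus 135 CheckMessageId lines in the mail logs
total = with_dot + without_dot + blank

# Multiply by 100 so the printed values really are percentages.
pct_without = 100 * without_dot / total
pct_blank = 100 * blank / total

print(f"{with_dot}/{without_dot}/{blank}/{total}/"
      f"{pct_without:.1f}%/{pct_blank:.1f}%")
```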


-- 
Randomly Generated Tagline:
"You have to stay in shape.  My grandmother, she started walking 5 miles a
 day when she was 60.  She's 97 today and we don't know where the hell she 
 is."                     - Ellen DeGeneres
