On 23/07/14 19:54, Amir 'CG' Caspi wrote:

> Care to share? Counting encoded chars is easy, of course.

I use the following to count the encoded chars:

body     __LOC_COUNT_UNI /x[0-9A-F]{4};/
tflags   __LOC_COUNT_UNI multiple
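
For anyone who wants to try the pattern outside SpamAssassin, here is a rough Python sketch of what the count amounts to. It assumes the hex entities survive literally into the text that body rules see, e.g. in a text/plain part (SpamAssassin's HTML rendering presumably decodes them in text/html parts). `count_encoded_chars` is a hypothetical helper, not part of SA:

```python
import re

# Same pattern as the __LOC_COUNT_UNI body rule. Like the rule, it is
# case-sensitive, so lowercase hex digits (e.g. "x00e9;") are not matched.
LOC_COUNT_UNI = re.compile(r"x[0-9A-F]{4};")


def count_encoded_chars(body_text: str) -> int:
    """Count every match, roughly what 'tflags multiple' does for a body
    rule: each match adds a hit instead of stopping at the first one."""
    return len(LOC_COUNT_UNI.findall(body_text))


if __name__ == "__main__":
    sample = "Caf&#x00E9; deals &#x2019;today&#x2019; only"
    print(count_encoded_chars(sample))  # 3
```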

From that count we can build some helper rules if we want:

meta __LOC_HAS_0_UNI (__LOC_COUNT_UNI == 0)
meta __LOC_HAS_10_UNI (__LOC_COUNT_UNI >= 10)

I've noticed that they all arrive as VERP emails:

header          __LOC_VERP         X-Envelope-From =~ /\=.*\.(com|net|org|biz)\@/
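
A quick Python check of the same regex, handy for trying it against sample senders before deploying. The addresses are made up, and it assumes your MTA actually adds an X-Envelope-From header (not every setup does):

```python
import re

# Same pattern as __LOC_VERP. SpamAssassin header rules search the header
# value without implicit anchoring, which re.search mirrors here.
LOC_VERP = re.compile(r"\=.*\.(com|net|org|biz)\@")


def looks_like_verp(envelope_from: str) -> bool:
    return LOC_VERP.search(envelope_from) is not None


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(looks_like_verp("bounce-42-jane=example.com@mailer.example.net"))  # True
    print(looks_like_verp("jane@example.com"))  # False
```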

And a list of keywords that I've noticed:

header          __LOC_VERP_AMAZON       X-Envelope-From =~ /^amazon\-?_?coupons\-/i
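
The same kind of check for the keyword rule. Note the `^` anchor: this sketch assumes the header value starts with the bare address, so a value wrapped in angle brackets would not match. The senders below are hypothetical:

```python
import re

# Same pattern as __LOC_VERP_AMAZON: case-insensitive and anchored with ^.
# Assumption: the X-Envelope-From value starts with the bare address, so a
# value wrapped in "<...>" will not match.
LOC_VERP_AMAZON = re.compile(r"^amazon\-?_?coupons\-", re.IGNORECASE)


def is_amazon_verp(envelope_from: str) -> bool:
    return LOC_VERP_AMAZON.search(envelope_from) is not None


if __name__ == "__main__":
    # Hypothetical senders, for illustration only.
    for sender in [
        "amazon-coupons-jane=example.com@mailer.example.net",    # True
        "Amazon_Coupons-jane=example.com@mailer.example.net",    # True
        "amazoncoupons-jane=example.com@mailer.example.net",     # True
        "<amazon-coupons-jane=example.com@mailer.example.net>",  # False
    ]:
        print(sender, is_amazon_verp(sender))
```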

Then add them together in a scored meta rule:

meta LOC_UNI_SPAM (!BAYES_00) && (__LOC_VERP + __LOC_VERP_AMAZON + __LOC_HAS_10_UNI >= 3)
score LOC_UNI_SPAM 0.001
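
None of the three sub-rules uses `tflags multiple`, so each contributes at most 1 to the sum and `>= 3` only fires when all three hit; in effect it is an AND. A small Python model of the meta logic, with rule hits passed in as booleans and the encoded-char count as an int (`loc_uni_spam` is a hypothetical helper, not SpamAssassin code):

```python
def loc_uni_spam(bayes_00: bool, verp: bool, verp_amazon: bool,
                 uni_count: int) -> bool:
    """Model of LOC_UNI_SPAM; a hypothetical helper, not SpamAssassin code."""
    # __LOC_HAS_10_UNI: ten or more encoded chars counted by __LOC_COUNT_UNI.
    has_10_uni = uni_count >= 10
    # Each sub-rule contributes 0 or 1, so a sum of 3 needs all three hits.
    hits = int(verp) + int(verp_amazon) + int(has_10_uni)
    return (not bayes_00) and hits >= 3


if __name__ == "__main__":
    print(loc_uni_spam(False, True, True, 12))  # True
    print(loc_uni_spam(False, True, True, 9))   # False: too few encoded chars
    print(loc_uni_spam(True, True, True, 12))   # False: BAYES_00 hit
```

Writing it as a sum rather than a chain of `&&` makes it easy to relax later, e.g. by lowering the threshold to `>= 2`.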

So far this seems to catch only the bad stuff. You could of course add some more
magic:

meta LOC_UNI_SPAM_99 (BAYES_99 && LOC_UNI_SPAM)
score LOC_UNI_SPAM_99 .........

> ...checking whether the MIME-encoding is text/plain may not be sufficient

Though it's certainly possible, I haven't gone as far as checking the encoding
types etc., apart from the links to the patches I included...

> SA v3.3.x ...
Me too. The patch works fine with it; I'm waiting for the Debian build for the
production boxes, but running from source isn't too difficult either.

Though I'm aware they're not the best for generic spam, they seem okay on
these specific types (which I suspect come from the same source, judging by the
style of the emails). I've yet to test the rules in production.

I've also noticed the following traits, but I'm not sure how to detect them:

* All emails have a Message-ID containing an MD5 hash of the recipient's email
  address: <md5(recipient)>.<shorterhash>@domain.com
* All emails to the same recipient have the same MIME boundary, possibly a
  hash of the recipient address
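
One way to test these traits offline is to hash the recipient address and look for the digest in the Message-ID and the boundary. The Python sketch below assumes the sender hashes the lowercase bare address and only tries MD5 and SHA-1; both are guesses, as is which recipient (envelope or To:) is used, and the function names are hypothetical. Doing this inside SpamAssassin would presumably need a custom plugin, since a header regex can't compute a hash:

```python
from __future__ import annotations

import hashlib


def recipient_hashes(recipient: str) -> dict[str, str]:
    """Hex digests of the recipient address.

    Assumption: the sender hashes the bare address in lowercase; the real
    input (case, angle brackets, trailing newline) is unknown.
    """
    data = recipient.strip().strip("<>").lower().encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def message_id_matches(message_id: str, recipient: str) -> bool:
    """True if the Message-ID looks like <md5(recipient)>.<shorterhash>@domain."""
    local = message_id.strip().strip("<>").split("@", 1)[0]
    first = local.split(".", 1)[0].lower()
    return first == recipient_hashes(recipient)["md5"]


def boundary_contains_hash(boundary: str, recipient: str) -> str | None:
    """Name of the first tried digest found inside the MIME boundary, if any."""
    b = boundary.lower()
    for name, digest in recipient_hashes(recipient).items():
        if digest in b:
            return name
    return None


if __name__ == "__main__":
    rcpt = "jane@example.com"  # hypothetical recipient
    digest = recipient_hashes(rcpt)["md5"]
    mid = f"<{digest}.3f9a1c@mailer.example.net>"  # constructed to match
    print(message_id_matches(mid, rcpt))  # True
```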

Paul

--
Paul Stead, Zen Internet
Systems Engineer
