btw guys, note that hit-frequencies can also produce rule-overlap reports with
the "-o" switch.

--j.

On Tue, May 26, 2009 at 00:57, Mandy <messaging.director...@gmail.com> wrote:
> On Fri, May 22, 2009 at 9:06 PM, Henrik K <h...@hege.li> wrote:
>> On Fri, May 22, 2009 at 09:28:55PM +0200, Karsten Bräckelmann wrote:
>>> > The EmailBL test zone period has been extended to July 1st.
>
> [snip]
>
>> Thanks. And this is just a small-scale test. If we used more domains, feeds,
>> and submissions, it could be even nicer. ;-) Keep the reports coming in. It
>> would be nice to also know how much spam generally comes from freemail
>> providers, so FREEMAIL_FROM/BODY/REPLYTO figures would be welcome in reports
>> too. It might differ from user to user.
>
> I just spent some time putting together some stats.  I'm going to try
> to follow the excellent lead of Karsten, and provide some overlap
> figures based on the cool grep formula that Dan Mcdonald showed.  The
> short version is that it hits about 12% of spam scoring under 15.
>
> The time period is somewhat short: May 22 to May 25.  It's a little
> inaccurate too, since the May 22 figures include 12 hours of mail from
> before I enabled the test at noon, but...
>
> As I mentioned before, this is from a mid-sized install of Canadian
> government & education users (somewhere around 100 000 mailboxes).  SA
> only sees a filtered mail-stream in my setup -- to give an idea how
> filtered, 75% of the mail that SA sees is classified as ham.  The
> total volumes were 192 530 spam and 564 483 ham.
>
>
> 24.5% of the spam that's tagged scores between 5 and 10.
> 2.76% of that mail hit EMAILBL_TEST_LEM.
> 0.95% hit FREEMAIL_REPLYTO
>
> 22.9% of the spam that's tagged is between 10 and 15.
> 8.97% of that mail hit EMAILBL_TEST_LEM.
> 1.20% hit FREEMAIL_REPLYTO
>
> 52.5% of the spam that's tagged is above 15.
> 21.41% of that mail hit EMAILBL_TEST_LEM.
> 2.36% hit FREEMAIL_REPLYTO
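
Since the EMAILBL_TEST_LEM percentages above are per score bracket, an overall hit rate for tagged spam has to weight each bracket's rate by that bracket's share of the spam. A minimal awk sketch, assuming both numbers are fed in as percentages; `weighted_rate` is a made-up name, not a SpamAssassin tool:

```shell
#!/bin/sh
# weighted_rate: hypothetical helper. Reads "share rate" pairs on stdin,
# one score bracket per line, both in percent:
#   share = the bracket's slice of tagged spam
#   rate  = the rule's hit rate within that bracket
# Prints the rule's overall hit rate across the brackets given.
weighted_rate() {
    awk '
        # Accumulate share*rate (what each bracket contributes)
        # and the total share covered by the input brackets.
        { num += $1 * $2; den += $1 }
        # Dividing by the covered share lets a subset of brackets work too.
        END { if (den > 0) printf "%.2f\n", num / den }
    '
}
```

For example, `printf '%s\n' '24.5 2.76' '22.9 8.97' '52.5 21.41' | weighted_rate` prints 13.98. A plain average of the three rates would ignore how big each bracket is; feeding in only the first two lines gives the rate for tagged spam scoring under 15.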
>
> EMAILBL_TEST_LEM also hit 0.05% of the mail classified as ham (299
> messages).  I hand-verified the 35 of those that weren't obvious spam.
> About 9 of those were FPs (and those came down to 3 distinct messages
> from lists I sure wouldn't choose to be on).  I can provide them
> off-list if desired.
>
> I saw even fewer FREEMAIL_REPLYTO hits on mail classified as ham: 56,
> or 0.01%.  About 22 of those look legit (judging by subject line only;
> sorry, it's the end of the day).
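
The ham-side rates are just hit counts over the 564 483 ham total. A tiny hypothetical helper (`pct` is an illustrative name, not part of SpamAssassin) to reproduce them:

```shell
#!/bin/sh
# pct COUNT TOTAL: hypothetical helper that prints COUNT as a
# percentage of TOTAL, rounded to two decimals.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * n / d }'
}
```

`pct 299 564483` prints 0.05 (EMAILBL_TEST_LEM on ham) and `pct 56 564483` prints 0.01 (FREEMAIL_REPLYTO on ham).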
>
> Here are the overlap numbers for mail scoring 10 or less:
> $ grep EMAILBL_TEST_LEM spamd_since_22nd \
>     | perl -ne 'if (/spamd: result: Y (\d+)/) { print if $1 <= 10 }' \
>     | cut -d' ' -f11 | egrep -o '[A-Z0-9_:\.]+?,' \
>     | sort | uniq -c | sort -rn | head -n15
>   1304 EMAILBL_TEST_LEM,
>    728 RAZOR2_CHECK,
>    643 RAZOR2_CF_RANGE_51_100,
>    629 RAZOR2_CF_RANGE_E4_51_100,
>    612 BAYES_50,
>    590 FORGED_YAHOO_RCVD,
>    582 BAYES_99,
>    282 HTML_MESSAGE,
>    199 FREEMAIL_FROM,
>    157 ADVANCE_FEE_2,
>    132 FORGED_MUA_OUTLOOK,
>    114 FREEMAIL_REPLYTO,
>    103 RCVD_IN_BRBL,
>     72 SPF_PASS,
>
> And here they are for all hits on EMAILBL_TEST_LEM:
> $ grep EMAILBL_TEST_LEM spamd_since_22nd | cut -d' ' -f11 \
>     | egrep -o '[A-Z0-9_:\.]+?,' | sort | uniq -c | sort -rn | head -n15
>  41503 EMAILBL_TEST_LEM,
>  38987 BAYES_99,
>  36782 FORGED_MUA_OUTLOOK,
>  36028 ADVANCE_FEE_2,
>  33746 RCVD_IN_BRBL,
>  33506 JM_SOUGHT_FRAUD_3,
>  33214 JM_SOUGHT_FRAUD_2,
>  33186 HTML_MESSAGE,
>  32281 RCVD_IN_BL_SPAMCOP_NET,
>  31953 JM_SOUGHT_FRAUD_1,
>  31914 RDNS_NONE,
>  31893 RCVD_IN_SBL,
>  31883 MIME_HTML_ONLY,
>
> Phew.  Hopefully those numbers are useful.
>
>
