On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3c4ad11c44.9030...@redhat.com%3e
> Compare this report to a similar report last month.
>
> http://wiki.apache.org/spamassassin/NightlyMassCheck
> The results below are only as good as the data submitted by nightly
> masscheck volunteers. Please join us in nightly masschecks to increase
> the sample size of the corpora so we can have greater confidence in
> the nightly statistics.
>
> http://ruleqa.spamassassin.org/20091114-r836144-n
> Spam 131399 messages from 18 users
> Ham 189948 messages from 18 users
>
> ============================
> DNSBL lastexternal by Safety
> ============================
> SPAM%     HAM%     RANK  RULE
> 12.8342%  0.0021%  0.94  RCVD_IN_PSBL *
> 12.3053%  0.0026%  0.94  RCVD_IN_XBL
> 31.2499%  0.0827%  0.87  RCVD_IN_ANBREP_BL *2
> 80.2578%  0.1485%  0.86  RCVD_IN_PBL
> 27.1836%  0.1985%  0.79  RCVD_IN_SORBS_DUL
> 19.8213%  0.1785%  0.79  RCVD_IN_SEMBLACK *
> 90.9360%  0.3854%  0.77  RCVD_IN_BRBL_LASTEXT
> 13.0564%  0.4838%  0.67  RCVD_IN_HOSTKARMA_BL *
>
> Commentary:
> * PSBL and XBL lead in apparent safety.
> * ANBREP was added after the October report and has made a surprisingly
> strong showing in this past month. ANBREP is currently unavailable to
> the general public. The list owner is thinking about going public with
> the list, which I would encourage because they are clearly doing
> something right. It seems he would need a global network of automated
> mirrors to be able to scale. He would also need listing/delisting
> policy clearly stated on a web page somewhere.
> * SEMBLACK consistently has been performing adequately in safety while
> catching a respectable amount of spam. I personally use this
> non-default blacklist.
> * It is clear that the two main blacklists are Spamhaus and BRBL. The
> Zen combination of Spamhaus zones is extremely effective and generally
> safe. BRBL has a high hit rate as well, with a moderate safety rating.
> * HOSTKARMA_BL ranks dead last in safety for the past several weeks in a
> row, while not being more effective against spam than PSBL, XBL or SEMBLACK.
>
> ===============================
> HOSTKARMA_BL much better as URIBL
> ===============================
> SPAM%     HAM%     RANK  RULE
> 68.3651%  0.2806%  0.79  URIBL_HOSTKARMA_BL *
>
> Commentary:
> While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is surprisingly
> effective as a URIBL. This is curious as it seems it was not designed
> to be used as a URIBL. In any case, as long as our masschecks show good
> statistics like this, I will personally use this on my own spamassassin
> server.
>
> =========================
> SPAMCOP Dangerous?
> =========================
> SPAM%     HAM%     RANK  RULE
> 17.4225%  2.6076%  0.56  RCVD_IN_BL_SPAMCOP_NET *
>
> Commentary:
> Is Spamcop seriously this bad? It consistently has shown a high false
> positive rate in these past weeks. Was it safer than this in the past
> to warrant the current high score in spamassassin-3.2.5?
>
> Warren Togami
> wtog...@redhat.com

Is it not a bit flawed to base the metrics on volunteer submissions, given that Spamhaus is said to have a small army of them? It means the data cannot be relied upon as any kind of sensible comparison.
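
To put the quoted percentages in perspective, here is a rough Python sketch that turns them back into approximate message counts, using the corpus totals given in the report (131399 spam, 189948 ham). The helper name `approx_hits` and the choice of rules are mine, purely for illustration; the counts are rounded estimates, not figures published by ruleqa.

```python
# Corpus sizes taken from the quoted report (ruleqa 20091114-r836144-n).
SPAM_TOTAL = 131399
HAM_TOTAL = 189948


def approx_hits(percent, total):
    """Convert a hit percentage back into an approximate message count."""
    return round(percent / 100.0 * total)


# (SPAM%, HAM%) pairs copied from the quoted tables.
rules = {
    "RCVD_IN_PSBL": (12.8342, 0.0021),
    "RCVD_IN_XBL": (12.3053, 0.0026),
    "RCVD_IN_BL_SPAMCOP_NET": (17.4225, 2.6076),
}

for name, (spam_pct, ham_pct) in rules.items():
    # Print the estimated number of spam and ham messages each rule hit.
    print(name, approx_hits(spam_pct, SPAM_TOTAL), approx_hits(ham_pct, HAM_TOTAL))
```

On these numbers the 0.0021% ham rate for PSBL comes to about 4 messages out of 189948, so a handful of misfiled messages from one contributor would visibly move the HAM% column for the top-ranked lists.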