On 8/31/2014 2:21 AM, Reindl Harald wrote:

On 31.08.2014 at 02:15, Ted Mittelstaedt wrote:
Yes, it does work great when you have the Bayes filter turned on and you
take the time to feed it.  And that means you have to feed the learner
both ham and spam and set up reliable sources for those.
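
For readers who have not fed the learner before, here is a minimal sketch of what "feeding both ham and spam" can look like when scripted. It only assumes the standard `sa-learn --ham` and `sa-learn --spam` options; the two folder paths are hypothetical placeholders, and the helper names are invented for illustration. By default it only prints the commands instead of running them.

```python
import subprocess
from typing import List


def sa_learn_commands(ham_dir: str, spam_dir: str) -> List[List[str]]:
    """Build one sa-learn invocation for ham and one for spam."""
    return [
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--spam", spam_dir],
    ]


def train(ham_dir: str, spam_dir: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or actually execute them."""
    for cmd in sa_learn_commands(ham_dir, spam_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical folders; point these at your own reliable sources.
    train("/var/spool/training/ham", "/var/spool/training/spam")
```

Run it from cron with `dry_run=False` once the two folders really do contain only hand-verified mail.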

Unfortunately, if Bayes is not turned on, it does not catch more than
around 60-70% of spam.  As a SpamAssassin user & server admin, I would
really like to see that improve.

60-70% without training is great

keep in mind that the first 90% of incoming mail is eaten by RBLs
and the 60% applies only to the remaining 10% :-)
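
The arithmetic behind that point can be sketched as follows, assuming both percentages refer to spam and that the RBL stage and SpamAssassin act one after the other. The function name is made up for illustration.

```python
def overall_catch_rate(rbl_rate: float, sa_rate: float) -> float:
    """Fraction of all incoming spam stopped by RBLs plus SpamAssassin.

    rbl_rate: share of incoming spam rejected by RBLs before SA sees it.
    sa_rate:  share of the remaining spam that SA catches.
    """
    return rbl_rate + (1.0 - rbl_rate) * sa_rate


# 90% eaten by RBLs, 60% of the rest caught by untrained SA.
print(f"{overall_catch_rate(0.90, 0.60):.0%}")  # prints 96%
```

So a 60% hit rate on what is left after the RBLs corresponds to roughly 96% of the total under these assumptions.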

i think it's impossible to improve that much "out-of-the-box" because
that would make it too sensitive, while bayes also has the ham side of
your communication to base its decisions on


Google does it.  It's not impossible.

i am coming from a commercial device that tries to block 100%, and there
it ends in zero-hour blocklists listing domains even if they are only
linked on the youtube page of the blocked facebook notification

so i am glad that i have to do some training myself instead of fearing
false positives, which do much more harm


My experience is that the commercial providers like Gmail are now
so aggressive that false positives are VERY common on their systems.
This leads to websites that send feedback messages nowadays quite
commonly telling people to "check your spam folder".

Out of the box the default decision point of 5 is too high anyway.

I think the emphasis on avoiding false positives in the stock
(non-Bayes) distribution is far too high.  I suspect that over
the years many good rule submissions have been ignored because
incidence of false positives with them was too high for the
SA maintainers.

For a newbie to SA it is disheartening to install SA and not
get 90% capture with a 2% false positive rate out of the box, but
rather get 50% with a 0% false positive rate.  And I think the mistake
the maintainers are making is over-reliance on Bayes.

At the least, the SA maintainers should maintain an optional, separate
"highly aggressive" rule distro that would give us a much higher
success rate with a corresponding slight increase in false positives.

Their design approach has been to rely on Bayes to be trained to go
from 50% capture out of box with 0% FP to 80-90% capture with 0% FP.

But, the design approach could easily be relying on Bayes to go
from 90% capture with 5% FP out of the box, to 90% capture with
0% FP with Bayes, and the emphasis being on training Bayes on ham,
not spam.

Note I am pulling the percentages out of my ass, but I think you
get the idea.
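
To put the two starting points side by side, here is a rough sketch that turns the illustrative percentages above into message counts. The mailbox size of 1000 spam and 1000 ham is an arbitrary assumption, and the names `Profile` and `outcome` are invented for this example; none of these figures are measurements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Profile:
    name: str
    capture: float  # fraction of spam caught
    fp: float       # fraction of ham wrongly flagged as spam


def outcome(p: Profile, n_spam: int, n_ham: int) -> Tuple[int, int]:
    """Return (spam that gets through, ham that is wrongly flagged)."""
    missed = round(n_spam * (1 - p.capture))
    lost = round(n_ham * p.fp)
    return missed, lost


# Illustrative out-of-the-box profiles, using the guessed percentages.
profiles = [
    Profile("stock rules, no Bayes", 0.50, 0.00),
    Profile("aggressive rules, no Bayes", 0.90, 0.05),
]

for p in profiles:
    missed, lost = outcome(p, n_spam=1000, n_ham=1000)
    print(f"{p.name}: {missed} spam delivered, {lost} ham flagged")
```

With these made-up inputs the stock profile lets 500 spam through and flags no ham, while the aggressive profile lets 100 through and flags 50 ham, which is the trade-off being argued about.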

Ted

On 8/30/2014 2:41 PM, Reindl Harald wrote:
after two days running SA for the first two test-domains with a
well trained bayes for the global milter-user: impressive!

the little crap that makes it through postscreen RBL scoring is detected

0.000          0          3          0  non-token data: bayes db version
0.000          0       1389          0  non-token data: nspam
0.000          0       1350          0  non-token data: nham
0.000          0     257152          0  non-token data: ntokens
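
The counters above look like the output of `sa-learn --dump magic`. As a small sketch, they can be pulled into a dictionary like this, assuming the value sits in the third column as it does in the lines shown. The 200-message threshold reflects SpamAssassin's documented defaults for `bayes_min_ham_num` and `bayes_min_spam_num`; adjust it if your configuration overrides them.

```python
from typing import Dict

SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       1389          0  non-token data: nspam
0.000          0       1350          0  non-token data: nham
0.000          0     257152          0  non-token data: ntokens
"""

# Default minimum number of learned ham and spam messages before
# Bayes scoring is used (assumed unchanged in local config).
MIN_LEARNED = 200


def parse_magic(dump: str) -> Dict[str, int]:
    """Extract the 'non-token data' counters from a magic dump."""
    stats: Dict[str, int] = {}
    for line in dump.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[4].startswith("non-token data:"):
            continue
        key = parts[4].split(":", 1)[1].strip()
        stats[key] = int(parts[2])
    return stats


stats = parse_magic(SAMPLE)
ready = stats["nspam"] >= MIN_LEARNED and stats["nham"] >= MIN_LEARNED
print(stats["nspam"], stats["nham"], stats["ntokens"], ready)
```

The near-equal ham and spam counts (1350 vs. 1389) are what a balanced training set looks like in this output.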

Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes.
Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 - BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<[email protected]>,bayes=0.842503,autolearn=disabled

Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: END-OF-MESSAGE from snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; from=<[email protected]>   to=<***>
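
For anyone who wants to tally such spamd log lines, here is a small sketch that extracts the score, the threshold, the rule hits and the Bayes probability. The regular expressions are written against the two spamd lines shown above only (with the rule list shortened in the sample); other spamd versions or log formats may need adjustments, and the function names are invented for illustration.

```python
import re
from typing import Dict, Optional, Tuple

IDENT_LINE = (
    "Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam "
    "(8.9/4.5) for sa-milt:189 in 0.6 seconds, 2454 bytes."
)
# Rule list shortened for readability.
RESULT_LINE = (
    "Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 - "
    "BAYES_80,CUST_DNSBL_15,SPF_PASS "
    "scantime=0.6,size=2454,required_score=4.5,"
    "bayes=0.842503,autolearn=disabled"
)

IDENT_RE = re.compile(r"identified spam \((-?[\d.]+)/(-?[\d.]+)\)")
RESULT_RE = re.compile(r"result: (\S) (-?\d+) - (\S+)")
BAYES_RE = re.compile(r"bayes=([\d.]+)")


def parse_identified(line: str) -> Optional[Tuple[float, float]]:
    """Return (score, required_score) or None if the line does not match."""
    m = IDENT_RE.search(line)
    return (float(m.group(1)), float(m.group(2))) if m else None


def parse_result(line: str) -> Optional[Dict[str, object]]:
    """Return spam flag, rule names and Bayes probability, or None."""
    m = RESULT_RE.search(line)
    if not m:
        return None
    b = BAYES_RE.search(line)
    return {
        "is_spam": m.group(1) == "Y",
        "rules": m.group(3).split(","),
        "bayes": float(b.group(1)) if b else None,
    }


print(parse_identified(IDENT_LINE))
print(parse_result(RESULT_LINE))
```

In the message above the score of 8.9 is well over the configured threshold of 4.5, and the Bayes probability of about 0.84 matches the BAYES_80 hit in the rule list.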
