On 31.08.2014 at 16:08, Ted Mittelstaedt wrote:
> On 8/31/2014 2:21 AM, Reindl Harald wrote:
>>
>> On 31.08.2014 at 02:15, Ted Mittelstaedt wrote:
>>> Yes, it does work great when you have the bayes filter turned on and you
>>> take the time to feed it. And that means you have to feed the
>>> learner both ham and spam and set up reliable sources for those.
>>>
>>> Unfortunately if Bayes is not turned on, it does not catch more than
>>> around 60-70% of spam. As a SpamAssassin user & server admin, I would
>>> really like to see that improve.
>>
>> 60-70% without training is great
>>
>> keep in mind that the first 90% of incoming is eaten by RBLs
>> and the 60% are only of the remaining 10% :-)
>>
>> i think it's impossible to improve that much "out-of-the-box" because
>> that would make it too sensitive, while the bayes also has the ham side
>> of your communication for its decisions
>>
>
> Google does it. It's not impossible.

Google has a lot more data and power to feed a global bayes, and even
then they fail, as you say yourself in the next paragraph

i don't care about the 5 spam messages, i care about the one important
message that got eaten

>> i am coming from a commercial device trying to block 100% and there
>> it ends in zero-hour blocklists with domains even if they are only
>> linked on the youtube page of the blocked facebook notification
>>
>> so i am glad that i have to do some training myself instead of fearing
>> false positives, which do much more harm
>
> My experience is that the commercial providers like Gmail are now
> so aggressive that false positives are VERY common on their systems,
> this leads to people nowadays quite commonly saying "check your
> spam folder" on their websites and such that send feedback messages.

which defeats the intention of a spamfilter, and the whole idea of a
junk-folder is broken - i need a contentfilter running reliably
before-queue so that i don't see the real crap, plus some [SPAM] tagged
messages which are moved by hand to ham/spam to train the bayes

> Out of the box the default decision point of 5 is too high anyway.
>
> I think the emphasis on avoiding false positives in the stock
> (non-Bayes) distribution is far too high. I suspect that over
> the years many good rule submissions have been ignored because
> incidence of false positives with them was too high for the
> SA maintainers.

if you have users to support there is nothing worse than a false
positive - 10 slipped junk mails are not nearly as bad as a user
complaining that he doesn't get legit mail and is tired of trying to
explain to his customers how they could make it through the filter

> For a newbie to SA it is disheartening to install SA and not
> get 90% with a 2% false positive, out of the box, but rather get
> 50% with a 0% false positive. And I think that is a mistake the
> maintainers are making is over-reliance on bayes.

no - as i showed in another thread that day, the opposite is true

the bayes could and should have more impact, but that can't be the
default because no software can know how good the bayes data (ham and
spam) really are, and if it is trained by a noob who fires any
newsletter into "spam" it does damage - mine is trustworthy because i
know what i am doing in that context

the most important thing in training a bayes is to know which messages
you should strongly avoid feeding in

> At the least the SA maintainers should maintain a separate
> "highly aggressive" rule distro that was optional that would
> give us a much higher success rate with a corresponding
> slight increase in false positives.

here i agree - maybe with a meta-rule or such which has its own score in
"local.cf" - but i still think you need to know what you are doing,
because such a meta value also makes compromises, and in my case i trust
my bayes nearly unconditionally but would not give other default rules
the same power

> Their design approach has been to rely on Bayes to be trained to go from 50%
> capture out of box with 0% FP to 80-90% capture with 0% FP.

easily said

spammers are not dumb and follow SA updates too
how long do you think such a default would survive in the wild?

> But, the design approach could easily be relying on Bayes to go
> from 90% capture with 5% FP out of the box, to 90% capture with
> 0% FP with Bayes, and the emphasis being on training Bayes on ham,
> not spam.

5% false positives out of the box is just unacceptable

the contentfilter should be only the last defense anyway, with your 90%
of spam eaten by postscreen and DNSBL scores combined with a postfix
PTR regex that rejects dialup networks

with the PTR check alone you get rid of around 80% of botnet junk
without anything else

> Note I am pulling the percentages out of my ass, but I
> think you get the idea.

i get the idea, and a few years ago i thought the same way - but looking
at how much support time is wasted on angry customers who did not get
important mail (including myself), and how little time it takes each
user to just delete his 10 daily spam mails without ever facing the
other thousands already blocked, my attitude in that context changed
dramatically

that's also why i use postscreen with a lot of RBLs combined with
differently weighted DNSWLs, so that a single RBL can't do damage by
mistake, like blocking large providers such as GMX/Web.de (United
Internet) not so long ago

i am a new SA user who built up a complete mailfilter system over the
last few weeks, but with some years of experience from other systems

what i see here, at least over the weekend, is the result below, and it
says clearly "rely on a contentfilter only as last defense, for several
reasons"

SA is very expensive (connection time, resources), while postscreen is
free and doesn't occupy a single smtpd process most of the time

[root@localhost:~]$ cat maillog | grep "CONNECT from" | wc -l
1940

[root@localhost:~]$ cat maillog | grep "NOQUEUE" | grep postscreen | wc -l
1584

[root@localhost:~]$ cat maillog | grep "relay=" | wc -l
286

[root@localhost:~]$ cat maillog | grep "SpamAssassin" | wc -l
58

[root@localhost:~]$ cat maillog | grep "cannot find your reverse hostname" | wc -l
12

>>> On 8/30/2014 2:41 PM, Reindl Harald wrote:
>>>> after two days running SA for the first two test-domains with a
>>>> well trained bayes for the global milter-user: impressive!
>>>>
>>>> the little crap making it through postscreen RBL scoring is detected
>>>>
>>>> 0.000 0 3 0 non-token data: bayes db version
>>>> 0.000 0 1389 0 non-token data: nspam
>>>> 0.000 0 1350 0 non-token data: nham
>>>> 0.000 0 257152 0 non-token data: ntokens
>>>>
>>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5)
>>>> for sa-milt:189 in 0.6 seconds, 2454 bytes.
>>>> Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
>>>> BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS
>>>> scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=<snt152-w505982b05a6fbba5c49ad2b1...@phx.gbl>,bayes=0.842503,autolearn=disabled
>>>>
>>>> Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w:
>>>> milter-reject: END-OF-MESSAGE from
>>>> snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin;
>>>> from=<jenniferje...@hotmail.com>
>>>> to=<***>
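
PS: the grep counts further up can be collected in one pass over the log and put in relation to the number of connects. What follows is only a rough sketch in Python, not something from my setup: the labels in `MARKERS` and the function names are made up for this example, the substrings are the ones from the grep commands, and the numbers at the bottom are the ones quoted in this mail, not re-measured. Keep in mind that a "relay=" line is a delivery log line, not a connection, so the shares are only an approximation.

```python
from collections import Counter

# Substrings taken from the grep commands in the message; the labels are
# hypothetical names chosen for this sketch.
MARKERS = {
    "connects": ("CONNECT from",),
    "postscreen_rejects": ("NOQUEUE", "postscreen"),
    "relayed": ("relay=",),
    "spamassassin": ("SpamAssassin",),
    "ptr_rejects": ("cannot find your reverse hostname",),
}


def count_markers(lines):
    """Count lines that contain every substring of a marker (like chained greps)."""
    counts = Counter({label: 0 for label in MARKERS})
    for line in lines:
        for label, needles in MARKERS.items():
            if all(needle in line for needle in needles):
                counts[label] += 1
    return counts


def share_of_connects(counts):
    """Return each count as a percentage of all connects, rounded to one decimal."""
    total = counts["connects"]
    if total == 0:
        return {}
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Counts as quoted in the message, not re-measured.
    quoted = Counter(connects=1940, postscreen_rejects=1584, relayed=286,
                     spamassassin=58, ptr_rejects=12)
    for label, pct in share_of_connects(quoted).items():
        print(f"{label}: {pct}%")
```

To run it against a real log, something like `count_markers(open("maillog"))` should do, assuming the file is in the current directory. With the quoted numbers, postscreen rejects come out at roughly 82% of all connects, while the SpamAssassin lines are about 3%.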