Autolearn is turned on. I don't think we allow users to train without review; is there a way I can confirm that? We run Plesk 10 and use SA through qmail-scanner. Even a message with a very high Bayes score (BAYES_99) was not classified as spam:
Jun 17 11:44:04 plesk3 spamd[18601]: spamd: result: . 3 - BAYES_99,FORGED_RELAY_MUA_TO_MX,HTML_MESSAGE scantime=5.9,size=6016,user=qscand,uid=10124,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=49363,mid=<1371465833.fdyxtlmiklb@reducetummyfatsite.com>,bayes=0.999999,autolearn=no

Jun 17 11:44:04 plesk3 qmail-scanner[32763]: Clear:RC:0(174.139.0.51):SA:0(3.5/5.0): 6.00291 5948 bloomberg.businessweek.2...@spam-domain.com u...@hosted-domain.com Exclusive_Discount_Rate_-_Save_92%% <1371465833.fdyxtlmiklb@spam-domain.com> 1371465838.32765-1.plesk3.emailitis.co.uk:3867 orig-plesk3.emailitis.co.uk137146583879732763:5948 1371465838.32765-0.plesk3.emailitis.co.uk:700

I think, as you say, that the Bayes training has become corrupted over time, because most indications are that we should be getting a lot less spam than we are.

/root/.spamassassin/bayes_seen is 21 MB in size, so I guess trying to retrain would take ages. We have about 100 domains on the mail server. If we begin again, I know that we need to:

    rm /root/.spamassassin/bayes_*

From http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.txt, it seems that sa-learn --clear, or the rm above, is destructive, but in your opinion is it better than the false reporting we are suffering at present? Should I run sa-learn --backup first and, if so, where does the backup file go? Can we review it with ease and then restore, or would that be a lengthy process?

If we cleared the database, we would want to rely on unsupervised learning driven by the SA rules, supplemented with supervised training to help it. When we identify spam in the maillog from the rules it hit, as in the entries shown here (it could be addressed to any mailbox in any domain), is there an easy way to tell SA whether a message is spam or ham?

The server is sending and receiving about 3000 emails per day. I am not familiar with the spamc -l switch; can you guide me on using it, if that would be better? I have some spam emails in an offline folder in Outlook that I personally have received.
If we create a mailbox, /var/qmail/mailnames/domain.com/spam, and resend those emails to it, would I then run:

    sa-learn --spam /var/qmail/mailnames/domain.com/spam

Being able to do this from the command line for individual messages seen in the maillog would be easier, if that is possible.

I hope you can help.

Many thanks in advance,
Christoph

-----Original Message-----
From: John Hardin [mailto:jhar...@impsec.org]
Sent: 10 June 2013 14:24
To: users@spamassassin.apache.org
Subject: Re: Single images with random wording & general rules

On Mon, 10 Jun 2013, emailitis.com wrote:

> I tried to send the source from one such email but it was rejected
> with a Spam score of 13:
>
> Remote host said: 552 spam score (13.6) exceeded threshold
>
> HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_RHS_DOB,URIBL_WS_SURBL
>
> On our server, it passed with:
>
> Jun 8 15:12:40 plesk3 spamd[2692]: spamd: result: . 1 -
> BAYES_00,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_06,HTML_MESSAGE,REMOVE_BEFORE_LINK,T_REMOTE_IMAGE,URIBL_BLACK

BAYES_00 is probably your largest problem.

Do you have autolearn turned on? If you are manually training, have you retained your training corpora so that they can be reviewed for misclassifications? Do you allow your users to train without review?

Depending on the answers to the above, you are probably looking at wiping and retraining your Bayes database from scratch.

It is possible that training these messages as spam will correct things, but for best results you'll need to unlearn the messages that led them to be scored as ham initially, and determine why they were learned as ham in the first place so you can prevent that happening in the future.
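As a rough illustration of the wipe-and-retrain route described above, here is a minimal sketch. It relies only on documented sa-learn options: --backup (which writes a text dump to standard output, so the backup goes wherever you redirect it), --clear, --spam, --ham and --dump magic. The backup path and the two corpus directories are hypothetical placeholders, and the script defaults to a dry run that only prints what it would do. One thing to check first: the spamd log line shows user=qscand, so the database spamd actually consults may not be the one under /root/.spamassassin; run the commands as whichever user owns the live database.

```shell
#!/bin/sh
# Sketch only: back up the Bayes data, wipe it, retrain from reviewed mail.
# BACKUP_FILE, SPAM_DIR and HAM_DIR are hypothetical paths; adjust them.
# DRY_RUN=1 (the default here) prints each command instead of running it.

DRY_RUN="${DRY_RUN:-1}"
BACKUP_FILE="${BACKUP_FILE:-/root/bayes-backup.txt}"
SPAM_DIR="${SPAM_DIR:-/root/corpus/spam}"
HAM_DIR="${HAM_DIR:-/root/corpus/ham}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bayes_rebuild() {
    # 1. Dump the current database as text; sa-learn --backup writes to stdout.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: sa-learn --backup > $BACKUP_FILE"
    else
        sa-learn --backup > "$BACKUP_FILE" || return 1
    fi
    # 2. Wipe the database (destructive; the dump from step 1 is the way back).
    run sa-learn --clear || return 1
    # 3. Retrain from hand-reviewed spam and ham.
    run sa-learn --spam "$SPAM_DIR" || return 1
    run sa-learn --ham "$HAM_DIR" || return 1
    # 4. Print the database counters (nspam, nham, ntokens) as a sanity check.
    run sa-learn --dump magic
}

bayes_rebuild
```

If the rebuilt database turns out worse, sa-learn --restore followed by the backup file name loads the dump back in. Set DRY_RUN=0 only after reading the printed commands.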
--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The Tea Party wants to remove the Crony from Crony Capitalism.
  OWS wants to remove Capitalism from Crony Capitalism.
                                                     -- Astaghfirullah
-----------------------------------------------------------------------
 375 days since the first successful private support mission to ISS (SpaceX)
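On the question raised in this thread of picking suspect messages out of the maillog: the spamd result lines quoted here carry a spam flag (Y or .), the score, and a bayes= value, so they can be filtered mechanically. The following is a minimal sketch, not a tested tool. The function name suspect_ham and the 0.99 default threshold are my own choices, and it assumes the key=value list is the last whitespace-separated field of each line, as in the BAYES_99 entry shown in this thread.

```shell
# Sketch: list spamd results that were NOT tagged as spam although the
# Bayes probability was high. Reads syslog lines on standard input and
# prints: month day time score bayes message-id
# The optional first argument is the minimum bayes= value (default 0.99).
suspect_ham() {
    awk -v min="${1:-0.99}" '
        /spamd: result: / {
            # Locate "result:"; the next fields are the flag (Y or .) and score.
            for (i = 1; i <= NF; i++) if ($i == "result:") break
            flag = $(i + 1); score = $(i + 2)
            bayes = ""; mid = ""
            # Assumption: the last field is the comma-separated key=value list.
            n = split($NF, kv, ",")
            for (j = 1; j <= n; j++) {
                if (kv[j] ~ /^bayes=/) bayes = substr(kv[j], 7)
                if (kv[j] ~ /^mid=/)   mid   = substr(kv[j], 5)
            }
            if (flag == "." && bayes != "" && bayes + 0 >= min + 0)
                print $1, $2, $3, score, bayes, mid
        }'
}
```

Usage would be along the lines of suspect_ham 0.99 < /path/to/maillog. The message-ids it prints can then be matched against stored mail so that the messages are reviewed by hand before being fed to sa-learn --spam or sa-learn --ham.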