On Wed, May 04, 2005 at 01:03:18PM -0400, Matt Kettler wrote:
> >In -well- every mail. That is not too weird, since
> >this is my domain! Why does rate 'alinoe.com' and 'com'
> >and 'carlo' as spammy tokens? Is that normal?
> >
> No, it's not normal.
>
> Have you been training your bayes using forwarded messages?
>
> In general it looks like your bayes has been very heavily trained on
> spam that was addressed To: you, and almost no nonspam messages
> addressed To: you. This is something that could happen if you were
> forwarding mail for training, or if you used someone elses nonspam for
> training (and little or none of your own), but did use your own spam.
Yeah... the point is, I receive mail on my firewall machine. There are no accounts there, but I want to run spamassassin there so that its CPU cycles don't bother me on my working machine. However, I don't want the bayesian database to autolearn: I want it to learn only from correctly classified mail. So, I have auto-learn off.

The tagged mail is then sent to another machine that sorts it into mailboxes with procmail. THERE all mail is decided to be REALLY ham or spam (under my guidance), and it is then forwarded back to the firewall machine (two special accounts there), where it is fed to the bayes. I didn't realize that this doesn't work. How can I solve this?

My .procmail (using a lot of custom rules) decides whether something is spam or ham and sends it to [EMAIL PROTECTED] or [EMAIL PROTECTED]; on the firewall this is just stored in mailboxes and nothing further. Manually I can react to what was tagged as 'ham' by saying that it really is spam, or vice versa, or I can tell it to 'forget' it (I send it to [EMAIL PROTECTED]). These mailboxes (ham, spam, really_ham, really_spam, forget) are then processed once per day from a cron job.

I suppose I should first filter the headers before processing them?
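
For illustration, such a pre-filter could look roughly like the sketch below. It is only a sketch under assumptions: the function name strip_forwarding_headers and the choice of header names (Delivered-To, X-Envelope-From, X-Qmail-Scanner-Message-ID, and anything starting with Resent- or X-Spam-) are guesses at what the forwarding hop adds, not a confirmed list. It also leaves the Received lines alone, although the internal hops add some of those too, and whether stripping any of this actually cures the problem is exactly the open question.

```python
import email

# ASSUMPTION: these are headers added by the local forwarding/scanning hops,
# not by the original sender. Adjust the lists to match your own setup.
STRIP_EXACT = {"delivered-to", "x-envelope-from", "x-qmail-scanner-message-id"}
STRIP_PREFIXES = ("resent-", "x-spam-")


def strip_forwarding_headers(raw: bytes) -> bytes:
    """Return the message with the locally added headers removed."""
    msg = email.message_from_bytes(raw)
    # Copy the key list first, because headers are deleted while looping.
    for name in list(msg.keys()):
        lower = name.lower()
        if lower in STRIP_EXACT or lower.startswith(STRIP_PREFIXES):
            # Deletes every occurrence; a no-op if already removed.
            del msg[name]
    return msg.as_bytes()
```

The cron job would run each message from the mailboxes through this function before handing it to the Bayes learner.
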
Here is an example of the header of a spam as it finally ends up on the firewall in the 'spambox' mailbox:

>From [EMAIL PROTECTED] Wed May 04 21:58:09 2005
Return-Path: <[EMAIL PROTECTED]>
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 16434 invoked by alias); 4 May 2005 21:58:09 -0000
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 16430 invoked from network); 4 May 2005 21:58:09 -0000
Received: from ansset.ansset-jolan (HELO mail.alinoe.com) (192.168.2.2)
  by alinoe.com with SMTP; 4 May 2005 21:58:09 -0000
Received: (qmail 17609 invoked by uid 500); 4 May 2005 21:58:05 -0000
Resent-Date: 4 May 2005 21:58:05 -0000
Resent-Message-ID: <[EMAIL PROTECTED]>
Resent-From: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 17589 invoked from network); 4 May 2005 21:58:05 -0000
Received: from jolan.jolan-alinoe (HELO alinoe.com) (192.168.2.1)
  by mail.alinoe.com with SMTP; 4 May 2005 21:58:05 -0000
Received: (qmail 16424 invoked by uid 109); 4 May 2005 21:58:08 -0000
Received: from 24.22.13.76 by alinoe.com (envelope-from <[EMAIL PROTECTED]>, uid 102)
  with qmail-scanner-1.25 (spamassassin: 3.0.3.
  Clear:RC:0(24.22.13.76):SA:1(33.3/4.0):.
  Processed in 4.019034 secs); 04 May 2005 21:58:08 -0000
X-Envelope-From: [EMAIL PROTECTED]
Received: from c-24-22-13-76.hsd1.or.comcast.net (24.22.13.76)
  by alinoe.com with SMTP; 4 May 2005 21:58:03 -0000
Received: from yyhmail.com (yyhmail-com-bk.mr.outblaze.com [205.158.62.177])
  by c-24-22-13-76.hsd1.or.comcast.net (Postfix) with ESMTP id 77P4H5V2OR
  for <[EMAIL PROTECTED]>; Wen, 4 May 2005 23:59:03 +0000
From: Aisha Rice <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [SA-score:33.3] She humped, and ground her body...
Date: Wen, 4 May 2005 23:59:03 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="Windows-1251"
Content-Transfer-Encoding: 7bit
X-Qmail-Scanner-Message-ID: <[EMAIL PROTECTED]>
X-Spam-Prev-Subject: She humped, and ground her body...
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on alinoe.com
X-Spam-Scores: BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,
  HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,
  RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,
  RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,
  URIBL_WS_SURBL=1.462
X-Spam-Level: *********************************
X-Spam-Status: Yes, score=33.3 required=4.0 tests=BAYES_99,DNS_FROM_RFC_POST,
  FAKE_OUTBLAZE_RCVD,HELO_DYNAMIC_IPADDR,INVALID_DATE,RATWARE_RCVD_PF,
  RCVD_IN_DSBL,RCVD_IN_NJABL_PROXY,RCVD_IN_SORBS_MISC,RCVD_IN_XBL,
  URIBL_OB_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
  version=3.0.3
X-Spam-Report: Bayes score: 1.0000 (Tokens: new, 21; hammy, 3; neutral, 9; spammy, 51.)
  Hammy tokens: 0.096-+--H*r:Postfix, 0.146-+--HContent-Transfer-Encoding:7bit, 0.152-+--H*c:plain
  Spammy tokens: 1.000-+--H*c:Windows-1251, 0.999-+--H*r:ip*205.158.62.177, 0.999-8--H*RU:205.158.62.177, 0.999-6--pussy, 0.999-5--babe, 0.998-5--fucked, 0.998-4--H*r:Wen, 0.995-2--chick
  Test scores: BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,URIBL_WS_SURBL=1.462
Resent-To: [EMAIL PROTECTED]

Can you confirm that these headers would cause the Bayes to learn MY domain (alinoe.com) as spammy as shown in the previous post? And if so, which headers should I remove before feeding it to the Bayes classifier?

-- 
Carlo Wood <[EMAIL PROTECTED]>