On Wed, May 04, 2005 at 01:03:18PM -0400, Matt Kettler wrote:
> >In -well- every mail.  That is not too weird, since
> >this is my domain!  Why does it rate 'alinoe.com' and 'com'
> >and 'carlo' as spammy tokens?  Is that normal?
> >
> No, it's not normal.
> 
> Have you been training your bayes using forwarded messages? 
> 
> In general it looks like your bayes has been very heavily trained on
> spam that was addressed To: you, and almost no nonspam messages
> addressed To: you. This is something that could happen if you were
> forwarding mail for training, or if you used someone else's nonspam for
> training (and little or none of your own), but did use your own spam.

Yeah... the point is, I receive mail on my firewall machine.
There are no accounts there, but I want to run spamassassin
there so that its cpu cycles don't bother me on my working
machine.  However, I don't want the bayesian database to autolearn:
I want it to learn only from correctly classified mail.  So, I have
auto-learn off.  The tagged mail is then sent to another machine
that sorts it into mailboxes with procmail.  THERE all mail is
decided to be REALLY ham or spam (under my guidance) and is then
forwarded back to the firewall machine (two special accounts there),
where it is fed to the bayes.  I didn't realize that this
doesn't work.

How can I solve this?  My .procmail (using a lot of custom rules)
decides whether something is spam or ham and sends it
to [EMAIL PROTECTED] or [EMAIL PROTECTED].  On the firewall this
is just stored in mailboxes and nothing further is done with it.
Manually I can react to what was tagged as 'ham' by saying
that it really is spam, or vice versa, or I can tell it
to 'forget' it (by sending it to [EMAIL PROTECTED]).

These mailboxes (ham, spam, really_ham, really_spam, forget)
are then processed once per day from a cron job.
I suppose I should first filter the headers before processing them?
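
To make the daily job concrete, here is a rough sketch of how the five
mailboxes could map onto sa-learn invocations.  The directory, the helper
name and the mapping itself are assumptions for illustration (in
particular that 'ham' and 'really_ham' are both learned as ham, and
likewise for spam); only the sa-learn options --ham, --spam, --forget and
--mbox are real.

```python
# Hypothetical mapping from mailbox name to sa-learn mode (an assumption
# about the setup described above, not a confirmed configuration).
LEARN_MODE = {
    "ham": "--ham",
    "really_ham": "--ham",
    "spam": "--spam",
    "really_spam": "--spam",
    "forget": "--forget",
}


def sa_learn_commands(mail_dir):
    """Build one sa-learn argument list per mailbox file under mail_dir."""
    return [
        ["sa-learn", flag, "--mbox", f"{mail_dir}/{name}"]
        for name, flag in LEARN_MODE.items()
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; the path is made up.
    for cmd in sa_learn_commands("/var/mail/bayes"):
        print(" ".join(cmd))
```

The cron job would then run each command (for example with
subprocess.run) and empty the mailbox afterwards.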

Here is an example of the header of a spam as it finally ends
up on the firewall in the 'spambox' mailbox:

>From [EMAIL PROTECTED] Wed May 04 21:58:09 2005
Return-Path: <[EMAIL PROTECTED]>
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 16434 invoked by alias); 4 May 2005 21:58:09 -0000
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 16430 invoked from network); 4 May 2005 21:58:09 -0000
Received: from ansset.ansset-jolan (HELO mail.alinoe.com) (192.168.2.2)
  by alinoe.com with SMTP; 4 May 2005 21:58:09 -0000
Received: (qmail 17609 invoked by uid 500); 4 May 2005 21:58:05 -0000
Resent-Date: 4 May 2005 21:58:05 -0000
Resent-Message-ID: <[EMAIL PROTECTED]>
Resent-From: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 17589 invoked from network); 4 May 2005 21:58:05 -0000
Received: from jolan.jolan-alinoe (HELO alinoe.com) (192.168.2.1)
  by mail.alinoe.com with SMTP; 4 May 2005 21:58:05 -0000
Received: (qmail 16424 invoked by uid 109); 4 May 2005 21:58:08 -0000
Received: from 24.22.13.76 by alinoe.com (envelope-from <[EMAIL PROTECTED]>, 
uid 102) with qmail-scanner-1.25
 (spamassassin: 3.0.3.
 Clear:RC:0(24.22.13.76):SA:1(33.3/4.0):.
 Processed in 4.019034 secs); 04 May 2005 21:58:08 -0000
X-Envelope-From: [EMAIL PROTECTED]
Received: from c-24-22-13-76.hsd1.or.comcast.net (24.22.13.76)
  by alinoe.com with SMTP; 4 May 2005 21:58:03 -0000
Received: from yyhmail.com (yyhmail-com-bk.mr.outblaze.com [205.158.62.177])
        by c-24-22-13-76.hsd1.or.comcast.net (Postfix) with ESMTP id 77P4H5V2OR
        for <[EMAIL PROTECTED]>; Wen, 4 May 2005 23:59:03 +0000
From: Aisha Rice <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [SA-score:33.3] She humped, and ground her body...
Date: Wen, 4 May 2005 23:59:03 +0000
MIME-Version: 1.0
Content-Type: text/plain;
        charset="Windows-1251"
Content-Transfer-Encoding: 7bit
X-Qmail-Scanner-Message-ID: <[EMAIL PROTECTED]>
X-Spam-Prev-Subject: She humped, and ground her body...
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on alinoe.com
X-Spam-Scores: BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,
        HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,
        RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,
        RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,
        URIBL_WS_SURBL=1.462
X-Spam-Level: *********************************
X-Spam-Status: Yes, score=33.3 required=4.0 tests=BAYES_99,DNS_FROM_RFC_POST,
        FAKE_OUTBLAZE_RCVD,HELO_DYNAMIC_IPADDR,INVALID_DATE,RATWARE_RCVD_PF,
        RCVD_IN_DSBL,RCVD_IN_NJABL_PROXY,RCVD_IN_SORBS_MISC,RCVD_IN_XBL,
        URIBL_OB_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
        version=3.0.3
X-Spam-Report:
        Bayes score: 1.0000 (Tokens: new, 21; hammy, 3; neutral, 9; spammy, 51.)
        Hammy tokens: 0.096-+--H*r:Postfix, 
0.146-+--HContent-Transfer-Encoding:7bit, 0.152-+--H*c:plain
        Spammy tokens: 1.000-+--H*c:Windows-1251, 
0.999-+--H*r:ip*205.158.62.177, 0.999-8--H*RU:205.158.62.177, 0.999-6--pussy, 
0.999-5--babe, 0.998-5--fucked, 0.998-4--H*r:Wen, 0.995-2--chick
        Test scores: 
BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,URIBL_WS_SURBL=1.462
Resent-To: [EMAIL PROTECTED]


Can you confirm that these headers would cause the Bayes to learn
MY domain (alinoe.com) as spammy as shown in the previous post?
And if so, which headers should I remove before feeding it to 
the Bayes classifier?
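
For reference, this is the kind of filter I have in mind: a minimal
sketch that drops headers by name before the message is handed to the
learner.  Which headers actually need to go is exactly my question, so
the lists below (Resent-*, X-Spam-*, Delivered-To, the qmail-scanner ID)
are only a guess at what the forwarding hops add, and the function name
is made up.  It does not touch the extra Received: lines.

```python
from email import message_from_string

# Guess at headers added by the forwarding/tagging hops (an assumption,
# not a confirmed list of what confuses the Bayes database).
DROP_PREFIXES = ("resent-", "x-spam-")
DROP_NAMES = {"delivered-to", "x-qmail-scanner-message-id"}


def strip_forwarding_headers(raw_message):
    """Return the message text with the guessed forwarding headers removed."""
    msg = message_from_string(raw_message)
    for name in list(msg.keys()):
        lowered = name.lower()
        if lowered.startswith(DROP_PREFIXES) or lowered in DROP_NAMES:
            # del removes every occurrence of this header name.
            del msg[name]
    return msg.as_string()


if __name__ == "__main__":
    sample = (
        "From: sender@example.com\n"
        "Resent-From: me@example.com\n"
        "Delivered-To: spam@example.com\n"
        "Subject: test\n"
        "\n"
        "body text\n"
    )
    print(strip_forwarding_headers(sample))
```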

-- 
Carlo Wood <[EMAIL PROTECTED]>
