On 10/15/2013 4:15 PM, David B Funk wrote:
> On Mon, 14 Oct 2013, Stan Hoeppner wrote:
> 
>> On 10/14/2013 2:47 PM, Adam Katz wrote:
>>> On 10/12/2013 09:26 AM, Stan Hoeppner wrote:
>>>> These two rules are adding 4.0 pts [...]
>>>> Content analysis details:   (4.8 points, 4.2 required)
>>>>  pts rule name              description
>>>> ---- ---------------------- --------------------------------------------
>>>>  2.8 FSL_HELO_BARE_IP_2     FSL_HELO_BARE_IP_2
>>>>  1.2 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
>>>>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>>>                             [score: 0.5314]
>>>
>>> The others have addressed the "two rules" you mentioned, so I'll leave
>>> that alone in this email.
>>>
>>> There's more here than that:  If you're using Bayes, you have to train
>>> it.  Right now, it's hurting you:  Those 0.8 points should be some
>>> negative value, perhaps -1.9 or -0.5 (the default scores for BAYES_00
>>> and BAYES_05), which would then have made that message score 2.1 or 3.5,
>>> both of which are below your 4.2 threshold (which is already too low!).
>>
>> There's no doubt my Bayes isn't working.  I ran a few hundred each of
>> ham and spam through sa-learn just after installing SA over a year ago.
>> I haven't regularly fed it since, though I have run through maybe a few
>> dozen spam that weren't scored high enough.  And I think I may have
>> inadvertently run through one or two msgs that had anti-Bayesian text
>> blocks in them: Bible verses, Wikipedia content, etc.
>>
>> I just ran 120 hams through; about half were msgs previously tagged with
>> BAYES_60 through BAYES_95.
>>
>> ~$ sa-learn --ham --mbox --progress /home/stan/mail/ham
>> Learned tokens from 0 message(s) (0 message(s) examined)
>>
>> Obviously there's a problem with no tokens learned.  A few questions:
>>
>> 1.  Is the database the problem?  If so...

Thanks for the reply David.

> When it says "(0 message(s) examined)" that shows that it was unable
> to parse -any- messages out of that input file. This tends to imply that
> the contents of that "/home/stan/mail/ham" file are not a "mbox" format
> or it's an empty mailbox.

No, actually the file doesn't exist.  I made a case sensitivity error.
Why doesn't sa-learn return the filesystem error?  I'd have instantly
caught my typo if it did.

$ more /home/stan/mail/ham
/home/stan/mail/ham: No such file or directory

> First thing to fix, get your input recognised as messages. Then see how
> they're being learned.

Note the capital "H" in "Ham".

~$ sa-learn --ham --mbox --progress /home/stan/mail/Ham
...
Learned tokens from 113 message(s) (116 message(s) examined)

113/116 is promising.  I'll keep feeding more ham through and we'll see
if these FPs start to fall.  Thanks again David for helping me catch a
simple typo.  Maybe we could someday get sa-learn to properly return
error msgs?
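
In the meantime, a small pre-flight check in the shell would have caught
my typo before sa-learn ever ran.  This is only a rough sketch:
check_mbox is a name I made up, not anything shipped with SpamAssassin,
and the "first line starts with From " test is a simplistic assumption
about what an mbox file looks like.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): sanity-check a path
# before handing it to "sa-learn --mbox".
check_mbox() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "check_mbox: $f: No such file or directory" >&2
        return 1
    fi
    if [ ! -r "$f" ]; then
        echo "check_mbox: $f: not readable" >&2
        return 1
    fi
    if [ ! -s "$f" ]; then
        echo "check_mbox: $f: file is empty" >&2
        return 1
    fi
    # Assumption: an mbox file begins with a "From " separator line.
    if ! head -n 1 "$f" 2>/dev/null | grep -q '^From '; then
        echo "check_mbox: $f: does not look like mbox" >&2
        return 1
    fi
    return 0
}
```

It would be used like this, so that sa-learn only runs when the check
passes:

check_mbox /home/stan/mail/Ham && \
    sa-learn --ham --mbox --progress /home/stan/mail/Ham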

-- 
Stan
