On 10/15/2013 4:15 PM, David B Funk wrote:
> On Mon, 14 Oct 2013, Stan Hoeppner wrote:
>
>> On 10/14/2013 2:47 PM, Adam Katz wrote:
>>> On 10/12/2013 09:26 AM, Stan Hoeppner wrote:
>>>> These two rules are adding 4.0 pts [...]
>>>> Content analysis details: (4.8 points, 4.2 required)
>>>>  pts rule name              description
>>>> ---- ---------------------- ------------------------------------------
>>>>  2.8 FSL_HELO_BARE_IP_2     FSL_HELO_BARE_IP_2
>>>>  1.2 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
>>>>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>>>>                             [score: 0.5314]
>>>
>>> The others have addressed the "two rules" you mentioned, so I'll leave
>>> that alone in this email.
>>>
>>> There's more here than that: If you're using Bayes, you have to train
>>> it. Right now, it's hurting you: Those 0.8 points should be some
>>> negative value, perhaps -1.9 or -0.5 (the default scores for BAYES_00
>>> and BAYES_05), which would then have made that message score 2.1 or 3.5,
>>> both of which are below your 4.2 threshold (which is already too low!).
>>
>> There's no doubt my Bayes isn't working. I ran a few hundred each of
>> ham and spam through sa-learn just after installing SA some year+ ago.
>> I haven't regularly fed it since, though I have run through maybe a few
>> dozen spam that weren't scored high enough. And I think I may have
>> inadvertently run through one or two msgs that had anti-Bayesian text
>> blocks in them-- the bible verses, wikipedia content, etc.
>>
>> I just ran 120 hams through, about half were msgs tagged previously with
>> Bayes_60 through Bayes_95.
>>
>> ~$ sa-learn --ham --mbox --progress /home/stan/mail/ham
>> Learned tokens from 0 message(s) (0 message(s) examined)
>>
>> Obviously there's a problem with no tokens learned. A few questions:
>>
>> 1. Is the database the problem? If so...

Thanks for the reply David.

> When it says "(0 message(s) examined)" that shows that it was unable
> to parse -any- messages out of that input file. This tends to imply that
> the contents of that "/home/stan/mail/ham" file are not a "mbox" format
> or it's an empty mailbox.

No, actually the file doesn't exist. We have a case sensitivity error.
Why doesn't sa-learn return the filesystem error? I'd have instantly
caught my typo if it did.

$ more /home/stan/mail/ham
/home/stan/mail/ham: No such file or directory

> First thing to fix, get your input recognised as messages. Then see how
> they're being learned.

Note the caps "H" in Ham.

~$ sa-learn --ham --mbox --progress /home/stan/mail/Ham
...
Learned tokens from 113 message(s) (116 message(s) examined)

113/116 is promising. I'll keep feeding more ham through and we'll see
if these FPs start to fall.

Thanks again David for helping me catch a simple typo. Maybe we could
someday get sa-learn to properly return error msgs?

-- 
Stan
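
Until sa-learn reports a missing input file itself, one possible
workaround is a small shell wrapper that checks the path before handing
it over. This is only a sketch: the function name learn_mbox is made up
for illustration and is not part of SpamAssassin, and it assumes the
same --ham/--spam, --mbox and --progress options used in the commands
above.

```shell
#!/bin/sh
# learn_mbox: hypothetical helper, NOT part of SpamAssassin.
# Usage: learn_mbox ham|spam /path/to/mbox
# Refuses to run sa-learn when the mbox path is missing or empty, so a
# typo such as "ham" vs "Ham" produces an error instead of
# "Learned tokens from 0 message(s)".
learn_mbox() {
    kind="$1"
    mbox="$2"

    # Only the two training classes are accepted.
    case "$kind" in
        ham|spam) ;;
        *) echo "learn_mbox: first argument must be ham or spam" >&2
           return 2 ;;
    esac

    # Paths are case sensitive, so check the exact name that was given.
    if [ ! -f "$mbox" ]; then
        echo "learn_mbox: $mbox: No such file" >&2
        return 1
    fi

    # An empty mailbox would also yield "0 message(s) examined".
    if [ ! -s "$mbox" ]; then
        echo "learn_mbox: $mbox: mailbox is empty" >&2
        return 1
    fi

    sa-learn "--$kind" --mbox --progress "$mbox"
}
```

With this sourced into the shell, "learn_mbox ham /home/stan/mail/ham"
would stop with a "No such file" message, while the correctly
capitalised path is passed through to sa-learn unchanged.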