On Mon, Aug 25, 2003 at 03:36:50PM +1200, Simon Byrnand wrote:
> Ok. Did the statistics file give any suggestion of what kind of balance
> between spam and ham would get autolearnt with those thresholds ? Is the

Have you looked at the STATISTICS* files?

> new Bayes algorithm any more resistant to being skewed by learning a lot
> more ham than spam ? (Which is what tended to happen with 0.1 and 12 under
> 2.55 anyway, I ended up changing 0.1 to -1 because the ham learnt was
> outweighing spam by nearly 5 to 1)

All Bayes systems will get skewed if you bias the learning one way or
the other, 2.60 hasn't changed that. As for the autolearn values, people
may have to change it (and other default values) as necessary. We try to
make the defaults generically good for everyone, but everyone's
situation is different. :)

> Ok, I can understand that.... guess I'll have to rework my system a bit to
> work around it... for now I'll drop the threshold to 50... assuming it wont
> be reduced to less than that later on ? :)

If I could tell the future, I'd be winning lotteries left and right. ;)

I would assume it won't be reduced below 50. The idea was that 50 should
be high enough for anyone to know "ok, this is spam, really." I can't
see why we'd lower it right now.

-- 
Randomly Generated Tagline:
"The one computer-language course I took was Cobol, and basically, I just
 slept the whole quarter. Then, the night before the final, I read the IBM
 Cobol manual, and I got the top score in the final." - Larry Wall