On Mon, Aug 25, 2003 at 02:59:42PM +1200, Simon Byrnand wrote:
> already using on 2.55 I'm curious to know if the new values were chosen
> empirically or whether some kind of stats were involved to check the lowest
> scores of spam and the highest scores of ham etc...
They were chosen by looking at the STATISTICS* files.
Ok. Did the statistics file give any suggestion of what balance between spam and ham would be auto-learnt with those thresholds? Is the new Bayes algorithm any more resistant to being skewed by learning much more ham than spam? (That tended to happen with 0.1 and 12 under 2.55 anyway; I ended up changing 0.1 to -1 because the learnt ham was outweighing spam by nearly 5 to 1.)
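To illustrate how thresholds such as 0.1 and 12 shape the mix of auto-learnt mail, here is a minimal Python sketch. It assumes a simplified rule: a message is learnt as ham when its score is strictly below the ham threshold, as spam when it is at or above the spam threshold, and not learnt otherwise. The function names and sample scores are hypothetical. SpamAssassin's real auto-learn logic applies further conditions that are not modelled here.

```python
from collections import Counter

def autolearn_label(score, ham_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam', or None for a message score.

    Simplified assumption: strictly below ham_threshold -> learn as ham,
    at or above spam_threshold -> learn as spam, otherwise not learnt.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None

def learned_mix(scores, ham_threshold=0.1, spam_threshold=12.0):
    """Count how many messages would be auto-learnt as (ham, spam)."""
    counts = Counter(autolearn_label(s, ham_threshold, spam_threshold) for s in scores)
    return counts["ham"], counts["spam"]

# Hypothetical scores, for illustration only.
scores = [-2.5, -0.4, 0.0, 0.05, 1.3, 4.8, 9.9, 12.0, 15.2, 22.7]
print(learned_mix(scores))                      # (4, 3)
print(learned_mix(scores, ham_threshold=-1.0))  # (1, 3)
```

Lowering the ham threshold, as was done here by moving 0.1 to -1, shrinks the ham side of the mix without touching the spam side.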
Since there's no way to wrap that header (think of what happens when GTUBE hits), we needed to limit the length of the line. Any choice would potentially cut the line short, so we chose 50, which means it's unlikely the full header will go over 80 chars. (Remember, you can rename X-Spam-Level to anything you want, so the header length isn't guaranteed.)
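The arithmetic behind the 50-character cap can be sketched in Python. This assumes the level is one `*` per whole point of score (truncated with `int`, negatives clamped to zero) and capped at 50; the helper name is hypothetical and this is not SpamAssassin's own code.

```python
MAX_STARS = 50  # cap on the number of level characters, as described above

def spam_level_header(score, name="X-Spam-Level", char="*", max_stars=MAX_STARS):
    """Build a header line with one char per whole point of score, capped at max_stars."""
    stars = min(max(int(score), 0), max_stars)  # clamp to the range [0, max_stars]
    return f"{name}: {char * stars}"

line = spam_level_header(1000.0)  # a GTUBE-sized score still yields only 50 stars
print(len(line))                  # 64
```

With the default name the line is 14 + 50 = 64 characters. A renamed header stays within 80 characters as long as its name is at most 28 characters long (28 + 2 + 50 = 80).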
Ok, I can understand that... I guess I'll have to rework my system a bit to work around it. For now I'll drop the threshold to 50, assuming it won't be reduced below that later on? :)
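Any downstream rule that counts stars in the header has to stay at or below the cap, because a capped header can never show more than 50 of them. A minimal Python sketch of such a check follows. It assumes the header value is a plain run of `*` characters; the function names and the choice to reject thresholds above the cap are hypothetical.

```python
MAX_STARS = 50  # the header never carries more than this many level characters

def stars_in_header(value, char="*"):
    """Count the leading level characters in an X-Spam-Level header value."""
    v = value.strip()
    return len(v) - len(v.lstrip(char))

def is_high_scoring(value, threshold, char="*"):
    """True if the header shows at least `threshold` level characters."""
    if threshold > MAX_STARS:
        # A capped header can never reach this threshold, so the rule would never fire.
        raise ValueError(f"threshold {threshold} exceeds the {MAX_STARS}-character cap")
    return stars_in_header(value, char) >= threshold

print(is_high_scoring("*" * 50, 50))  # True
```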
Regards, Simon
Spamassassin-talk mailing list https://lists.sourceforge.net/lists/listinfo/spamassassin-talk