On 6/26/07 4:28 PM, Matt Kettler wrote:
> Rick van der Zwet wrote:
>>> L, you could have a script find the relevant sha1
>>> hashes and remove them.
>>>
>>> However, why do you want to do this in the first place?
>>>
>>> SA's chi-squared combining is pretty good at ignoring words that appear
>>> in both spam and nonspam...
>>>
>> Cause I know for example some really specific words which are added all
>> the time like footers/disclaimers/mailinglist prefixes. And I don't want
>> these words to affect the bayes score.
>>
> They really shouldn't matter.
>> If you take for example a small spam message the ratio bad/good words
>> will be about 50 or more.
>>
> So? the combining is chi-squared, which will favor the "stronger" tokens
> (ie: those close to 0 or 1.0) over the "present in everything" ones (ie:
> those close to 0.50).
>
That I did not know, and it explains/solves it :-)

/Rick
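To make Matt's point concrete, here is a minimal sketch of Robinson-style chi-square (Fisher) combining as popularized by SpamBayes. It is not SpamAssassin's actual Bayes code. The function names and the `min_strength` cutoff, which drops tokens whose probability lies within 0.1 of 0.5 before combining, are hypothetical choices for illustration; the example token probabilities are made up.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(X >= x2) for X ~ chi-square with an even number v of degrees of freedom."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs, min_strength=0.1):
    """Combine per-token spam probabilities (each strictly between 0 and 1).

    Tokens within min_strength of 0.5 (a hypothetical cutoff) are discarded,
    so only "strong" tokens near 0 or 1 influence the verdict.
    Returns a score near 1.0 for spam, near 0.0 for ham, 0.5 if undecided.
    """
    clues = [p for p in probs if abs(p - 0.5) >= min_strength]
    if not clues:
        return 0.5
    n = len(clues)
    # Summing logs instead of multiplying avoids floating-point underflow.
    ln_ham = sum(math.log(p) for p in clues)
    ln_spam = sum(math.log(1.0 - p) for p in clues)
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # evidence for ham
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # evidence for spam
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    spammy = [0.99, 0.98, 0.97]
    # Hypothetical footer/disclaimer tokens seen equally in spam and ham.
    footer = [0.5] * 30 + [0.55] * 10 + [0.45] * 10
    print(chi2_combine(spammy))
    print(chi2_combine(spammy + footer))
```

Here the 50 footer-like tokens are discarded, so the verdict for the three strong spam tokens is unchanged. Calling `chi2_combine(spammy + footer, min_strength=0.0)` keeps them, and the extra degrees of freedom pull the score toward 0.5, which is why combining schemes of this kind commonly skip near-neutral tokens.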
-- 
http://rickvanderzwet.nl