On 6/26/07 4:28 PM, Matt Kettler wrote:
> Rick van der Zwet wrote:
>>> L, you could have a script find the relevant sha1
>>> hashes and remove them.
>>>
>>> However, why do you want to do this in the first place?
>>>
>>> SA's chi-squared combining is pretty good at ignoring words that appear
>>> in both spam and nonspam...
>>>     
>> Because I know, for example, some really specific words which are added
>> all the time, like footers/disclaimers/mailing list prefixes. And I don't
>> want these words to affect the bayes score.
>>   
> They really shouldn't matter.
>> If you take for example a small spam message the ratio bad/good words
>> will be about 50 or more.
>>   
> So? The combining is chi-squared, which will favor the "stronger" tokens
> (i.e. those close to 0 or 1.0) over the "present in everything" ones (i.e.
> those close to 0.50).
> 
That I did not know, and it explains/solves it :-)
/Rick
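
For illustration, the effect Matt describes can be sketched in a few lines of Python. This is a minimal sketch of Fisher-style chi-squared combining as used in Robinson-style Bayes filters, not SpamAssassin's actual code; the names chi2q and combine are made up for this example, and the token probabilities are invented. It assumes every probability lies strictly between 0 and 1.

```python
import math


def chi2q(x2, dof):
    """Upper tail probability of a chi-squared variable (even dof only)."""
    assert dof % 2 == 0 and dof > 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed form for even dof: exp(-m) * sum_{i=0}^{dof/2 - 1} m**i / i!
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into a single score in [0, 1].

    Assumption: every p is strictly between 0 and 1 (no log(0)).
    """
    n = len(probs)
    # Spam evidence: large when many tokens have p close to 1.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Ham evidence: large when many tokens have p close to 0.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    neutral = [0.5] * 5          # e.g. footer or list-prefix tokens
    print(combine(neutral))                   # neutral tokens only
    print(combine([0.99, 0.98] + neutral))    # two strong spam tokens added
    print(combine([0.01, 0.02] + neutral))    # two strong ham tokens added
```

A token at 0.50 adds the same amount to both sums, so on its own it pushes the result toward neither spam nor ham; in this sketch a couple of strong tokens still decide the outcome when a handful of neutral ones are present. A very large number of neutral tokens would still dilute the result in this simplified version.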

-- 
http://rickvanderzwet.nl
