On 08 Oct 2003 17:34:44 -0700, Daniel Quinlan <[EMAIL PROTECTED]> writes:

> Scott A Crosby <[EMAIL PROTECTED]> writes:
> 
> > Sure. The goal of that is to add in new tokens that are unique and
> > have never been seen before. Those can bias an email toward neutral.
> 
> Bayes could also just track never-seen-before tokens as an artificial
> token. 

The thing is that a gibberish token (one that doesn't match the letter
statistics of $LANG and isn't in a dictionary) should, as a new token,
be given a different Bayes category than one that is in a dictionary, etc.
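
As a rough sketch of the idea, assuming a small stand-in word list and a
crude letter-pair test in place of real $LANG statistics: every name
below (DICTIONARY, COMMON_BIGRAMS, artificial_token, the UNSEEN:* labels)
is hypothetical and not part of SpamAssassin.

```python
# Hypothetical sketch; none of these names come from SpamAssassin.

# Stand-in for a real dictionary word list.
DICTIONARY = {"meeting", "invoice", "tomorrow", "free", "offer"}

# A handful of frequent English letter pairs, used as a crude
# stand-in for "the statistics of $LANG".
COMMON_BIGRAMS = {
    "th", "he", "in", "er", "an", "re", "on", "at", "en", "nd",
    "ti", "es", "or", "te", "of", "ed", "is", "it", "al", "ar",
    "st", "to", "nt", "ng", "se", "ha", "as", "ou", "io", "le",
}


def looks_like_language(token, threshold=0.3):
    """True if enough adjacent letter pairs are common bigrams."""
    word = token.lower()
    pairs = [word[i:i + 2] for i in range(len(word) - 1)]
    if not pairs:
        return False
    hits = sum(1 for p in pairs if p in COMMON_BIGRAMS)
    return hits / len(pairs) >= threshold


def artificial_token(token, seen_tokens):
    """Return a known token unchanged; map a never-seen token to one
    of several artificial tokens instead of a single catch-all."""
    if token in seen_tokens:
        return token
    if token.lower() in DICTIONARY:
        return "UNSEEN:dictionary"
    if looks_like_language(token):
        return "UNSEEN:wordlike"
    return "UNSEEN:gibberish"
```

With this, a never-seen "invoice" and a never-seen "xqzvkj" would be
trained and scored as two different artificial tokens, so each can end
up with its own probability.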

> My initial testing indicates that new tokens (in the body) have
> a spam probability of about 0.83, at least for me.

Can you test whether new non-English or new non-dictionary tokens
have a higher spam probability?
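
One way such a test could be run, as a sketch only: walk a labelled
corpus in arrival order and, for each token seen for the first time,
record which category it falls in and whether the message was spam.
The function name, the input format and the `categorize` callback are
assumptions made for illustration, and the result is a raw frequency,
not the probability the Bayes code itself would compute.

```python
from collections import defaultdict


def new_token_spam_rates(messages, categorize):
    """messages: iterable of (tokens, is_spam) pairs in arrival order.
    categorize: hypothetical callback mapping a never-seen token to a
    category label (e.g. "dictionary" or "non-dictionary").
    Returns {category: fraction of first-seen tokens found in spam}."""
    seen = set()
    spam = defaultdict(int)
    total = defaultdict(int)
    for tokens, is_spam in messages:
        # Count each distinct token once per message.
        for tok in set(tokens):
            if tok not in seen:
                cat = categorize(tok)
                total[cat] += 1
                if is_spam:
                    spam[cat] += 1
        # Only after scoring the message do its tokens become "seen".
        seen.update(tokens)
    return {cat: spam[cat] / total[cat] for cat in total}
```

Comparing the returned rates across categories would show whether
non-dictionary new tokens lean more toward spam than dictionary ones.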

Scott

