-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Petersen writes:
>> See 'man sa-learn' or use
>> http://www.spamassassin.org/doc/sa-learn.html
>> http://wiki.spamassassin.org/w/BayesInSpamAssassin
>
>This doesn't say much about HOW it's used in SA, though.  For instance,
>does SA bayes score URI tokens higher than it does general body tokens? 
>(if not, it should)  What about message headers?  Does it tokenize
>rawbody or body?  Does it tokenize only word-based characters, or would
>something like "[EMAIL PROTECTED]@" become a token?
>
>I'd honestly like some answers to these questions - I've asked before
>but didn't see any responses.

Chris --

It tokenizes body, not rawbody.  "[EMAIL PROTECTED]@" would be a token
(the tokenizer is more or less split-on-whitespace), and all tokens are
treated equally, although they are tracked in separate namespaces for
header, URI, mail address, and body tokens.
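
To make "split-on-whitespace with separate namespaces" concrete, here is
a minimal sketch in Python.  It is not the SpamAssassin tokenizer (that
is Perl, and does more than this); the function names and the "H:",
"U:" and "A:" prefixes are hypothetical and only illustrate the idea.

```python
def tokenize(text, namespace=""):
    """Split text on whitespace and tag each token with a namespace prefix."""
    # str.split() with no arguments splits on any run of whitespace,
    # so punctuation stays attached to the token it appears in.
    return [namespace + tok for tok in text.split()]


def tokenize_message(header_text, uri_text, address_text, body_text):
    """Tokenize the four kinds of text into one list of namespaced tokens."""
    # Hypothetical prefixes: they only keep the token kinds apart so that
    # counts are tracked separately. No kind is given extra weight.
    return (
        tokenize(header_text, "H:")
        + tokenize(uri_text, "U:")
        + tokenize(address_text, "A:")
        + tokenize(body_text)
    )
```

With this sketch, the word "spam" in a header and the same word in the
body become two distinct tokens ("H:spam" and "spam"), and a string of
punctuation with no whitespace in it survives as a single token.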

If you think some tokens should be "stronger" than others, please do a
10-fold cross-validation testing run which should *prove* that to be the
case.  We don't adopt Bayes tokenizer or combiner changes without
such testing.
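
For anyone unsure what such a run involves, the outline below is a
rough Python sketch of 10-fold cross-validation.  It is not our actual
test harness; train_fn and classify_fn are hypothetical callables that
you would supply once for the current tokenizer and once for the
proposed change, and then compare the error counts.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps runs comparable
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validate(labeled, train_fn, classify_fn, k=10):
    """Return the total number of misclassified messages over all k folds.

    labeled is a list of (message, is_spam) pairs. train_fn(train) builds a
    model and classify_fn(model, message) returns True for spam; both are
    hypothetical stand-ins for the system being tested.
    """
    errors = 0
    for train, test in k_fold_splits(labeled, k):
        model = train_fn(train)
        errors += sum(
            1 for msg, is_spam in test if classify_fn(model, msg) != is_spam
        )
    return errors
```

Run it on the same corpus with the same seed for both variants, so that
the only difference between the two error counts is the change under
test.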

Also -- if you were so keen for answers, I think your best option would
have been to Use The Source! ;)  We don't always have time to answer,
and the definitive answer is right there.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFABZ0dQTcbUG5Y7woRAnnEAJwOvmFBtofaRmF7luvd8ZOvR4a0CACfdGMW
7Tq8pyGzJ+dL+FsaccKgt4o=
=dZQb
-----END PGP SIGNATURE-----


