-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Petersen writes:
>> See 'man sa-learn' or use
>> http://www.spamassassin.org/doc/sa-learn.html
>> http://wiki.spamassassin.org/w/BayesInSpamAssassin
>
>This doesn't say much about HOW it's used in SA, though. For instance,
>does SA bayes score URI tokens higher than it does general body tokens?
>(if not, it should) What about message headers? Does it tokenize
>rawbody or body? Does it tokenize only word-based characters, or would
>something like "[EMAIL PROTECTED]@" become a token?
>
>I'd honestly like some answers to these questions - I've asked before
>but didn't see any responses.

Chris --

It tokenizes body, "[EMAIL PROTECTED]@" would be a token (it's more or
less split-on-whitespace), and all tokens are treated equally (although
tracked in separate namespaces for header, URI, mail address, and body
tokens).

If you think some tokens should be "stronger" than others, please do a
10-fold cross-validation testing run which should *prove* that to be the
case. We don't adopt Bayes tokenizer or combiner changes without such
testing.

Also -- if you were so keen for answers, I think your best option would
have been to Use The Source! ;) We don't always have time to answer,
and the definitive answer is right there.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFABZ0dQTcbUG5Y7woRAnnEAJwOvmFBtofaRmF7luvd8ZOvR4a0CACfdGMW
7Tq8pyGzJ+dL+FsaccKgt4o=
=dZQb
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
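
To make the reply above concrete, here is a rough Python sketch of what "more or less split-on-whitespace" with separate namespaces could look like. It is not SpamAssassin's code (that is Perl, and the source remains the definitive answer); the function names and the namespace labels "header", "uri", "address" and "body" are assumptions made up for illustration only.

```python
def tokenize(text, namespace):
    """Split text on whitespace and pair every token with its namespace.

    Hypothetical helper: no stripping of punctuation is done, so odd
    strings such as "n0w!!" or "$$$" survive as tokens unchanged.
    """
    return [(namespace, tok) for tok in text.split()]


def message_tokens(header_text, uri_text, address_text, body_text):
    """Collect tokens from each part of a message.

    Every token gets the same treatment regardless of where it came
    from; the namespace only keeps identical strings from different
    parts of the message apart.
    """
    tokens = []
    tokens += tokenize(header_text, "header")
    tokens += tokenize(uri_text, "uri")
    tokens += tokenize(address_text, "address")
    tokens += tokenize(body_text, "body")
    return tokens


if __name__ == "__main__":
    for token in message_tokens(
        "Free offer",
        "http://example.com/offer",
        "sender@example.com",
        "Free $$$ n0w!! http://example.com/offer",
    ):
        print(token)
```

In this sketch the string "http://example.com/offer" seen as a URI and the same string seen in the body end up as two different tokens, but neither is weighted more heavily than the other, which matches the "treated equally, tracked separately" description.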