Is it possible to configure the SpamAssassin tokenizer to ignore non-word characters (although ignoring all non-words may not be a good idea)? My bayes_toks has grown quite a lot, and in the future it might become too large because of garbage entries like H*F:U*eg1ezjrqb (then again, I understand that even entries like that may help in spotting viruses and other unwanted e-mail).
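To gauge how much of the database such entries actually take up, the output of 'sa-learn --dump' could be summarized with a short script. The following is only a rough sketch. It assumes each dump line holds five whitespace-separated fields (probability, spam count, ham count, access time, token), that header-derived tokens start with "H*" as in the example above, and that lines whose token field begins with "non-token data:" are bookkeeping entries. The function name summarize_dump and the sample lines are made up for illustration.

```python
from collections import Counter


def summarize_dump(lines):
    """Count header-like and other tokens in `sa-learn --dump` style lines.

    Assumed layout per line: probability, spam count, ham count, atime, token.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 4)  # the token itself may contain spaces
        if len(parts) < 5:
            continue  # not a token line
        token = parts[4].rstrip("\n")
        if token.startswith("non-token data:"):
            continue  # bookkeeping entry, not a learned token
        try:
            spam, ham = int(parts[1]), int(parts[2])
        except ValueError:
            continue  # unexpected layout, skip the line
        # Assumption: header-derived tokens carry an "H*" prefix.
        kind = "header" if token.startswith("H*") else "body"
        counts[kind] += 1
        if spam + ham == 1:
            counts[kind + "_seen_once"] += 1  # seen in one message only
    return counts


# Made-up sample lines in the assumed layout.
sample = [
    "0.000          0          2          0  non-token data: bayes db version",
    "0.958          1          0 1066000000  H*F:U*eg1ezjrqb",
    "0.016          0         12 1066000000  linux",
    "0.987          7          0 1066000000  H*r:sk:example",
]

if __name__ == "__main__":
    for kind, n in sorted(summarize_dump(sample).items()):
        print(kind, n)
```

To run it on a real database, the dump could be piped in and sys.stdin passed to summarize_dump instead of the sample list. A large share of header tokens seen in only one message would suggest that entries like the one above account for much of the growth.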
My bayes_toks is now 5 MB, and the mail server running SpamAssassin is a 90 MHz Pentium with 64 MB of RAM that is already very heavily loaded. The output of 'sa-learn --dump' is available at http://majakari.net/sorted_tokens.bz2 if someone is willing to look at it (warning: the file is 613 KB).

-- 
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";