[SAtalk] Ignoring non-word characters in filtering?

Edvard Majakari Fri, 19 Dec 2003 18:24:03 -0800

Is it possible to configure the SpamAssassin tokenizer to ignore non-word
characters (although ignoring all non-words may not be a good idea)? My
bayes_toks file has grown quite a lot, and in the future it might become too
large due to garbage entries like H*F:U*eg1ezjrqb. (Then again, I understand
that even entries like that may be useful in spotting viruses and other
unwanted e-mail.)
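To make the two policies being weighed concrete, here is a minimal sketch in Python. It is not a SpamAssassin option and not how its tokenizer works. The function names `keep_token` and `strip_non_word` are hypothetical, and "word character" is assumed to mean ASCII `[A-Za-z0-9_]`. One policy drops any token containing a non-word character. The other strips those characters and keeps the rest.

```python
import re

# Hypothetical definition: a "word character" is ASCII [A-Za-z0-9_].
WORD_ONLY = re.compile(r"\w+", re.ASCII)
NON_WORD_RUN = re.compile(r"\W+", re.ASCII)

def keep_token(token: str) -> bool:
    """Policy 1 (hypothetical): keep a token only if it is all word characters."""
    return WORD_ONLY.fullmatch(token) is not None

def strip_non_word(token: str) -> str:
    """Policy 2 (hypothetical): remove non-word characters, keep the remainder."""
    return NON_WORD_RUN.sub("", token)

if __name__ == "__main__":
    tokens = ["cheap", "H*F:U*eg1ezjrqb", "free!!!", "meeting"]
    print([t for t in tokens if keep_token(t)])
    print([strip_non_word(t) for t in tokens])
```

Run on the sample tokens, the first policy discards the H*F:U*eg1ezjrqb header token outright. The second only turns it into HFUeg1ezjrqb, which is still a one-off random string. That suggests stripping characters alone would do little to keep such entries out of the database.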


My bayes_toks is now 5 MB, and the mail server running SpamAssassin is a
90 MHz Pentium with 64 MB of RAM (already very heavily loaded).

The output of 'sa-learn --dump' is available at
http://majakari.net/sorted_tokens.bz2 if anyone is willing to look at it
(warning: the file is 613 KB).
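Before changing any tokenizer behaviour, it may help to measure how much of the database such tokens actually make up. The Python sketch below counts them in a compressed dump like the one linked above. It assumes the token is the last whitespace-separated field on each line of `sa-learn --dump` output. That layout is not verified here, and any summary lines in the dump are counted like ordinary tokens. The script name `token_share.py` is hypothetical.

```python
import bz2
import re
import sys
from typing import Iterable, Tuple

# A token counts as "non-word" if it contains any character outside ASCII [A-Za-z0-9_].
NON_WORD = re.compile(r"\W", re.ASCII)

def non_word_share(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (tokens_with_non_word_chars, total_tokens).

    Assumes the token is the last whitespace-separated field of each line;
    blank lines are skipped.
    """
    flagged = total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        token = fields[-1]
        total += 1
        if NON_WORD.search(token):
            flagged += 1
    return flagged, total

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python token_share.py sorted_tokens.bz2
    with bz2.open(sys.argv[1], "rt", errors="replace") as fh:
        flagged, total = non_word_share(fh)
    print(f"{flagged}/{total} tokens contain non-word characters")
```

A high ratio would only show that such tokens take up space. It would not show that they are useless for classification.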

-- 
# Edvard Majakari               Software Engineer
# PGP PUBLIC KEY available      Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk