On Thu, Jul 29, 2010 at 01:09:55AM +0100, RW wrote:
>
> I wrote a patch last week (which I've attached) to add country pairs as
> separate token metadata  e.g.
> 
> X-Spam-Relay-Countries: US US CA NG
> X-Spam-Relay-Country-Tokens: Trusted_US USCA CANG
> 
> It's not a straight fix, but I'll submit it if no-one has a better idea.

The idea sounds fine. I like that you try to keep both the relay order and
the network (trusted/untrusted) information. Maybe the originating (first)
client should also get its own prefix.
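
To make the pairing concrete, here is a minimal sketch of how I read the
example above. It is Python, not the Perl from the patch, and it is not
the patch itself. The function name country_tokens and the num_trusted
parameter are hypothetical, and the rule (one Trusted_ token per trusted
relay, then one token per adjacent pair of untrusted relays) is only my
guess from the two example headers.

```python
def country_tokens(countries, num_trusted):
    """Build tokens from an ordered list of relay country codes.

    countries   -- relay countries in header order, e.g. ["US", "US", "CA", "NG"]
    num_trusted -- number of leading relays on trusted networks
                   (assumption: the caller already knows this)
    """
    trusted = countries[:num_trusted]
    untrusted = countries[num_trusted:]

    # One token per trusted relay, marked with a network prefix.
    tokens = ["Trusted_" + cc for cc in trusted]

    # One token per adjacent pair of untrusted relays, preserving hop order.
    tokens += [a + b for a, b in zip(untrusted, untrusted[1:])]
    return tokens


print(" ".join(country_tokens("US US CA NG".split(), num_trusted=1)))
# Trusted_US USCA CANG
```

Keeping the pairs ordered means USCA and CAUS stay distinct tokens, which
is the part I like about the approach.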

X-Languages is not tokenized either.

Attachment file names would be yet another good Bayes candidate. I made a
quick hack for that some time ago: http://sa.hege.li/ExtraTokens.pm

Jul 29 09:14:39.423 [17768] dbg: bayes: header tokens for X-Filenames = " application/octet-stream Google Corporation Lottery unit E*pdf "
Jul 29 09:14:39.454 [17768] dbg: bayes: token 'HX-Filenames:Corporation' => 0.986543689320388
Jul 29 09:14:39.457 [17768] dbg: bayes: token 'HX-Filenames:E*pdf' => 0.130547740353781

Note that all of this is just speculation. Someone needs to run some stats
to find out whether these tokens help at all or just waste space in the
Bayes db:

http://wiki.apache.org/spamassassin/TenFoldCrossValidation
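
For anyone unfamiliar with the idea, here is a rough Python sketch of what
a ten-fold split looks like in general. This is not the procedure from the
wiki page and it uses no SpamAssassin tooling. The function name
k_fold_splits and the toy corpus are made up for illustration.

```python
import random


def k_fold_splits(messages, k=10, seed=0):
    """Yield (train, test) pairs; each message is in exactly one test fold."""
    msgs = list(messages)
    # Shuffle with a fixed seed so repeated runs use the same folds.
    random.Random(seed).shuffle(msgs)

    # Deal the messages into k folds of nearly equal size.
    folds = [msgs[i::k] for i in range(k)]

    for i in range(k):
        test = folds[i]
        train = [m for j, fold in enumerate(folds) if j != i for m in fold]
        yield train, test


# Toy corpus standing in for real ham/spam message paths.
corpus = ["msg%d" % n for n in range(25)]
for train, test in k_fold_splits(corpus):
    # Here one would train Bayes on `train` and score `test`, once with
    # and once without the extra tokens, then compare the error rates.
    pass
```

Running both variants over the same folds is what makes the comparison
fair: any difference then comes from the tokens, not from the split.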

I never got around to running the cross-validation myself. Probably only
Justin knows the magic voodoo commands. Could someone find out the proper
way to do this for 3.3 and update the wiki? I'd really like to do some
test runs, but I don't have time to experiment to get it working.
