On Thu, Jul 29, 2010 at 01:09:55AM +0100, RW wrote:
>
> I wrote a patch last week (which I've attached) to add country pairs as
> separate token metadata e.g.
>
> X-Spam-Relay-Countries: US US CA NG
> X-Spam-Relay-Country-Tokens: Trusted_US USCA CANG
>
> It's not a straight fix, but I'll submit it if no-one has a better idea.
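
For readers following along, here is a rough Python sketch of one way the example above could be produced. It is not RW's patch. The function name `country_tokens` and the `num_trusted` parameter are hypothetical, and the rule it applies (a `Trusted_` prefix for trusted relays, then concatenated pairs of consecutive untrusted relays) is only a guess that happens to reproduce the example headers.

```python
def country_tokens(countries, num_trusted=1):
    """Return Trusted_<CC> tokens for trusted relays, then pair tokens.

    Assumption: the first `num_trusted` entries are trusted relays and
    the remainder are untrusted. The real patch may split differently.
    """
    trusted = countries[:num_trusted]
    untrusted = countries[num_trusted:]

    # One prefixed token per trusted relay.
    tokens = [f"Trusted_{cc}" for cc in trusted]

    # Pair each untrusted relay with the next one, keeping relay order.
    tokens += [a + b for a, b in zip(untrusted, untrusted[1:])]
    return tokens


header_value = "US US CA NG"
print(" ".join(country_tokens(header_value.split())))
```

With one trusted relay this prints `Trusted_US USCA CANG`, matching the header pair quoted above. Keeping the pairs ordered is what preserves the hop sequence that a plain list of countries would lose.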

Idea sounds fine, I like that you try to keep the order and networks.
Maybe the originating (first) client should also have some prefix.

X-Languages is also not tokenized.

Yet another good Bayes candidate would be attachment file names. I made a
quick hack for it some time ago:

http://sa.hege.li/ExtraTokens.pm

Jul 29 09:14:39.423 [17768] dbg: bayes: header tokens for X-Filenames = " application/octet-stream Google Corporation Lottery unit E*pdf "
Jul 29 09:14:39.454 [17768] dbg: bayes: token 'HX-Filenames:Corporation' => 0.986543689320388
Jul 29 09:14:39.457 [17768] dbg: bayes: token 'HX-Filenames:E*pdf' => 0.130547740353781

Note that all of this is just speculation. One needs to run some stats to
find out whether these tokens help at all or just waste db space:

http://wiki.apache.org/spamassassin/TenFoldCrossValidation

I never got around to it, probably only Justin knows the magic voodoo
commands. Could someone find out the proper way for 3.3 and update the
wiki? I'd really like to test some runs, but I don't have time to
experiment to get it working.
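
As an illustration of the X-Filenames debug output above, here is a minimal Python sketch of how per-header tokens of that shape could be formed. It is not the code in ExtraTokens.pm or SpamAssassin's tokenizer: the function name `header_tokens` is hypothetical, and splitting on whitespace alone is an assumption (the real tokenizer applies further rules).

```python
def header_tokens(header_name, value):
    """Prefix each whitespace-separated word with 'H<header_name>:'.

    Assumption: plain whitespace splitting, with no length limits or
    character rewriting. This only mirrors the token shape in the log.
    """
    prefix = f"H{header_name}:"
    return [prefix + word for word in value.split()]


# Header value copied from the debug log above.
value = " application/octet-stream Google Corporation Lottery unit E*pdf "
for token in header_tokens("X-Filenames", value):
    print(token)
```

The output includes `HX-Filenames:Corporation` and `HX-Filenames:E*pdf`, the two tokens the log reports probabilities for. Because the header name is part of the token, a word seen in a file name is learned separately from the same word seen in the body.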