Another possibility would be to generate meta rules from random sets of
three rules.  Some (actually random) examples:

meta RANDOM_3_A = (MPART_ALT_DIFF && GAPPY_SUBJECT && URI_UNSUBSCRIBE)
meta RANDOM_3_B = (RCVD_IN_MAPS_OPS && WEIRD_PORT && FSL_FAKE_GMAIL_RCVD)
meta RANDOM_3_C = (FB_CAN_LONGER && FU_HOODIA && RCVD_IN_NJABL_PROXY)
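
Drawing triples like these could be scripted.  Here is a minimal Python
sketch, assuming you already have the rule names in a list.  The function
name, the numbered RANDOM_3_n naming, and the short sample list are made up
for illustration and are not part of SpamAssassin:

```python
import random


def random_meta_rules(rule_names, count, seed=None):
    """Return `count` meta rule lines, each AND-ing three distinct rules.

    rule_names: list of existing rule names (hypothetical input).
    seed: optional, only there to make a run repeatable.
    """
    rng = random.Random(seed)
    lines = []
    for i in range(count):
        # sample() picks three different names without replacement
        a, b, c = rng.sample(rule_names, 3)
        lines.append(f"meta RANDOM_3_{i + 1} = ({a} && {b} && {c})")
    return lines


if __name__ == "__main__":
    # Sample names taken from the examples above; a real run would use
    # the full rule list.
    sample = [
        "MPART_ALT_DIFF", "GAPPY_SUBJECT", "URI_UNSUBSCRIBE",
        "RCVD_IN_MAPS_OPS", "WEIRD_PORT", "FSL_FAKE_GMAIL_RCVD",
    ]
    for line in random_meta_rules(sample, 3, seed=1):
        print(line)
```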

Then add them one rule at a time and re-run score generation to see whether
it comes up with a higher accuracy result.  If it does, you'd need to re-run
score generation a few more times, with and without the additional rule, to
verify that the gain is not just a fluke of the random split between train
and test corpora.

You could increase your chances by focusing on rules that show up in emails
that were incorrectly categorized (false positives and false negatives).
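
One way to narrow the search along those lines is to count which triples of
rules actually co-occur in the misclassified messages.  This is only a
sketch: it assumes you have already extracted, for each misclassified
message, the list of rule names that hit (that parsing step is not shown),
and the function name is hypothetical:

```python
from collections import Counter
from itertools import combinations


def frequent_triples(hits_per_message, top_n=10):
    """Count rule triples that fire together in misclassified messages.

    hits_per_message: iterable of lists of rule names, one list per
    misclassified message (assumed input, extracted elsewhere).
    Returns the top_n (triple, count) pairs, most frequent first.
    """
    counts = Counter()
    for hits in hits_per_message:
        # sort so the same three rules always produce the same tuple
        for triple in combinations(sorted(set(hits)), 3):
            counts[triple] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Toy data, not real mass-check output.
    hits = [
        ["GAPPY_SUBJECT", "URI_UNSUBSCRIBE", "MPART_ALT_DIFF"],
        ["MPART_ALT_DIFF", "GAPPY_SUBJECT", "URI_UNSUBSCRIBE", "WEIRD_PORT"],
    ]
    for triple, n in frequent_triples(hits, top_n=3):
        print(n, " && ".join(triple))
```

False positives and false negatives would probably need to be counted
separately, since a meta rule that helps with one could hurt the other.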

Info on running score generation:
http://wiki.apache.org/spamassassin/RescoreMassCheck


Sounds more reasonable than my last post, right?

I don't remember how long a run takes, but say you can do 10 per day.  With
913 rules there are 126,424,936 possible combinations of 3 rules, so trying
them all would take you about 34,614 years.
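
The arithmetic can be checked with a few lines of Python.  The 10 runs per
day is the guess from above, and the year length of 365.2422 days is my
assumption; it happens to reproduce the figure quoted:

```python
import math

RULES = 913
RUNS_PER_DAY = 10          # guessed throughput, not a measurement
DAYS_PER_YEAR = 365.2422   # assumed year length

# number of unordered sets of 3 rules out of 913
combos = math.comb(RULES, 3)
days = combos / RUNS_PER_DAY
years = days / DAYS_PER_YEAR

print(combos)        # 126424936
print(round(years))  # 34614
```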

But that's without focusing on the combinations that show up in incorrectly
classified emails.  Maybe we could get distributed.net in on it.

-- 
"It is the first responsibility of every citizen to question authority."
- Benjamin Franklin
http://www.ChaosReigns.com
