Another possibility would be to generate meta rules from random sets of three rules. Some (actually random) examples:

meta RANDOM_3_A = (MPART_ALT_DIFF && GAPPY_SUBJECT && URI_UNSUBSCRIBE)
meta RANDOM_3_B = (RCVD_IN_MAPS_OPS && WEIRD_PORT && FSL_FAKE_GMAIL_RCVD)
meta RANDOM_3_C = (FB_CAN_LONGER && FU_HOODIA && RCVD_IN_NJABL_PROXY)

And, one rule at a time, re-run score generation to see if it comes up with a higher accuracy result. If it does, you'd need to re-run score generation a few more times with and without the additional rule to verify it's not just a fluke of the random selection of train vs. test corpora.

You could increase your chances by focusing on rules that show up in emails that were incorrectly categorized (false positives / negatives).

Info on running score generation:
http://wiki.apache.org/spamassassin/RescoreMassCheck

Sounds more reasonable than my last post, right? I don't remember how long it takes, but say you can run it 10 times per day, and with 913 rules there are 126,424,936 possible combinations of 3 rules, so that would take you about 34,614 years. But that's without focusing on the combinations that show up in incorrectly classified emails.

Maybe we could get distributed.net in on it.

-- 
"It is the first responsibility of every citizen to question authority." - Benjamin Franklin
http://www.ChaosReigns.com
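
For anyone who wants to check the arithmetic above or generate candidate meta rules the same way, here is a rough Python sketch. The function names, the numeric rule suffix (RANDOM_3_1 rather than RANDOM_3_A), and the 365.2425-day year are assumptions made for illustration; none of this is part of SpamAssassin, and it does not run score generation itself.

```python
import math
import random

def count_triples(num_rules):
    """Number of unordered sets of 3 distinct rules out of num_rules."""
    return math.comb(num_rules, 3)

def years_to_try_all(num_rules, runs_per_day=10, days_per_year=365.2425):
    """Years needed to run score generation once per 3-rule combination.

    runs_per_day=10 is the guess from the post; days_per_year is an
    assumption (mean Gregorian year).
    """
    return count_triples(num_rules) / runs_per_day / days_per_year

def random_meta_rules(rule_names, how_many, seed=None):
    """Build 'meta' lines, each AND-ing 3 distinct randomly chosen rules.

    The RANDOM_3_<n> naming is hypothetical; only the
    'meta NAME = (A && B && C)' shape is taken from the examples above.
    """
    rng = random.Random(seed)
    lines = []
    for i in range(how_many):
        a, b, c = rng.sample(rule_names, 3)  # sample() never repeats a rule
        lines.append(f"meta RANDOM_3_{i + 1} = ({a} && {b} && {c})")
    return lines

if __name__ == "__main__":
    print(count_triples(913))            # 126424936
    print(round(years_to_try_all(913)))  # 34614
    rules = ["MPART_ALT_DIFF", "GAPPY_SUBJECT", "URI_UNSUBSCRIBE",
             "WEIRD_PORT", "FU_HOODIA", "FB_CAN_LONGER"]
    for line in random_meta_rules(rules, 2, seed=1):
        print(line)
```

To focus on misclassified mail as suggested, you would pass in only the rule names that hit on the false positives / negatives instead of the full rule list, which shrinks the number of combinations considerably.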