On 09/28, dar...@chaosreigns.com wrote:
> On 09/28, Marc Perkel wrote:
> > You would only have to test the rule combinations that the message
> > actually triggered. So if it hit 10 rules then it would be 1024
> > combinations. Seems not to be unreasonable to me.
> combinations in the actual corpora would be much higher. I'll try to
> get you a number.

360,468.

Combinations of rules seen in the actual mass-check corpora, from the
latest -net run (2011-09-24), after stripping out T_* and __* rules, but
not stripping out "tflags nopublish" rules.

So that would only take about 394 times as much data submitted via
mass-check as we currently have, to maintain a similar level of
accuracy :)

Seems likely I could find something useful in this direction though.
Looking for combinations of 2 or 3 rules that show up relatively often
in mis-categorized emails.

-- 
"Am I a man who dreamed I was a butterfly, or am I a butterfly who is
dreaming I am a man?" - Chuang Tsu, ~350 BC
http://www.ChaosReigns.com
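
P.S. A minimal sketch of what that search for 2- or 3-rule combinations might look like, assuming the mis-categorized messages are already available as lists of the rule names each one hit. The function name frequent_rule_combos, its parameters, and the RULE_* names are hypothetical and are not part of mass-check.

```python
from collections import Counter
from itertools import combinations


def frequent_rule_combos(misclassified, sizes=(2, 3), min_count=2):
    """Count rule combinations of the given sizes across messages.

    misclassified: iterable of rule-name lists, one per mis-categorized
    message (an assumed input format, not what mass-check emits).
    """
    counts = Counter()
    for rules in misclassified:
        # Drop T_* and __* rules, mirroring the filtering described above.
        kept = sorted({r for r in rules if not r.startswith(("T_", "__"))})
        for k in sizes:
            # combinations() yields nothing when k > len(kept).
            counts.update(combinations(kept, k))
    return [(combo, n) for combo, n in counts.most_common() if n >= min_count]


# Hypothetical rule hits for three mis-categorized messages.
sample = [
    ["RULE_A", "RULE_B", "T_TEST"],
    ["RULE_A", "RULE_B", "RULE_C"],
    ["RULE_B", "RULE_C", "__SUB"],
]
for combo, n in frequent_rule_combos(sample):
    print(n, " + ".join(combo))
```

Only combinations that actually occur are counted, so the work per message is bounded by the number of pairs and triples among its own hits rather than by all 2^N subsets.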