> Ps, The system only would need to "process all the rules regardless"
> during the loading of the child.

Well, yes and no.  This subject comes up a lot.  For the record, I favor an
early exit, as you do.  But also for the record, it really is more complex
than you make it out to be, and there are several gotchas involved in the
process.

The two most interesting points are these.  First, exactly where do you
process the Bayes rules?  Second, this breaks AWL completely: AWL is a score
averager, and you are suddenly feeding it partial scores that have no
reliability whatever from an AWL point of view.  The end result is that if
you enable shortcutting you ALSO have to disable AWL, or it will probably
start doing Really Bad Things(tm).
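
To make the AWL problem concrete, here is a rough sketch.  It assumes a
simplified model of AWL (a per-sender running mean, with the adjustment
pulling the current score part way toward that mean); the class, the function
names, and every number are made up for illustration and are not the real
plugin code.

```python
class AwlEntry:
    """Per-sender history: just a running total and a count (simplified)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, score):
        self.total += score
        self.count += 1

    def mean(self):
        return self.total / self.count if self.count else None


def awl_adjust(entry, score, factor=0.5):
    """Pull the score part way toward the sender's mean, then record it."""
    mean = entry.mean()
    adjusted = score if mean is None else score + (mean - score) * factor
    entry.add(score)
    return adjusted


THRESHOLD = 5.0

# The same three spams from one sender, recorded two ways.
full_history = AwlEntry()      # every rule ran, full scores recorded
partial_history = AwlEntry()   # scan bailed just past the threshold
for full, partial in [(12.0, 5.5), (11.0, 6.0), (13.0, 5.2)]:
    full_history.add(full)
    partial_history.add(partial)

# A borderline message from that sender scores 2.0 before AWL.
with_full = awl_adjust(full_history, 2.0)        # pulled toward mean 12.0
with_partial = awl_adjust(partial_history, 2.0)  # pulled toward mean ~5.57
```

With the full history the borderline message ends up over the threshold;
with the truncated history it does not.  Same sender, same mail, different
verdict, purely because the averager was fed partial scores.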

Aside from those two major points, there are a number of minor flow
sequencing problems that have to be solved, or at least definitely answered.

It is obvious that you have to process all negative-scoring rules first,
since that is the only way that you can be sure that you have taken
whitelists and the like into account, and your high positive score really IS
high enough to mark this as spam.
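
A minimal sketch of why the order matters, assuming a toy scorer that simply
adds rule scores in the order given and bails at the threshold.  The function
name, the positive rule name, and the scores are illustrative only.

```python
def shortcut_verdict(hits, threshold=5.0):
    """Add scores in the given order; bail out once the threshold is reached."""
    total = 0.0
    for _name, score in hits:
        total += score
        if total >= threshold:
            return "spam", total  # early exit: remaining rules never run
    return "ham", total


# Hypothetical hits for one message from a whitelisted sender.
hits = [("SOME_SPAMMY_RULE", 6.0), ("USER_IN_WHITELIST", -100.0)]

# Arbitrary order: the exit fires before the whitelist is ever seen.
naive = shortcut_verdict(hits)

# Negative scores first: once they are all in, the running total can only
# go up, so crossing the threshold is final.
negatives_first = shortcut_verdict(sorted(hits, key=lambda h: h[1]))
```

The naive ordering tags whitelisted mail as spam; the sorted ordering does
not.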

However, what if some of those negative-scoring rules are metas?  Now you
have to process the meta dependencies before the negative-scoring meta, even
if the dependencies have a positive score.  And what if one of the
dependencies is the AWL score?  AWL is supposed to run last to give accurate
results, but now it has to run before some arbitrary number of other rules.
But you shouldn't run AWL at all with shortcutting, and here you not only
have to run it, you have to run it early.  Do you drop that meta rule, even
though it can contribute a negative score?
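
The dependency problem can be sketched as a small graph walk.  This assumes
each rule is just a score plus a list of the rules it depends on; all rule
names and scores below are hypothetical, and `graphlib` needs Python 3.9 or
later.

```python
from graphlib import TopologicalSorter

# name -> (score, dependencies).  Hypothetical rules for illustration.
rules = {
    "WHITELIST_HIT":      (-100.0, []),
    "LOOKS_SPAMMY":       (2.5, []),
    "AWL":                (0.0, []),   # normally wants to run last
    "META_PROBABLY_OK":   (-1.5, ["LOOKS_SPAMMY", "AWL"]),
    "UNRELATED_POSITIVE": (3.0, []),
}


def must_run_first(rules):
    """Negative-scoring rules plus everything they depend on, deps first."""
    needed = set()
    stack = [name for name, (score, _deps) in rules.items() if score < 0]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(rules[name][1])  # pull in dependencies of any sign
    graph = {name: rules[name][1] for name in needed}
    return list(TopologicalSorter(graph).static_order())


order = must_run_first(rules)
```

Note what lands in the early set: a positive-scoring rule and AWL itself,
both dragged in by one negative meta.  That is exactly the conflict described
above.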

Also, net rules are fired off first before any other rules are processed,
and then harvested after most all the rules have been processed.  Do you
still want to do that?  Or do you want to hold firing the net rules until
you see if it is going to be tagged as spam by other rules?  (Assuming you
don't have a negative meta that is dependent on a net rule!)  But if you do
this, and you assume the mail is NOT spam, it is going to process ALL
positive-scoring rules (which is essentially all rules) before it starts
the net tests.  Now you have completely lost net overlap, and will end up
sitting on your thumb longer before you can dispose of this message.
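
Back-of-the-envelope arithmetic for the lost overlap, assuming local rules
and network lookups are the only two costs.  The timings are invented purely
to show the shape of the tradeoff.

```python
def latency_overlapped(local_s, net_s):
    """Net tests fired first; local rules run while the lookups are in flight."""
    return max(local_s, net_s)


def latency_deferred(local_s, net_s):
    """Net tests held back until all local rules have finished."""
    return local_s + net_s


LOCAL_S = 0.4   # assumed time for the local rules, in seconds
NET_S = 1.5     # assumed round trip for the slowest lookup, in seconds

overlapped = latency_overlapped(LOCAL_S, NET_S)
deferred = latency_deferred(LOCAL_S, NET_S)
```

For mail that turns out not to be spam, deferring costs you the whole local
scan on top of the network wait; you only come out ahead on the messages
that shortcut before the net tests would have been needed.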

And what about user rules, or even user changes to rule scores?  This can
change the evaluation order from what you would normally do.

There is also a 'priority' field on the rules that determines the order in
which the tests run.  Any priority set there can undermine the ordering you
need in order to bail out early safely when short-circuiting.

Unfortunately, it ain't trivial to get all, or even most all, of this
working in a way that the mathematically inclined would be willing to
consider the results provably correct.  And if the results aren't correct,
why bother making them in the first place?

        Loren
