Crocomoth wrote:
> Matt Kettler-3 wrote:
>
>>> 1. Using this method, admin must understand that the fate of every
>>> message (for all users) will depend on the single rule.
>>>
>> Not if you set it up properly.. You can have multiple rules run with a
>> very early priority (low number), then have another one run with a
>> semi-early priority which does shortcircuiting. All of the "very early"
>> rules will be involved in the decision to shortcircuit or not.
>>
>
> Yes, but low-numbered rules may not generate any points and the decision may
> depend on one rule anyways. This does not change anything. And what is
> more (see (2) with which you have agreed), in default configuration, this
> will be bayes which generates only 3.5 points (not taking into account
> white/black lists because they will not be set up properly in most cases).
> And, I think, number of persons not wishing to reorder standard rules will
> be much more than "semi-professional" admins.
>

True, but your automated method based on sorting them on "weight" would
pretty much grind SpamAssassin to a screeching halt by increasing the
average scan time due to forcing multiple passes through the message.
Not to mention false positive problems if negative-scoring rules end up
being considered "heavy" and don't get run.

Your idea essentially ruins any benefits of memory caching that
SpamAssassin currently exploits. Right now, rules are run in groups
based on what part of the message they need. This lends speed to
SpamAssassin by allowing that portion of the message to already be in
cache for all but the first rule in the group.

If you start jumping around all over the message for different rules,
the processor memory cache quickly becomes full and pushes out parts
that you're going to be looking at again. If you keep going
back-and-forth header, body, header, body, header, body.. you wind up
going out to RAM quite often, and that's painfully slow. (I don't care
what high-speed dual-channel DDR2 memory setup you have, it's abysmally
slow from the processor's perspective, generally 20 times slower than
cache is.)

Sure, some messages will bail out faster, but most messages will take
much longer to scan. How is that better?

I don't debate that the basic idea of having SA do this "automagically"
would be a great thing. However, the reality of doing it efficiently is
much trickier than you think.

At one point, one idea was to run all the negative scoring rules, and
then run the positive scoring ones, and bail out if the score went over
the spam threshold during the positive phase. The end result of that
test was abysmally slow, due to having to scan the message in two passes
(negative header, negative body, positive header, positive body).

> Sort order may be: negative rules, sorted positive common rules. Any
> user-defined rules should be checked after negative ones and before
> positives, if they exist. Of course, sorting should be performed once
> upon load procedure.

Tested, as mentioned above. Resulted in horrible performance due to
over-sorting.

> Or, such a cut-off may work without any sorting; this is optional. Standard
> priorities could be enough, if they are set up.

I'd agree there. SA could exploit priorities better in the default
config, but this kind of thing needs to be done very carefully to avoid
thrashing the processor cache. Any simple "sort by.." is going to result
in terrible performance.
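
As a rough illustration of the ordering argument above, here is a toy
Python sketch. It does not measure real performance and is not how
SpamAssassin schedules rules. It only counts how often consecutive rules
need a different part of the message, as a crude stand-in for "going
back-and-forth header, body, header, body". The rule names, scores, and
helper functions are all hypothetical.

```python
# Toy model: each rule is (name, message part it scans, score).
# All names and scores are made up for illustration only.
RULES = [
    ("H_NEG", "header", -1.0),
    ("B_POS_BIG", "body", 3.5),
    ("H_POS", "header", 2.0),
    ("B_NEG", "body", -0.5),
    ("H_POS_SMALL", "header", 0.4),
    ("B_POS", "body", 1.5),
]

# Assumed scan order of message parts within one pass.
PART_ORDER = {"header": 0, "body": 1}


def part_switches(rules):
    """Count how often consecutive rules need a different message part."""
    parts = [part for _, part, _ in rules]
    return sum(1 for a, b in zip(parts, parts[1:]) if a != b)


def grouped_by_part(rules):
    """Run all header rules together, then all body rules (one pass)."""
    return sorted(rules, key=lambda r: PART_ORDER[r[1]])


def negatives_then_positives(rules):
    """Two passes: negative header/body, then positive header/body."""
    neg = [r for r in rules if r[2] < 0]
    pos = [r for r in rules if r[2] >= 0]
    return grouped_by_part(neg) + grouped_by_part(pos)


def sorted_by_weight(rules):
    """Naive 'heaviest rule first' ordering, ignoring message parts."""
    return sorted(rules, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    for label, order in [
        ("grouped by part", grouped_by_part(RULES)),
        ("negatives then positives", negatives_then_positives(RULES)),
        ("sorted by weight", sorted_by_weight(RULES)),
    ]:
        print(f"{label}: {part_switches(order)} part switches")
```

In this toy rule set, grouping by part switches parts once, the
two-pass negative/positive order switches three times, and the plain
sort by weight switches on every rule. Real cache behaviour depends on
message size, rule count, and hardware, so treat the counts only as a
picture of why a simple "sort by.." order touches the message parts in a
less friendly pattern.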