On Fri, 26 Sep 2003, Simon Byrnand wrote:

> >If I eliminate the SA -d call then that leaves me with only one other
> >CPU-draining call: SA -r
> >
> ># Report to Pyzor
> >:0 Wc
> >| /usr/bin/pyzor report
> >
> ># Report to Razor
> >:0 Wc
> >| spamassassin -r
> >
> >Now one thing I never thought about till just now is if my Pyzor call is
> >redundant. See I don't call Pyzor from SA at all in normal mail
> >processing. I only call Razor. I assume that that SA -r only calls what
> >I have configured SA to use normally, correct? I need to look into that.
> >
> >That call is very CPU intensive for some reason. You wouldn't think that
> >simply reporting a message to Razor would be that intensive but it is.

I actually read the man page this morning and was unpleasantly surprised
by what I found.

    -r, --report
        Report this message as verified spam. This will submit the mail
        message read from STDIN to various spam-blocker databases.
        Currently, these are Vipul's Razor
        (http://razor.sourceforge.net/), the Distributed Checksum
        Clearinghouse (http://www.rhyolite.com/anti-spam/dcc/), and
        Pyzor.

        If the message contains SpamAssassin markup, this will be
        stripped out automatically before submission. The support
        modules for DCC, Razor and/or Pyzor must be installed for spam
        to be reported to each service.

        The message will also be submitted to SpamAssassin's learning
        systems; currently this is the internal Bayesian
        statistical-filtering system (the BAYES rules). (Note that if
        you only want to perform statistical learning, and do not want
        to report mail to a third-party server, you should use the
        "sa-learn" command directly instead.)

I'm sure I'd read it before, but apparently it never sank in.

So essentially --report does more than report. It also stuffs the spam
into the Bayes database. Since I use MIMEDefang and can't use the Bayes
DB, that accounts for a fair amount of wasted CPU time.

It also says that it strips the SA markup. It appears to do this
regardless of whether or not the markup has already been stripped. I
suspect this CPU time adds up fast. I understand the need to strip this
stuff out before reporting the spam, but there should be some way to say
that this has already been done. That would at least prevent the waste
of CPU time on looking for markup to strip.

It also says it will submit to Razor, Pyzor, and DCC if their modules
are installed. It doesn't, however, say whether it honors the config
file options for these services:

    use_razor2 1
    use_pyzor 1
    use_dcc 1

I propose some changes to how --report works.

I propose that some way be devised of telling SA where you want the spam
reported.
For example

    --report=razor2,pyzor,dcc,bayes

or

    --report-to=razor2,pyzor,dcc,bayes

At the least it should honor the SA config file (if it doesn't already).
This would fix two of the problems I noted above (stuffing spam into
Bayes when reporting, and not being able to define where reports go). It
would also prevent Bayes from learning a spam twice, once for sa-learn
and once for spamassassin --report.

I also propose that an additional option be added to tell SA not to
strip the markup when reporting. For example

    --report --nostrip

or

    --report --already-stripped

This would prevent the waste of cycles on parsing an already-stripped
message for markup, and take care of the third problem I noted.

Does anyone have any comments on this? Is there a place I need to submit
this to?

Justin

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
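
To make the --report-to idea above a bit more concrete, here is a rough
sketch of how such a target list might be resolved against the config
file switches. It is purely illustrative: --report-to does not exist,
SpamAssassin itself is written in Perl rather than Python, and the
function name parse_report_targets, the KNOWN_TARGETS tuple, and the
"missing switch means enabled" default are all assumptions made for the
example.

```python
# Hypothetical sketch only: SpamAssassin has no --report-to option.
# All names here are illustrative, not part of any real API.

KNOWN_TARGETS = ("razor2", "pyzor", "dcc", "bayes")


def parse_report_targets(option_value, config=None):
    """Return the ordered list of services a report should go to.

    option_value: the text after --report-to=, e.g. "razor2,pyzor";
                  None means the option was not given at all.
    config:       dict of config-file switches, e.g. {"use_razor2": 1}.
                  A missing switch is treated as enabled (an assumption).
    """
    config = config or {}

    if option_value is None:
        # No explicit list: fall back to every known service.
        requested = list(KNOWN_TARGETS)
    else:
        requested = [t.strip().lower()
                     for t in option_value.split(",") if t.strip()]

    unknown = [t for t in requested if t not in KNOWN_TARGETS]
    if unknown:
        raise ValueError("unknown report target(s): " + ", ".join(unknown))

    targets = []
    for target in requested:
        # Honor the config file: use_<target> 0 vetoes that service.
        enabled = bool(int(config.get("use_" + target, 1)))
        if enabled and target not in targets:
            targets.append(target)
    return targets
```

With this sketch, asking for "razor2,pyzor" while the config has
use_pyzor 0 would leave only razor2, and leaving bayes out of the list
would keep a report from also being learned a second time.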