Re: [SAtalk] which is 'better' for reporting/learning? sa-learn+razor vs. spamassassin -r

jvanasco Thu, 17 Jul 2003 15:11:29 -0700

On Thursday, July 17, 2003, at 04:26 PM, Nix wrote:

> On Wed, 16 Jul 2003, [EMAIL PROTECTED] muttered drunkenly:
>> but by doing sa-learn, one has the option to skip rebuilding the bayes
>> database after the scan -- which has significantly reduced my
>> execution time and resources used
>> spamassassin -r doesn't seem to do that
>
> spamassassin -r always does that. :)

really? if so it is undocumented -- and a little odd. granted, the assumption that it is not is undocumented too. but:

POD spamassassin
-----
-r, --report
    Report this message as verified spam. This will submit the mail message read from STDIN to various spam-blocker databases. Currently, these are Vipul's Razor ( http://razor.sourceforge.net/ ), the Distributed Checksum Clearinghouse ( http://www.rhyolite.com/anti-spam/dcc/ ), and Pyzor.

    If the message contains SpamAssassin markup, this will be stripped out automatically before submission. The support modules for DCC, Razor and/or Pyzor must be installed for spam to be reported to each service.

    The message will also be submitted to SpamAssassin's learning systems; currently this is the internal Bayesian statistical-filtering system (the BAYES rules). (Note that if you only want to perform statistical learning, and do not want to report mail to a third-party server, you should use the sa-learn command directly instead.)
-----

nothing in there about not rebuilding Bayes.. and

POD sa-learn
-----
--rebuild
    Rebuild the databases, typically done after learning with --no-rebuild, or if you wish to periodically clean the Bayes databases once a day on a busy server.

--no-rebuild
    Skip the slow rebuilding step which normally takes place after changing database entries. If you plan to scan many folders in a batch, it is faster to use this switch and run sa-learn --rebuild once all the folders have been scanned.
-----

which makes no allusion to rebuilding the bayes db after spamassassin -r

> Force-rebuilding, there might be some call for, I suppose. It'd mean you'd do a rebuild after every message learned, but that's not too terrible.

no.. it would operate, i'd imagine, just like "# sa-learn --rebuild"

> I don't like that much; sa-learn's job isn't really reporting, it's *learning*. Adding reportage to that job looks like feature creep to me.

well, then maybe an sa-report mechanism -- that trains bayes and reports spam to whatever you specify. i wouldn't think of it as new 'features' considering dcc/razor/pyzor checks are part of spamassassin

because as it stands, either i'm grossly incorrect, or the documentation is misleading, confusing, or both.

spamassassin -r would logically have the resource-consuming bayes rebuild after each report -- given the explicit documentation in the sa-learn facility. i don't know what the code is doing, haven't been able to trace it that far -- what i do know is that the documentation for sa-learn says it rebuilds after each call unless explicitly turned off, and the documentation for spamassassin -r says nothing about this.

there should be some sort of standard call that would take a Maildir/mbox of spam and:

  1) strip sa headers of all messages in the box
  2) submit messages to bayes
  3) submit messages to razor / pyzor / dcc (possibly even converting maildir to mbox for this, so a single login could pass all the data and minimize bandwidth)
  4) rebuild the bayes db

it would be an efficient way to get everything done, and it seems pretty central to operating spamassassin, so it shouldn't need external scripts written for it

granted, i'm gonna have some perl/python script on a cron job running all the learning stuff anyway, but it's one less function i'd have to write, and i'm a lazy bastard.
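
For illustration only, here is a minimal Python sketch of the kind of cron-job wrapper described above: learn every message with --no-rebuild, report each one with spamassassin -r, then rebuild once at the end. The function names (message_files, build_plan, run_plan) are hypothetical. The --no-rebuild, --rebuild and -r switches are the ones quoted from the POD above; --spam and passing individual message files to sa-learn are assumptions about the installed version. There is no separate header-stripping step, since the POD says -r strips SpamAssassin markup before submission. Whether spamassassin -r triggers its own rebuild is exactly the open question in this thread, and the sketch does not settle it.

```python
import subprocess
from pathlib import Path


def message_files(maildir):
    """Return the message files in the cur/ and new/ folders of a Maildir."""
    root = Path(maildir)
    files = []
    for sub in ("cur", "new"):
        folder = root / sub
        if folder.is_dir():
            files.extend(sorted(p for p in folder.iterdir() if p.is_file()))
    return files


def build_plan(files, report=True):
    """Return a list of (argv, stdin_path) steps for a batch of spam messages.

    Each message is learned without rebuilding and, optionally, reported.
    A single rebuild is appended at the end if there was anything to learn.
    """
    plan = []
    for f in files:
        # Learn as spam, skipping the slow rebuild step (assumed flags).
        plan.append((["sa-learn", "--spam", "--no-rebuild", str(f)], None))
        if report:
            # spamassassin -r reads the message from STDIN.
            plan.append((["spamassassin", "-r"], f))
    if files:
        # One rebuild for the whole batch.
        plan.append((["sa-learn", "--rebuild"], None))
    return plan


def run_plan(plan):
    """Execute each step, feeding the message on STDIN where required."""
    for argv, stdin_path in plan:
        if stdin_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdin_path, "rb") as fh:
                subprocess.run(argv, stdin=fh, check=True)
```

Calling run_plan(build_plan(message_files("/path/to/spam-maildir"))) from a cron script would run the whole batch; the path is a placeholder. Building the plan separately from running it makes it easy to print the commands first and check them against the local sa-learn and spamassassin documentation.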

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
