On Thu, 5 Mar 2009, decoder wrote:

John Hardin wrote:
 Would there be any benefit to having an offline version - i.e.
 something that evaluates the log or a corpus to generate new meta
 rules, that could be added onto the default ruleset? For instance:

 cron @ 0200:
 sa_meta_eval > /etc/mail/spamassassin/metarules.cf
 /etc/init.d/spamassassin restart

This is definitely a good idea. You can create the SVM model offline from a logfile alone, provided it includes the rules that hit and the ham/spam status.

From my /var/log/maillog:

Mar 1 04:22:22 ga spamd[30536]: spamd: result: Y 46 - BAYES_99,BAYES_POISON_02,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,FORGED_MUA_OUTLOOK,FORGED_OUTLOOK_HTML,FORGED_OUTLOOK_TAGS,FORGED_RCVD_HELO,FREEMAIL_FROM,FROM_ILLEGAL_CHARS,HTML_40_50,HTML_FONT_INVISIBLE,HTML_MESSAGE,L_SOME_STD_PROBS,MIME_BOUND_DD_DIGITS,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MISSING_MIMEOLE,RBL_PSBL_01,RCVD_BY_IP,RCVD_DOUBLE_IP_SPAM,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SARE_RECV_IP_FROMIP1,SPF_SOFTFAIL,SUBJ_ILLEGAL_CHARS,UNPARSEABLE_RELAY,UPPERCASE_50_75 scantime=9.4,size=3150,user=root,uid=99,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=40282,mid=<KGNPKZIWNMHBPXAQXUKDUCk.uqkkajgreg_ji...@msn.com>,bayes=1,autolearn=disabled
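
As a rough illustration of how little is needed to pull training data out of such a line, here is a minimal Python sketch. It assumes the "spamd: result:" layout seen in the entry above (a Y flag for spam and, as an assumption on my part, a "." flag for ham, then the score, then a comma-separated rule list). The function name parse_result_line and the sample line are made up for illustration.

```python
import re

# Assumed layout: "spamd: result: <Y|.> <score> - <RULE,RULE,...> key=value,..."
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.]) (?P<score>-?\d+(?:\.\d+)?) - "
    r"(?P<rules>\S*) (?P<fields>.*)$"
)
MID_RE = re.compile(r"mid=<([^>]*)>")


def parse_result_line(line):
    """Return a dict describing one spamd result line, or None if no match."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    mid = MID_RE.search(m.group("fields"))
    return {
        "is_spam": m.group("flag") == "Y",
        "score": float(m.group("score")),
        # An empty rule field (no hits) yields an empty list.
        "rules": [r for r in m.group("rules").split(",") if r],
        "msgid": mid.group(1) if mid else None,
    }


# Hypothetical, shortened sample line for demonstration only.
sample = (
    "Mar 1 04:22:22 ga spamd[30536]: spamd: result: Y 46 - "
    "BAYES_99,HTML_MESSAGE,SPF_SOFTFAIL scantime=9.4,size=3150,"
    "user=root,mid=<abc@example.com>,bayes=1,autolearn=disabled"
)
record = parse_result_line(sample)
```

Lines that do not contain a result entry simply return None, so the whole maillog can be fed through it.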

Unfortunately, using only the log won't let you address FPs and FNs, so in addition to the log you'd need to be able to scan corpora. I'd suggest that you do both, and have it prefer the per-message spam/ham status from the corpora over the spam/ham status from the log (matching by Message-ID, of course).
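
The "prefer the corpus label" step could be sketched roughly like this in Python. It assumes each log record is a dict with "msgid" and "is_spam" keys and that the hand-sorted corpus has already been reduced to a Message-ID -> label mapping. Both shapes and the name merge_labels are my own assumptions, not anything that exists in SA.

```python
def merge_labels(log_records, corpus_labels):
    """Return copies of log_records with is_spam overridden by corpus labels.

    log_records:   iterable of dicts with "msgid" and "is_spam" keys (assumed shape)
    corpus_labels: dict mapping Message-ID -> True (spam) / False (ham)
    """
    merged = []
    for rec in log_records:
        # Fall back to the log's own verdict when the message is not in a corpus.
        label = corpus_labels.get(rec["msgid"], rec["is_spam"])
        merged.append({**rec, "is_spam": label})
    return merged
```

A message the log called spam but that sits in the ham corpus (an FP) comes out labeled ham, which is the correction the log alone cannot provide.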

However, you cannot generate metarules with SVMs; for that purpose you need a different learning algorithm (for example Bayes or decision trees).

However, SVM classification is very cheap, so once you have created the model offline, you can use it online really quickly with a plugin.
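
To illustrate why classification is cheap: assuming a linear kernel and binary rule-hit features (both are assumptions; the kernel is not specified in this thread), the decision reduces to summing one learned weight per hit rule and adding a bias. A minimal Python sketch, with hypothetical weights and a made-up function name:

```python
def svm_is_spam(hit_rules, weights, bias):
    """Linear SVM decision for binary features: sign(w . x + b).

    hit_rules: iterable of rule names that fired on the message
    weights:   dict mapping rule name -> learned weight (hypothetical values)
    bias:      learned bias term
    """
    # Rules the model never saw contribute nothing to the decision value.
    decision = sum(weights.get(rule, 0.0) for rule in set(hit_rules)) + bias
    return decision > 0
```

The cost per message is one dictionary lookup per rule that hit, which is negligible next to the scan itself. A non-linear kernel would cost more, since it has to evaluate the kernel against every support vector.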

Then perhaps we're looking at two different but related tools: a plugin for SVMs and an offline static meta rule generator. They may be complementary, or they may be different ways to achieve similar results.

Personally I know I'd be more comfortable (at least at this point) running an offline metarule generator as part of my nightly bayes training script than I would be in adding another plugin, which is why I brought it up.

Add to that, the offline meta rule generator would be useful in older SA installs that might not support a plugin written to the current API...
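
To make the offline idea a bit more concrete, here is a deliberately naive Python sketch. It is plain pairwise co-occurrence counting, not a decision tree or Bayes learner: it emits a meta rule for every pair of rules that hit together on at least min_spam_hits spam messages and never on ham. The record shape, the LOCAL_META_nnn names, and the 0.5 score are all placeholders of mine.

```python
from collections import Counter
from itertools import combinations


def generate_meta_rules(records, min_spam_hits=2):
    """Emit SpamAssassin 'meta' and 'score' lines for spam-only rule pairs.

    records: iterable of dicts with "rules" (list of names) and "is_spam" (bool)
    """
    spam_pairs, ham_pairs = Counter(), Counter()
    for rec in records:
        # Sort so (A, B) and (B, A) count as the same pair.
        pairs = combinations(sorted(set(rec["rules"])), 2)
        (spam_pairs if rec["is_spam"] else ham_pairs).update(pairs)

    lines = []
    for (a, b), hits in sorted(spam_pairs.items()):
        if hits >= min_spam_hits and ham_pairs[(a, b)] == 0:
            name = f"LOCAL_META_{len(lines) // 2 + 1:03d}"  # placeholder naming
            lines.append(f"meta {name} ({a} && {b})")
            lines.append(f"score {name} 0.5")  # placeholder score
    return "\n".join(lines)
```

The returned text is what would be written to the metarules.cf file mentioned earlier. A real generator would need far stricter thresholds and a larger corpus to avoid overfitting.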

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Failure to plan ahead on someone else's part does not constitute
  an emergency on my part.                 -- David W. Barts in a.s.r
-----------------------------------------------------------------------
 3 days until Daylight Saving Time begins in U.S. - Spring Forward
