-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rune Kristian Viken writes:
> On Wednesday 14 September 2005 18:34, Bret Miller wrote:
> >> We're in the need of checking parts of our outgoing email for
> >> spam (read: we've got unknown webmail users.. huge lots of them,
> >> actually.. and some of them have this annoying habit of sending
> >> nigeria spam) 
> >> 
> >> [considering network tests useless, Bayes excellent, but feels the 
> >> default weighting may be useless] 
> >>
> >> How do we re-weight the rules, and does anyone have any good
> >> suggestions on which checks to use?  Also, checking for certain
> >> blacklisted URLs in the messages will probably help (Someone recommended
> >> SURBL for  this) .. but I think a re-weighting will still be in order.
> >
> > I'd be inclined to try the SARE fraud rules (see www.rulesemporium.com)
> > in addition to the SA internal and bayes tests. 
> 
> Excellent suggestion!  I think we'll try those.  
> 
> > If you find that doesn't give you a high enough score, pushing the
> > BAYES_99 score a little higher might be in order.
> 
> That was what I was thinking about.  Others have mentioned local.cf, which 
> of course is a good thing (and we've already looked at that, it's covered 
> quite well in the docs).  What I was thinking was using the 
> 'masses/corpus'-things to generate our own weightings, trying to tune 
> SpamAssassin for our particular use-case.  Not sure if they're meant for 
> that, though - and very unsure on how to do that. I've not been able to dig 
> that up through the docs. If it's a bad idea - please do not hesitate to 
> point it out. 
> 
> Also, David B Funk suggested using -L , indicating "No network tests".  As 
> mentioned, I'm considering using SURBL.  Is it possible to still use SURBL 
> with -L ?  The docs say this is "Use local tests only (no DNS)" and that 
> seems to be off the mark.

I think you *do* want to use SURBL.  Its lookups are DNS-based network
tests, so -L would switch them off and is not recommended here.

One possible thing to do is collect some data, namely:

  - a selection of "good" nonspam outgoing mail
  - a selection of "bad" outgoing spam attempts

If you can do this, you can then build a corpus of mails to test against
and manually tweak scores.  I don't think you need to go to the bother
of generating an entirely new score-set; a little manual tweaking should
be enough.
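
As a rough sketch of what testing a tweak against such a corpus could look
like (this is not part of SpamAssassin, just an illustration): assume each
mail has already been reduced to a good/bad label plus the set of rule names
that hit on it.  The corpus entries, the score values, and the rule name
EXAMPLE_FRAUD_RULE below are all made up for the example; only the 5.0
threshold mirrors the usual default required score.

```python
from typing import Dict, Iterable, Set, Tuple

# (is_spam, names of the rules that hit) -- hypothetical, hand-made data.
Message = Tuple[bool, Set[str]]

CORPUS = [
    (False, {"HTML_MESSAGE"}),
    (False, {"BAYES_50", "HTML_MESSAGE"}),
    (True, {"BAYES_99", "EXAMPLE_FRAUD_RULE"}),
    (True, {"BAYES_99", "HTML_MESSAGE"}),
]

# Illustrative scores only; these are NOT the shipped defaults.
BASE_SCORES = {
    "BAYES_99": 3.5,
    "BAYES_50": 0.5,
    "HTML_MESSAGE": 0.1,
    "EXAMPLE_FRAUD_RULE": 2.0,  # hypothetical rule name
}


def total_score(hits: Set[str], scores: Dict[str, float]) -> float:
    """Sum the scores of all rules that hit; unknown rules count as 0."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def evaluate(corpus: Iterable[Message], scores: Dict[str, float],
             threshold: float = 5.0) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for a given score set."""
    false_pos = false_neg = 0
    for is_spam, hits in corpus:
        flagged = total_score(hits, scores) >= threshold
        if flagged and not is_spam:
            false_pos += 1
        elif not flagged and is_spam:
            false_neg += 1
    return false_pos, false_neg


# Try one manual tweak: push BAYES_99 higher and compare the error counts.
TWEAKED_SCORES = dict(BASE_SCORES, BAYES_99=5.0)

if __name__ == "__main__":
    print("base   (fp, fn):", evaluate(CORPUS, BASE_SCORES))
    print("tweaked (fp, fn):", evaluate(CORPUS, TWEAKED_SCORES))
```

The point is only the loop: change one score, re-count false positives on
the good mail and misses on the bad mail, and keep the change if the
trade-off looks acceptable.  Watching the false-positive count matters most
here, since these are your own users' outgoing mails.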

Bayes will definitely be helpful, too, and that corpus will provide
training data.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDKajqMJF5cimLx9ARAsS8AKCFU7W92G6S7yd0oLpAa1GCggl6LwCdFLnf
pS/Rt0JvWYKPO3ExKLrfWAE=
=w2kB
-----END PGP SIGNATURE-----
