-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Gary,

Saturday, August 2, 2003, 6:55:01 PM, you wrote:

GF> Bob, Excellent. thanks. - Gary

My pleasure -- my way of contributing back to those who develop and
expand SA.

GF> I followed some of the links on the main page, ...

GF> A couple of questions/suggestions:

GF> Chris Santerre's list of rules at
GF> http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm in turn
GF> points to some "body rules" that you (Bob Menschel) contributed,
GF> http://www.merchantsoverseas.com/wwwroot/gorilla/body.txt which has a
GF> lot of useful rules, but looking at the scores, it seems that you're
GF> using a higher threshold than the default value of 5? What value are
GF> you using, and why did you depart from the default value (of 5)?

Per http://www.spamassassin.org/doc/Mail_SpamAssassin_Conf.html
> required_hits n.nn (default: 5)
> Set the number of hits required before a mail is considered spam. n.nn
> can be an integer or a real number. 5.0 is the default setting, and is
> quite aggressive; it would be suitable for a single-user setup, but if
> you're an ISP installing SpamAssassin, you should probably set the
> default to be more conservative, like 8.0 or 10.0. ...

I manage a corporate email feed with a few dozen email addresses, which
needs to accept vendor email from Asia, a non-profit email feed with a
dozen email addresses, and an email feed for my family. I do this with
one combined user_prefs file (I have no access to local.cf).

I therefore run all three systems with a required_hits of 9.0.

GF> As a suggestion, in the interest of working towards some sort of
GF> collaboration on sharing rules, perhaps scores shouldn't be included
GF> at all with the suggested rules, and users should be encouraged to
GF> score them themselves? Alternatively, suggested scores should be
GF> given using the default cut off of 5 as the agreed upon base line?

The problem with any standardization is that different sites want
different levels of aggressiveness. Because of the needs of my
recipients, I generally follow the SA policy that it's much better to get
a few false negatives (spam that slips through) than to get a false
positive (a proper email flagged as spam). However, NOBODY among my
recipients is interested in porn, so I am very aggressive with
porn-related spam. 

I don't expect anyone else to have the same mix of aggressive/permissive
scoring that I have.

If anything I'd lean toward the "scores shouldn't be included", or
better, all scores should be between -0.5 and +0.5, with the individual
raising or lowering scores according to experience.
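
A rough sketch of how that could look in practice. The 0.5 starting
score and the 2.0 override are illustrative values only, not taken from
any real rule set: the contributed rule ships with a near-neutral score,
and each recipient adds a later score line once experience justifies it.

```
# As contributed: near-neutral starting score (illustrative value)
body   L_b_AllTheMoney  /all\ the\ money\ in\ the\ world/i
score  L_b_AllTheMoney  0.5

# Added locally, later in the same file, after the rule has proven
# itself on this site's mail (2.0 is a made-up example)
score  L_b_AllTheMoney  2.0
```

The later score line should be the one that takes effect, so the
contributed file can stay untouched above the local adjustments.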

Another option would be for us to include statistics with the rules
(e.g., this rule matches 95 spam and 2 ham in my corpus).
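
One possible way to carry such statistics is a comment line above each
rule. The layout below is only an assumption (there is no agreed
format), and the counts simply repeat the example figures above rather
than measured results for this rule.

```
# Contributor's corpus: 95 spam hits, 2 ham hits (example figures only)
body      L_b_AllTheMoney  /all\ the\ money\ in\ the\ world/i
describe  L_b_AllTheMoney  Body mentions "all the money in the world"
```

With hit counts attached, each recipient can pick a score to suit their
own threshold instead of inheriting someone else's.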

GF> Naming conventions. Bob consistently prefixes his rules with L_b,
GF> which I assume means "local body rule". Others use their own
GF> descriptive names. If we are to move towards collaboration, perhaps
GF> just regular descriptive names are better, so that they can easily
GF> become candidates for the production version of SA? In that regard,
GF> it might be helpful to have a central repository of currently
GF> registered rule names, so that we don't collide down the road.
GF> (Sounds overly complicated, but am just trying to understand the
GF> infrastructure necessary to support collaboration.)

The reason I use my L_type_descript rule name format is so I can
1) differentiate between my local rules vs distributed rules,
2) differentiate between similar/identical rules in different sections
   L_f_Free -- the word Free in the "from" header
   L_t_Free -- the word Free in the "to" or "cc" header
   L_s_Free -- the word Free in the "subject" header
   L_b_Free -- the word Free in the body
   L_u_Free -- the word Free as part of a URI
3) organize my user_prefs file (180k+, including blacklists), so it's
   easier to find any given rule, even if I've forgotten what its name
   is (I at least can go to the section it belongs to).
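
To make the naming scheme concrete, here is a minimal sketch of how the
five L_x_Free names could map onto rule types. The regular expressions
are placeholders chosen for illustration, not the actual rules.

```
# from / to+cc / subject headers (patterns are illustrative)
header  L_f_Free  From    =~ /\bfree\b/i
header  L_t_Free  ToCc    =~ /\bfree\b/i
header  L_s_Free  Subject =~ /\bfree\b/i

# message body
body    L_b_Free  /\bfree\b/i

# any URI found in the message
uri     L_u_Free  /free/i
```

The letter after L_ says which part of the message the rule tests, so
rules that look for the same word stay distinct and sort together by
section.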

I also use mixed case to make the rules easier for me to understand and
read, e.g.:
> body  L_b_AllTheMoney /all\ the\ money\ in\ the\ world/i

I agree that it would be good to have some standardization. I don't think
we're going to be able to "register" rule names in the near future, just
because of the great variation in rules (my FurtherDetails rule may be
very different from someone else's FurtherDetails rule).

There are also other questions that would need to be resolved. Example:
How do we differentiate between case-insensitive rules, and
case-sensitive rules? Though I usually try to avoid single-word rules, I
do have a rule that finds the word "guaranteed" and scores it 0.1, and a
second rule which finds the same word in all upper case and scores it 1.1.
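
A sketch of what such a pair might look like. The rule names and exact
patterns here are hypothetical; only the two scores come from the
description above.

```
# Case-insensitive: any capitalization of the word
body   L_b_Guaranteed      /\bguaranteed\b/i
score  L_b_Guaranteed      0.1

# Case-sensitive (no /i modifier): only the all-caps form
body   L_b_GuaranteedCaps  /\bGUARANTEED\b/
score  L_b_GuaranteedCaps  1.1
```

Written this way, an all-caps GUARANTEED hits both rules, so the two
scores add together. Nothing in either rule name says which one is
case-sensitive, which is the question a shared naming convention would
have to settle.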

Given that we do not want to hamper the creation of new rules within the
SA distribution, I wouldn't mind working toward some kind of
standardization of user-contributed rules.

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPyyPwZebK8E4qh1HEQLPZACgnBFYpEhn7r7fKZT5dE7aChsRbRwAoMH3
nyKOxRKX/j9DKLBBoNf8sIXc
=ByA3
-----END PGP SIGNATURE-----




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
