-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Gary,

Saturday, August 2, 2003, 6:55:01 PM, you wrote:

GF> Bob, Excellent. thanks. - Gary

My pleasure -- my way of contributing back to those who develop and
expand SA.

GF> I followed some of the links on the main page, ...

GF> A couple of questions/suggestions:

GF> Chris Santerre's list of rules at
GF> http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm in turn
GF> points to some "body rules" that you (Bob Menschel) contributed,
GF> http://www.merchantsoverseas.com/wwwroot/gorilla/body.txt which has a
GF> lot of useful rules, but looking at the scores, it seems that you're
GF> using a higher threshold than the default value of 5? What value are
GF> you using, and why did you depart from the default value (of 5)?

Per http://www.spamassassin.org/doc/Mail_SpamAssassin_Conf.html

> required_hits n.nn (default: 5)
> Set the number of hits required before a mail is considered spam. n.nn
> can be an integer or a real number. 5.0 is the default setting, and is
> quite aggressive; it would be suitable for a single-user setup, but if
> you're an ISP installing SpamAssassin, you should probably set the
> default to be more conservative, like 8.0 or 10.0.

...

I manage three feeds: a corporate email feed with a few dozen email
addresses, which needs to accept vendor email from Asia; a non-profit
email feed with a dozen email addresses; and an email feed for my
family. I do this with one combined user_prefs file (I have no access
to local.cf). I therefore run all three systems with a required_hits
of 9.0.

GF> As a suggestion, in the interest of working towards some sort of
GF> collaboration on sharing rules, perhaps scores shouldn't be included
GF> at all with the suggested rules, and users should be encouraged to
GF> score them themselves? Alternatively, suggested scores should be
GF> given using the default cut off of 5 as the agreed upon base line?

The problem with any standardization is that you then have the problem
of different levels of aggressiveness.
Because of the needs of my recipients, I generally follow the SA policy
that it's much better to get a few false negatives (spam that slips
through) than to get a false positive (a proper email flagged as spam).
However, NOBODY among my recipients is interested in porn, so I am very
aggressive with porn-related spam. I don't expect anyone else to have
the same mix of aggressive/permissive scoring that I have.

If anything I'd lean toward the "scores shouldn't be included" option,
or better, all scores should be between -0.5 and +0.5, with the
individual raising or lowering scores according to experience.

Another option would be for us to include a version of statistics with
the rules (eg: this rule matches 95 spam and 2 ham in my corpus).

GF> Naming conventions. Bob consistently prefixes his rules with L_b,
GF> which I assume means "local body rule". Others use their own
GF> descriptive names. If we are to move towards collaboration, perhaps
GF> just regular descriptive names are better, so that they can easily
GF> become candidates for the production version of SA? In that regard,
GF> it might be helpful to have a central repository of currently
GF> registered rule names, so that we don't collide down the road.
GF> (Sounds overly complicated, but am just trying to understand the
GF> infrastructure necessary to support collaboration.)

The reason I use my L_type_descript rule name format is so I can

1) differentiate between my local rules vs distributed rules,

2) differentiate between similar/identical rules in different sections

     L_f_Free -- the word Free in the "from" header
     L_t_Free -- the word Free in the "to" or "cc" header
     L_s_Free -- the word Free in the "subject" header
     L_b_Free -- the word Free in the body
     L_u_Free -- the word free as part of a URI

3) organize my user_prefs file (180k+, including blacklists), so it's
   easier to find any given rule, even if I've forgotten what its name
   is (I at least can go to the section it belongs to).

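As a rough illustration of the naming scheme above, the five "Free" variants might look something like this in a user_prefs file. The patterns and the 0.1 scores are assumptions chosen for the example, not the rules from the actual file:

```
# Hypothetical rules following the L_<type>_<description> pattern.
# Regexes and scores are illustrative placeholders only.

# f = From header
header   L_f_Free   From =~ /\bfree\b/i
describe L_f_Free   The word "free" in the From header
score    L_f_Free   0.1

# t = To or Cc header (ToCc is SpamAssassin's combined pseudo-header)
header   L_t_Free   ToCc =~ /\bfree\b/i
describe L_t_Free   The word "free" in the To or Cc header
score    L_t_Free   0.1

# s = Subject header
header   L_s_Free   Subject =~ /\bfree\b/i
describe L_s_Free   The word "free" in the Subject header
score    L_s_Free   0.1

# b = message body
body     L_b_Free   /\bfree\b/i
describe L_b_Free   The word "free" in the body
score    L_b_Free   0.1

# u = URI
uri      L_u_Free   /free/i
describe L_u_Free   The word "free" as part of a URI
score    L_u_Free   0.1
```

Keeping each type prefix in its own section of the file is what makes a rule easy to locate later, and the low placeholder scores leave each user free to raise or lower them based on their own mail.
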
I also use mixed case to make the rules easier for me to read and
understand, eg:

> body L_b_AllTheMoney /all\ the\ money\ in\ the\ world/i

I agree that it would be good to have some standardization. I don't
think we're going to be able to "register" rule names in the near
future, just because of the great variation in rules (my FurtherDetails
rule may be very different from someone else's FurtherDetails rule).

There are also other questions that would need to be resolved. Example:
How do we differentiate between case-insensitive rules and
case-sensitive rules? Though I usually try to avoid single-word rules,
I do have a rule that finds the word "guaranteed" and scores it 0.1,
and a second rule which finds the same word in all upper case and
scores it 1.1.

Given that we do not want to hamper the creation of new rules within
the SA distribution, I wouldn't mind working toward some kind of
standardization of user-contributed rules.

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPyyPwZebK8E4qh1HEQLPZACgnBFYpEhn7r7fKZT5dE7aChsRbRwAoMH3
nyKOxRKX/j9DKLBBoNf8sIXc
=ByA3
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk