On Mon, May 27, 2002 at 11:35:20PM +0200, Tony L. Svanstrom wrote:
> On Sun, 26 May 2002 the voices made Duncan Findlay write:
>
> > On Sun, May 26, 2002 at 10:55:34PM +0200, Tony L. Svanstrom wrote:
> >
> > > Should rules, clearly involving nasty things used by spammers, be removed when
> > > the scores go negative?
> >
> > I think so. Rules designed to catch spam, scored negatively, even if they
> > occur more frequently in non-spam than spam, are NOT good indicators of spam.
> > They are merely bad/false indicators of spam, and the regexps should be
> > changed to make them better spam indicators.
> >
> > If we want to have negative scoring rules, we should try to put together
> > regexps that are actually non-spam indicators.
>
> The reason for putting it as a question was that one might argue that you need
> both positive and negative rules, and if a rule even after being improved has a
> GA-score that isn't expected you'd still keep it. In fact, you'd want as many
> far from general rules you could ever think of, and then let the magic of GA
> sort it out.
> BUT... taking that all the way will in the end clearly slow things down a lot.
>
But tests meant to catch spam are ill-suited for identifying non-spam. I agree
that negative scores are a good thing, but only on tests designed for that
purpose.

The GA is amazing, but humans are smarter than computers (we have to program
them, after all). The GA is not perfect in every situation, especially since it
is only as good as its corpora.

Furthermore, if tests designed to catch spam are scored negatively, spammers
will simply start using those constructs, making their messages just as spammy
(or even more so) while scoring lower. Of course, those messages can be added to
the corpus and everything will readjust, but the exercise is pointless
nonetheless.

>
> Personally I don't see any real problems with the way it is today, as it is
> today; but I seem to hear more and more about localization-related problems as
> well as people tinkering with the rules. From my point of view that's telling
> us that maybe some changes, or at least a discussion, is needed to avoid that
> too many move away from the core and/or before SA grows some more (filesize,
> number of rules and users).
> To begin with, people tinkering and dealing with less common (to the average
> user of SA) languages isn't good for GA-scores; esp. if you are using rules
> with negative scores...
>
>
> I have no idea where I'm heading with this, but I wrote it, so now you had to
> read it. ;-)

I have no clue what you meant by that last paragraph :-)

-- 
Duncan Findlay

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk