Re: [SAtalk] RE: Troubling new scores in 2.1 release

Craig Hughes Thu, 28 Feb 2002 13:21:48 -0800

On 2/28/02 7:06 AM, "Shane Williams" <[EMAIL PROTECTED]> wrote:

> On Thu, 28 Feb 2002, Michael Moncur wrote:
> 
>> While some of the negative scores (like DEAR_SOMEBODY) might have
>> really turned into legitimate indicators of non-spam, I don't think
>> any message deserves having its spam score reduced by 8 points by
>> virtue of its mentioning "www.monsterhut.com", a well-known spam
>> source.
> 
> This got me thinking.  Does the corpus contain emails discussing spam?
> If so, that would clearly throw off the evolution of scores.

That's not the problem here -- the corpus contains 40 "spam" messages with
monsterhut and 0 nonspams with monsterhut.  The score is coming out negative
because the monsterhut test is non-discriminating: every monsterhut message
hits enough other rules that its score is over the threshold regardless of
whether that rule is scored high or low, so nothing in the corpus pushes the
GA toward any particular value for it.
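
To make the point above concrete, here is a minimal sketch in Python.  It is
a toy, not the actual GA or SpamAssassin code: the threshold of 5.0, the rule
names RULE_A/RULE_B/RULE_C, their scores, and every message count other than
the 40 monsterhut spams are assumptions made up for illustration.  It only
shows that when every monsterhut message is carried over the threshold by
other rules, the error count is the same whatever score that rule gets.

```python
# Toy corpus evaluation. Everything here is hypothetical except the
# "40 spams hit the monsterhut rule, 0 nonspams do" shape from the post.

THRESHOLD = 5.0  # assumed spam threshold


def total_score(hits, scores):
    """Sum the scores of all rules a message hits."""
    return sum(scores[rule] for rule in hits)


def misclassified(corpus, scores, threshold=THRESHOLD):
    """Count messages whose flagged/not-flagged result disagrees with the label."""
    errors = 0
    for is_spam, hits in corpus:
        flagged = total_score(hits, scores) >= threshold
        if flagged != is_spam:
            errors += 1
    return errors


# Hypothetical scores for the other rules; MONSTERHUT is varied below.
base_scores = {"MONSTERHUT": 0.0, "RULE_A": 8.0, "RULE_B": 7.0, "RULE_C": 3.0}

corpus = (
    [(True, {"MONSTERHUT", "RULE_A", "RULE_B"})] * 40  # spams mentioning monsterhut
    + [(True, {"RULE_A"})] * 20                        # other spam
    + [(False, {"RULE_C"})] * 40                       # nonspam hitting a weak rule
    + [(False, set())] * 20                            # nonspam hitting nothing
)


def errors_for(monsterhut_score):
    """Error count on the toy corpus for a given MONSTERHUT score."""
    scores = dict(base_scores, MONSTERHUT=monsterhut_score)
    return misclassified(corpus, scores)


# Sweep the rule's score from strongly negative to strongly positive.
results = {s: errors_for(s) for s in (-8.0, -4.0, 0.0, 4.0, 8.0)}
for score, errors in results.items():
    print(f"MONSTERHUT={score:+.1f} -> {errors} misclassified")
```

In this toy setup the error count does not change across the sweep, so an
optimizer that only looks at classification errors has no reason to prefer
+8 over -8 for that rule.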

> Similarly, I think part of the problem is that everybody's spam and
> non-spam may be vastly different.  Obviously, the more sources the
> corpus is drawn from the less this will be an issue, but until then
> the GA will be creating scores tuned more accurately for the types of
> users who submit to the corpus.

This is true, but much effort has been put into making the corpus
representative of a broad range of mail.  Spam is much more alike from user
to user than nonspam is.  For example, business users tend to like emails
with dollar signs in them more than techie users do.  I think there are
basically 3 categories of email users for spam-id purposes (if we go down the
road of letting the user choose which rules/scores to apply to themselves):
TECHIE, BUSINESS, and MOM (aka AOL user).  There might possibly be another
category, COLLEGEKID or something, but it might well be subsumed by one of
the other 3.  It's also possible that BUSINESS would work fine for MOM.
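
As a rough sketch of what per-category scores could look like (purely
hypothetical: nothing like this exists in the release being discussed, and
the DOLLAR_SIGNS rule name and all the numbers are invented for
illustration), each profile could be a small set of overrides layered on a
shared base score table:

```python
# Hypothetical per-profile score overrides. The profile names come from the
# post; the DOLLAR_SIGNS rule name and every number are made up.

BASE_SCORES = {
    "DEAR_SOMEBODY": -1.0,  # rule name mentioned in the thread; value invented
    "DOLLAR_SIGNS": 2.0,    # hypothetical rule for mail containing dollar signs
}

PROFILE_OVERRIDES = {
    "TECHIE": {},                       # uses the base scores unchanged
    "BUSINESS": {"DOLLAR_SIGNS": 0.5},  # dollar signs are less suspicious here
    "MOM": {"DOLLAR_SIGNS": 0.5},       # assumes BUSINESS settings suit MOM too
}


def scores_for(profile):
    """Return the base score table with the profile's overrides applied."""
    merged = dict(BASE_SCORES)
    merged.update(PROFILE_OVERRIDES[profile])
    return merged


print(scores_for("BUSINESS"))
```

A user would then pick one profile name, and only the rules that genuinely
differ between categories would need separate scores.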

C

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
