I'll apologize in advance for being a bit "strongly opinionated" but your 
viewpoint on this strikes me as wrongheaded.

There is clearly a strong obligation for the SA development effort to 
reduce false positives. Heck, the whole reason SA uses a GA for score 
assignment in the first place is to tune the false positive rate by picking 
out which rules commonly match in non-spam mail while jacking up the scores 
of rules which match only in spam. To take the viewpoint that it is the 
obligation of the mailer to tune their mailing against SA completely 
negates the value of much of the design work that went into SpamAssassin. 
Heck, a lot of the effort in the rule design of SA is to *prevent* the 
tuning of a mailing (i.e., to keep pr0n spammers from tuning their mailings 
to not hit).

I've not seen the content of that particular M$DN mailing; however, AFAIK 
the development team for SA has defined "lists which you willfully 
subscribe to" as non-spam. So the fact that it is MSDN does make it 
non-spam. The MSDN lists are strictly opt-in, and being an MSDN subscriber 
does NOT require you to receive them.

The only rule which I think MSDN has an obligation to tune for is the 
MISSING_OUTLOOK_NAME one. The rest of the rules that email matched are 
pretty innocuous ones.

Let's face it, most of the rules they matched are weak indicators of spam; 
they just matched a lot of them.

CALL_FREE is not much of a sign of spam. Most large companies have 800 
numbers and mention them in even the most innocuous emails. Personally, 
I've got this rule manually 0'ed out.

FROM_NAME_NO_SPACES - why did this rule ever come about? Most of my "personal 
friends only" accounts only use "Matt" as a name. It's only my more 
business/community oriented ones that I use "Matt Kettler". Sure, some 
spammers use one-word names, but my "spam" mailbox has by far more names 
that have 2 or more words in them than those that have no spaces. I guess 
that's why when this got GAed it got a very small negative score instead of 
a positive one. (June 6th CVS had 0.500 as a score; 2.31 and July 11 have 
-0.114.)

FROM_HAS_MIXED_NUMS - maybe, but lots of people create email addresses like 
this in crowded domains. Seems more like a hotmail/yahoo detection than much 
of a strong spam sign.

FREE_CAP
DO_IT_TODAY
SAVE_MONEY
SAVE_BUCKS - these might be good, but as of today's CVS, they hadn't been 
hit by the GA yet and all had a default score of 1.0 assigned. I bet the GA 
runs these down a bit once it starts getting some matching data in the 
corpus. 1.0 each seems rather high to me, particularly in the case of 
FREE_CAP (strikes me as a 0.3 or 0.5ish thing) and DO_IT_TODAY (e.g. mail 
from your boss telling you to get off your behind and do that project today).
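
To put a number on why those untuned defaults matter, here is a rough sketch of additive scoring. It is not SA's actual code: the 5.0 threshold is assumed to be the stock default, and total_score is a made-up helper name. The rule names and the 1.0 scores are the ones listed above.

```python
# Illustration only: every matched rule adds its score, and the total is
# compared to a threshold. 5.0 is ASSUMED here to be the stock threshold.
REQUIRED_HITS = 5.0

# Rules still sitting at the default 1.0 score (not yet tuned by the GA).
default_scored = {
    "FREE_CAP": 1.0,
    "DO_IT_TODAY": 1.0,
    "SAVE_MONEY": 1.0,
    "SAVE_BUCKS": 1.0,
}


def total_score(hits):
    """Sum the scores of all rules that matched a message (hypothetical helper)."""
    return sum(hits.values())


total = total_score(default_scored)
remaining = REQUIRED_HITS - total
print(f"{total:.1f} of {REQUIRED_HITS:.1f} points from untuned rules, "
      f"{remaining:.1f} left before the mail is tagged")
```

Under those assumptions, four phrase rules that the GA has never looked at take a message most of the way to the threshold before any of the tuned rules are counted.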



OFFER
- yes, it's marketing, but it's requested marketing not garbage spam. It's 
probably acceptable to hit em with points for this anyway.

TO_BE_REMOVED_REPLY
MAILTO_WITH_SUBJ
UNSUB_PAGE - ok, let's face it, plain-jane unsubscribe footers aren't a 
strong indicator of spam. These scores go up and down wildly as different 
forms appear in the corpus as spam/nonspam. Every legitimate subscription 
mailing has em in one form or another, even this mailing list. I'd love to 
see a rule to catch mailto links that send remove mails to various 
"freemail" domains like hotmail/yahoo, with some caution to make sure you 
don't catch yahoo groups unsub addresses. It would certainly have a much 
lower "false positive" rate than these simple rules. TO_BE_REMOVED_REPLY is 
a great example of these going up and down a lot. v2.20 had a score of -2.150 
and v2.31 had +3.985, and it did that without the rule changing at all, 
just the tide of what's in the corpus. MAILTO_WITH_SUBJ also did a -/+ 
flip. Such wild changes in GA score really indicate to me that these rules 
are pretty questionable and vary wildly in accuracy based on what direction 
the wind is blowing today.
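
For what it's worth, the freemail mailto check I have in mind might look roughly like the sketch below. This is plain Python to show the idea, not an SA rule, and every name in it is made up for illustration. The domain lists are assumptions (including that Yahoo Groups unsub addresses live at yahoogroups.com), and it only looks at the domain of the mailto address, not at whether the link actually asks for a "remove" mail.

```python
import re

# ASSUMED lists, for illustration only; a real rule would need a tuned set.
FREEMAIL_DOMAINS = {"hotmail.com", "yahoo.com"}
LIST_HOST_DOMAINS = {"yahoogroups.com"}  # legit list unsub addresses, skip these

# Capture the local part and domain of each mailto: address in the body.
MAILTO_RE = re.compile(r"mailto:([^\s\"'<>?@]+)@([A-Za-z0-9.-]+)", re.IGNORECASE)


def freemail_remove_links(body):
    """Return mailto addresses in body that point at a freemail domain."""
    hits = []
    for local, domain in MAILTO_RE.findall(body):
        domain = domain.lower().rstrip(".")
        if domain in LIST_HOST_DOMAINS:
            continue  # caution: don't catch list-hosting unsub addresses
        if domain in FREEMAIL_DOMAINS:
            hits.append(f"{local}@{domain}")
    return hits


sample = ('To be removed click <a href="mailto:remove_me@hotmail.com'
          '?subject=remove">here</a> or mail '
          'mailto:mylist-unsubscribe@yahoogroups.com')
print(freemail_remove_links(sample))
```

Matching on the exact domain keeps list-hosting addresses out, and the explicit skip list is there so that stays true if the freemail list grows.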


LINES_OF_YELLING, LINES_OF_YELLING_3, LINES_OF_YELLING_2 - the worth of these 
is commonly disputed due to the large number of dense innocents who use all 
caps. Still probably ok to hit em with points for this.

SUPERLONG_LINE - I'm guessing this was originally made to match spam, but the 
GA scores it more for the non-spam side. Seems like a strange rule when 
looking at the overall structure of HTML spam (long line likely), non-HTML 
spam (long line unlikely), and personal mails from a variety of mailers 
(long line likely if mailer doesn't do wrapping, unlikely if it does).

DOUBLE_CAPSWORD - a good rule I think. Worth hitting em for some points

MISSING_OUTLOOK_NAME - ok, this one is foolish for MSDN to have matched on. 
MS clearly should not strip their X-Mailer headers when mailing their 
legitimate mailing lists.








At 12:18 PM 7/23/2002 -0500, SpamTalk wrote:
>I am still of the opinion that the onus is upon the mass mailers to
>legitimize their messages. I get other newsletters that do not have to be
>whitelisted. They send a short list of text synopses with hyperlinks to the
>full story, so I only get blasted with ads (except I use guidescope
>[www.guidescope.com] and pop-up stopper [www.panicware.com] to squash most
>of them) if I want to read more.
>
>-----Original Message-----
>From: Bart Schaefer [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, July 23, 2002 11:48 AM
>To: [EMAIL PROTECTED]
>Subject: RE: [SAtalk] Microsoft developer newsletter tagged as spam
>
>
>On Tue, 23 Jul 2002, SpamTalk wrote:
>
> > It _IS_ spam. The fact it is from M$DN does not mitigate the fact that
> > they take advantage of having your email address to load all that crap
> > in the same boat.
>
>It's not spam unless they send it unsolicited.  The point is merely that
>a high SA content score does not mean the mail was not asked for -- and if,
>for example, an ISP were to choose to deploy an SMTP-time block using SA,
>they risk intercepting legitimate mail.
>
>
>
>-------------------------------------------------------
>This sf.net email is sponsored by:ThinkGeek
>Welcome to geek heaven.
>http://thinkgeek.com/sf _______________________________________________
>Spamassassin-talk mailing list [EMAIL PROTECTED]
>https://lists.sourceforge.net/lists/listinfo/spamassassin-talk



