I'll apologize in advance for being a bit "strongly opinionated," but your viewpoint on this strikes me as wrongheaded.
There is clearly a strong obligation for the SpamAssassin development effort to reduce false positives. Heck, the whole reason SA uses a GA (genetic algorithm) for score assignment in the first place is to tune the false positive rate: it lowers the scores of rules that commonly match non-spam mail while jacking up the scores of rules that match only in spam. To take the viewpoint that it is the obligation of the mailer to tune their mailing against SA completely negates the value of much of the design work that went into SpamAssassin. Heck, a lot of the effort in SA's rule design goes into *preventing* the tuning of a mailing (ie: keeping pr0n spammers from tuning their mailings so they don't hit).

I've not seen the content of that particular M$DN mailing; however, AFAIK the SA development team has defined "lists which you willfully subscribe to" as non-spam. So the fact that it is MSDN does make it non-spam here: the MSDN lists are strictly opt-in, and being an MSDN subscriber does NOT require you to receive them.

The only rule I think MSDN has an obligation to tune for is MISSING_OUTLOOK_NAME. The rest of the rules that email matched are pretty innocuous. Let's face it, most of the rules it matched are weak indicators of spam; it just matched a lot of them.

CALL_FREE - not much of a sign of spam. Most large companies have 800 numbers and mention them in even the most innocuous emails. Personally, I've got this rule manually 0'ed out.

FROM_NAME_NO_SPACES - why did this rule ever come about? Most of my "personal friends only" accounts use only "Matt" as a name; it's only on my more business/community-oriented ones that I use "Matt Kettler". Sure, some spammers use one-word names, but my "spam" mailbox has far more From names with two or more words than names with no spaces. I guess that's why, when this got GAed, it got a very small negative score instead of a positive one.
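The "lots of weak indicators" point is really about arithmetic: an SA verdict is the sum of the scores of every rule that hits, compared against a threshold. Below is a toy Python model of that idea, not SA's real code. It uses only scores quoted in this post (the 1.0 defaults for FREE_CAP, DO_IT_TODAY, SAVE_MONEY and SAVE_BUCKS, plus the 2.31 values for TO_BE_REMOVED_REPLY and FROM_NAME_NO_SPACES), so it mixes score sets and is not a reconstruction of the actual MSDN score. The 5.0 threshold is an assumption (the usual required_hits default), and the `overrides` argument is a hypothetical stand-in for zeroing a rule out in your own config, the way CALL_FREE is zeroed above.

```python
# Toy model of SpamAssassin-style additive scoring (not SA's real code).
# Scores are the ones quoted in this thread; they come from different
# CVS snapshots/releases and are for illustration only.
SCORES = {
    "FREE_CAP": 1.0,               # default score, not yet GA-tuned
    "DO_IT_TODAY": 1.0,            # default score, not yet GA-tuned
    "SAVE_MONEY": 1.0,             # default score, not yet GA-tuned
    "SAVE_BUCKS": 1.0,             # default score, not yet GA-tuned
    "TO_BE_REMOVED_REPLY": 3.985,  # v2.31
    "FROM_NAME_NO_SPACES": -0.114, # v2.31
}

THRESHOLD = 5.0  # assumption: the usual required_hits default


def total_score(hits, scores=SCORES, overrides=None):
    """Sum the scores of every rule that hit.

    `overrides` mimics a user's own score settings (e.g. zeroing a rule).
    Rules with no known score count as 0.0, a simplification for this sketch.
    """
    overrides = overrides or {}
    total = 0.0
    for rule in hits:
        total += overrides.get(rule, scores.get(rule, 0.0))
    return round(total, 3)


def is_spam(hits, overrides=None, threshold=THRESHOLD):
    """Tag as spam when the summed score reaches the threshold."""
    return total_score(hits, overrides=overrides) >= threshold


msdn_like_hits = [
    "FREE_CAP", "DO_IT_TODAY", "SAVE_MONEY", "SAVE_BUCKS",
    "TO_BE_REMOVED_REPLY", "FROM_NAME_NO_SPACES",
]

print(total_score(msdn_like_hits))  # 7.871
print(is_spam(msdn_like_hits))      # True
# Zeroing the flip-flopping unsubscribe rule drops the mail under the line.
print(is_spam(msdn_like_hits, overrides={"TO_BE_REMOVED_REPLY": 0.0}))  # False
```

No single rule here is damning; four default-scored 1.0 rules plus one unsubscribe-footer rule are enough to cross the line.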
For the record, FROM_NAME_NO_SPACES scored 0.500 in the June 6th CVS; 2.31 and the July 11 CVS give it -0.114.

FROM_HAS_MIXED_NUMS - maybe, but lots of people create addresses like this in crowded domains. It seems more like a Hotmail/Yahoo detector than a strong sign of spam.

FREE_CAP, DO_IT_TODAY, SAVE_MONEY, SAVE_BUCKS - these might be good, but as of today's CVS they haven't been through the GA yet, and all have the default score of 1.0. I bet the GA runs these down a bit once it starts getting some matching data in the corpus. 1.0 each seems rather high to me, particularly for FREE_CAP (strikes me as a 0.3 or 0.5-ish thing) and DO_IT_TODAY (ie: mail from your boss telling you to get off your behind and do that project today).

OFFER - yes, it's marketing, but it's requested marketing, not garbage spam. It's probably acceptable to hit 'em with points for this anyway.

TO_BE_REMOVED_REPLY, MAILTO_WITH_SUBJ, UNSUB_PAGE - OK, let's face it, plain-jane unsubscribe footers aren't a strong indicator of spam. Every legitimate subscription mailing has them in one form or another, even this mailing list. These scores go up and down wildly as different forms appear in the corpus as spam or non-spam. I'd love to see a rule to catch mailto links that send remove requests to various "freemail" domains like Hotmail/Yahoo, with some caution to make sure you don't catch Yahoo Groups unsub addresses. It would certainly have a much lower false positive rate than these simple rules. TO_BE_REMOVED_REPLY is a great example of these scores going up and down: v2.20 gave it -2.150 and v2.31 gives it +3.985, and it did that without the rule changing at all, just the tide of what's in the corpus. MAILTO_WITH_SUBJ also did a -/+ flip. Swings like that in GA score really indicate to me that these rules are pretty questionable, and that their accuracy varies wildly with whichever direction the wind is blowing today.
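Sign flips like the TO_BE_REMOVED_REPLY one are easy to spot mechanically if you keep the per-release score tables around. A minimal Python sketch; the helper name `sign_flips` is hypothetical, and the data are only the before/after values quoted in this post (MAILTO_WITH_SUBJ's actual numbers aren't given, so it is left out). Each rule's list is assumed to be in chronological order.

```python
# Hypothetical helper: flag rules whose GA-assigned score changes sign
# between consecutive score sets. Values are the ones quoted in this thread.
SCORE_HISTORY = {
    "TO_BE_REMOVED_REPLY": [("2.20", -2.150), ("2.31", 3.985)],
    "FROM_NAME_NO_SPACES": [("June 6 CVS", 0.500), ("July 11 CVS", -0.114)],
}


def sign_flips(history):
    """Return {rule: [(from_version, to_version), ...]} for every
    consecutive pair of score sets where the score crosses zero."""
    flips = {}
    for rule, versions in history.items():
        for (v_old, s_old), (v_new, s_new) in zip(versions, versions[1:]):
            if s_old * s_new < 0:  # strictly opposite signs; 0.0 never counts
                flips.setdefault(rule, []).append((v_old, v_new))
    return flips


for rule, changes in sign_flips(SCORE_HISTORY).items():
    for old, new in changes:
        print(f"{rule}: sign flipped between {old} and {new}")
```

A rule that keeps landing on opposite sides of zero is one whose hits the GA can't consistently attribute to spam or non-spam, which is the "direction of the wind" problem in numeric form.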
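The freemail-unsubscribe rule wished for above could work roughly like this. It is a Python sketch, not an actual SA rule. The domain set, the "remove"/"unsub" keywords, and the way Yahoo Groups addresses are kept out are all assumptions about what such a rule might check. In particular, it assumes Yahoo Groups unsubscribe addresses live at yahoogroups.com, which is deliberately absent from the freemail list.

```python
import re

# Hypothetical rule: flag mailto: links whose address is on a free webmail
# domain AND whose address or ?subject= looks like a removal request.
FREEMAIL_DOMAINS = {"hotmail.com", "yahoo.com"}  # illustrative; extend as needed
# Assumption: Yahoo Groups unsub addresses are @yahoogroups.com; that domain is
# intentionally not in FREEMAIL_DOMAINS, so list unsub links never match.
REMOVE_WORDS = re.compile(r"remove|unsub", re.IGNORECASE)
MAILTO = re.compile(r"mailto:([^\s\"'<>?]+)(\?[^\s\"'<>]*)?", re.IGNORECASE)


def freemail_remove_links(body):
    """Return the freemail addresses used as removal mailto: targets."""
    found = []
    for match in MAILTO.finditer(body):
        address = match.group(1).rstrip(".,;:)")  # drop trailing punctuation
        query = match.group(2) or ""
        local, _, domain = address.rpartition("@")
        if domain.lower() not in FREEMAIL_DOMAINS:
            continue  # not freemail (includes yahoogroups.com)
        if REMOVE_WORDS.search(local) or REMOVE_WORDS.search(query):
            found.append(address)
    return found


body = ("To be removed, mailto:remove_me123@hotmail.com?subject=remove "
        "or write mailto:mylist-unsubscribe@yahoogroups.com")
print(freemail_remove_links(body))  # ['remove_me123@hotmail.com']
```

Requiring both the freemail domain and the removal wording is what should keep the false positive rate down: a legitimate list's own unsubscribe address (like this one's at sourceforge.net) never matches.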
LINES_OF_YELLING, LINES_OF_YELLING_3, LINES_OF_YELLING_2 - the worth of these is commonly disputed because of the large number of dense innocents who write in all caps. Still, it's probably OK to hit 'em with points for this.

SUPERLONG_LINE - I'm guessing this was originally made to match spam, but the GA scores it more toward the non-spam side. It seems like a strange rule when you look at the overall structure of mail: HTML spam (long lines likely), non-HTML spam (long lines unlikely), and personal mail from a variety of mailers (long lines likely if the mailer doesn't wrap, unlikely if it does).

DOUBLE_CAPSWORD - a good rule, I think. Worth hitting 'em for some points.

MISSING_OUTLOOK_NAME - OK, this one is foolish for MSDN to have matched on. MS clearly should not strip the X-Mailer headers from their legitimate mailing lists.

At 12:18 PM 7/23/2002 -0500, SpamTalk wrote:
>I am still of the opinion that the onus is upon the mass mailers to
>legitimize their messages. I get other newsletters that do not have to be
>whitelisted. They send a short list of text synopses with hyperlinks to the
>full story, so I only get blasted with ads (except I use guidescope
>[www.guidescope.com] and pop-up stopper [www.panicware.com] to squash most
>of them) if I want to read more.
>
>-----Original Message-----
>From: Bart Schaefer [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, July 23, 2002 11:48 AM
>To: [EMAIL PROTECTED]
>Subject: RE: [SAtalk] Microsoft developer newsletter tagged as spam
>
>
>On Tue, 23 Jul 2002, SpamTalk wrote:
>
> > It _IS_ spam. The fact it is from M$DN does not mitigate the fact that
> > they take advantage of having your email address to load all that crap
> > in the same boat.
>
>It's not spam unless they send it unsolicited. The point is merely that
>a high SA content score does not mean the mail was not asked for -- and if,
>for example, an ISP were to choose to deploy an SMTP-time block using SA,
>they risk intercepting legitimate mail.
>
>-------------------------------------------------------
>This sf.net email is sponsored by:ThinkGeek
>Welcome to geek heaven.
>http://thinkgeek.com/sf
>_______________________________________________
>Spamassassin-talk mailing list
>[EMAIL PROTECTED]
>https://lists.sourceforge.net/lists/listinfo/spamassassin-talk