Rob McMillin wrote:
> Daniel Quinlan wrote:
>
> >Rob McMillin <[EMAIL PROTECTED]> writes:
> >
> >>When sysadmins in those TLDs fix their relays, I'll be happy to hear
> >>them out.
> >>
> >The other problem with using this type of test in a spam corpus is
> >that you're using a small subset of global spam.  I don't do any
> >business with people from some random country, so of course, all of my
> >mail from them is spam.  However, if someone else does business with
> >that country, 99.99% of their mail from there is not spam.  It's very
> >specific to the user.
> >
> >Also, it's bad public relations for SA and spam-filtering in general.
> >(Just a fact of life.)
> >
> Let me get this straight -- we have ignorant and the willfully abusive
> people in these countries creating or abetting spam for others to deal
> with, and *we're* supposed to be concerned about public relations?

Actually, you should always be concerned about public relations.  This is
probably most important when you're trying to convince someone else to do
the right thing.

> This strikes me as very backwards.

Depends on whether you prefer revenge or consensus through dialogue.

> If we're so worried about what other people think of us and what we do,
> we really ought to close up shop now, because there's armies of spammers
> out there who think their MAKE MONEY FAST messages should positively,
> absolutely get delivered and read and acted on.

After all, we're extremists!  Well, there are some people who are just
plain wrong :)  cf. your previous Oliver Wendell Holmes quote.

> >It would be better to find a rule that just worked.
> >
> The one I submitted *does*.

...for people outside those TLDs, yes it does.  I think the proposal below
would actually work better for more people.  It's not as easy to implement,
but I do think it would make SA a better product.  I think it probably
needs some refinement though -- I suspect most people get the majority of
their spam from .com, as well as the majority of their legitimate mail
(unless you're in college or something), so you'd end up with .com in the
"best 25%" and have all .com mail get a -1 bonus.  I think it would
probably be better to give the worst 25% +3, the next 25% +1, and leave the
best 50% unchanged.

> >For example, one
> >method would be a TLD "whitelist".  As spamassassin receives mail,
> >there are two counters for each TLD.  One is total messages and the
> >other is total number tagged as spam.  Then rank the spam ratio by
> >TLD.  TLDs in the worst quartile get +1.0 score, TLDs in the next 25%
> >get +0.5 score, TLDs in the next 25% get -0.5, TLDs in the best 25%
> >get -1.0.
> >
> Yes, and while users wait for this rule to accumulate this knowledge,
> mine could be chucking spam into the appropriate non-Inbox folder.
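
For what it's worth, the two-counters-per-TLD scheme, using the +3 / +1 /
unchanged weights I suggested rather than Daniel's, might look roughly like
the sketch below.  It is Python purely for illustration, not SpamAssassin
code: the class name, the method names and the optional starting counts are
all made up, and the TLD names are placeholders.

```python
from collections import defaultdict


class TldStats:
    """Per-TLD message counters. Hypothetical helper, not SpamAssassin code."""

    def __init__(self, initial=None):
        # initial: optional {tld: (total, spam)} starting counts (an assumption).
        self.total = defaultdict(int)
        self.spam = defaultdict(int)
        for tld, (total, spam) in (initial or {}).items():
            self.total[tld] += total
            self.spam[tld] += spam

    def record(self, tld, is_spam):
        """Count one received message for this TLD."""
        self.total[tld] += 1
        if is_spam:
            self.spam[tld] += 1

    def scores(self):
        """Map each TLD seen so far to a score adjustment by spam-ratio rank."""
        seen = [t for t in self.total if self.total[t] > 0]
        # Worst (highest spam ratio) first; ties broken by name for stable output.
        ranked = sorted(seen, key=lambda t: (-self.spam[t] / self.total[t], t))
        result = {}
        for i, tld in enumerate(ranked):
            position = i / len(ranked)  # 0.0 is the worst-ranked TLD
            if position < 0.25:
                result[tld] = 3.0  # worst 25%
            elif position < 0.5:
                result[tld] = 1.0  # next 25%
            else:
                result[tld] = 0.0  # best 50% left unchanged
        return result


if __name__ == "__main__":
    stats = TldStats()
    # Placeholder TLD names with made-up (total, spam) counts.
    for tld, total, spam in [("aa", 10, 9), ("bb", 10, 5), ("cc", 10, 2), ("dd", 10, 0)]:
        for i in range(total):
            stats.record(tld, is_spam=i < spam)
    print(stats.scores())
```

With only a handful of TLDs seen, the quartiles mean very little, so a real
rule would presumably want some minimum message count per TLD before it
applied any score at all.
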
> Seriously -- are we now going to pitch the default rules against finding
> Big5 encoding in the Subject line because it might trample the feelings
> of some people?  Maybe they'll figure out that their get-rich-quick
> schemes are unwelcome and stop sending them.  It catches spam for me,
> that's all I can say.  The point of SpamAssassin is, to some degree, to
> disseminate the collected experience of its contributors.  And that
> includes unpleasant and possibly politically incorrect experience.

I am not one of the world's most PC people, but if there is a PC solution
which works equally well, I think it makes sense to adopt it.  On the other
hand, I agree that you do get this "bootstrap" problem, but that could
probably be overcome by having a default set of initializers for the TLD
scores which are set from the corpus.  Then if individuals have different
TLD spam patterns, their systems will evolve to stop scoring those TLDs as
badly for them.

C