[redirected to SAtalk per Dan's suggestion]

Mark Perkel writes:
> Yes - but because a subject isn't generally discussed among most people
> doesn't mean that it isn't important as a subject to many people. And
> I think that Spam Assassin should never become (inadvertently) a
> censorship tool.

Nothing becomes a censorship tool unless a government is involved and forces
people to use it.

> I work for the Electronic Frontier Foundation and many of my coworkers
> who religiously support Free Speech look at any spam filtering as
> censorship. So - we can accidentally create prohibited subjects based on
> GA scores.

SA doesn't prohibit anything. It doesn't filter spam, for that matter. It
only classifies messages and adds reports. If anyone blocks or drops
messages based on SA scores, they're creating prohibitions. SA isn't.

> Here's the problem with GA scoring. 99.9% of messages containing that
> word "Viagra" are spam. Therefore Viagra gets a 4.7 rating. Because of
> this Spam Assassin makes this word a prohibited subject. The thing
> that's different between Viagra spam and a discussion of Viagra is that
> Viagra spam is trying to sell you viagra.

Although I don't discuss Viagra on a regular basis, there are other
high-scoring rules where this is a potential issue for me. That's why I
raised my required_hits threshold to 7.0. Now no single rule can "prohibit"
a message. If I had my way, SpamAssassin would default to an 8.0 threshold,
since the scores range up to 4.6 or so. That way you'd have to match at
least two high-scoring rules to be counted as spam.
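
To make the arithmetic concrete, here's a rough Python sketch. It is not SA's
actual code: the two HYPOTHETICAL_* rule names and their scores are made up
for illustration, the 4.7 for VIAGRA is the figure quoted above, and I'm
assuming a message counts as spam when its summed score reaches the
threshold.

```python
# Illustrative rule scores. Only VIAGRA (4.7) comes from this thread;
# the HYPOTHETICAL_* rules and their scores are invented for the example.
RULE_SCORES = {
    "VIAGRA": 4.7,
    "HYPOTHETICAL_HIGH_RULE": 4.6,
    "HYPOTHETICAL_MINOR_RULE": 1.2,
}


def total_score(hits):
    """Sum the scores of every rule the message matched."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits, required_hits=5.0):
    """Assumed behaviour: spam when the summed score reaches the threshold."""
    return total_score(hits) >= required_hits


# One high-scoring rule plus one minor rule (about 5.9 in total).
message_hits = ["VIAGRA", "HYPOTHETICAL_MINOR_RULE"]
for threshold in (5.0, 7.0, 8.0):
    print(threshold, is_spam(message_hits, threshold))
```

With these made-up numbers the message is tagged at 5.0 but not at 7.0 or
8.0, and at 8.0 it takes both high-scoring rules together to cross the line.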

> For all of you who get this message - this message itself will be
> wrongly scored as spam because I said "incest" and "viagra". But clearly
> it is not spam. SA will never be perfect - but it does need to always be
> more accurate.

Actually your message wasn't scored as spam because I whitelist the SAdev
and SAtalk lists. There's absolutely no way to discuss spam without
triggering spam detectors. Outside of discussions of spam, what's the chance
of "viagra" and "incest" appearing in the same innocent message?

Even if you sent me a message that was scored as spam, it would just appear
in a different folder for me to peruse at my leisure. This isn't censorship,
it's prioritization. If anyone is using SA's scores differently, blame them,
not SA.
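
The "different folder" step happens entirely outside SA. As a minimal sketch
of that kind of sorting, assuming the tagged message carries an
"X-Spam-Flag: YES" header (the folder names and the choose_folder function
are hypothetical, and this is not how my own setup is written):

```python
from email import message_from_string


def choose_folder(raw_message):
    """Pick a delivery folder from the tag already added to the message.

    Nothing is dropped here: a tagged message is only filed elsewhere.
    Folder names are hypothetical.
    """
    msg = message_from_string(raw_message)
    flag = msg.get("X-Spam-Flag", "")
    if flag.strip().upper() == "YES":
        return "probably-spam"
    return "inbox"


tagged = "X-Spam-Flag: YES\nSubject: buy now\n\nbody text\n"
untagged = "Subject: lunch?\n\nbody text\n"
print(choose_folder(tagged))
print(choose_folder(untagged))
```

Whoever writes this step decides what a score means. Replace the folder with
/dev/null and you have a block list, but that's a choice made here, not in SA.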

> My point is - if we can write a rule that catches incest spam by
> combining the word incest with other "marketing" words then we should do
> that. In rewriting the porn rules I've been careful to try to look for
> the sell phrases and not just dirty words.

That's a sensible philosophy and I agree, but the power of SA is the ability
to score many factors. While I'd be all for reducing the score of VIAGRA and
any other rule that is so close to the 5.0 threshold, I wouldn't want to
remove the rule entirely. It's just an indication, not a red flag.

Since you seem to be running into false positives on VIAGRA and some of the
porn rules, you should gather a non-spam corpus of your own mail and
participate in the mass-check process next time. If the GA had a few of your
non-spam mails that mention "viagra" in mind, it wouldn't give the rule such
a high score.

As for "incest", it's a word that appears in many discussions of a serious
topic, a few off-color jokes, and relatively few spam messages. A
single-word rule there wouldn't make much sense, and assuming a large and
diverse enough non-spam corpus, the GA would score it very low.

--
michael moncur   mgm at starlingtech.com   http://www.starlingtech.com/
"When ideas fail, words come in very handy."            -- Goethe


