Thomas Bushnell BSG <[EMAIL PROTECTED]> writes:

> This is not universal, but it is extremely common.  Even someone as
> patient and decent as you doesn't give a hint in your email here that if
> your rule is over-broad and drops a valid message, you would regard that
> as something you must fix, or apologize for, or anything other than
> "this is the price that (other people) have to pay."

If I had a spam filtering rule that was over-broad and caught a message that wasn't spam, I would feel bad about that, and about inconveniencing the sender. To the extent that I could, I would try to ensure that similar non-spam messages were not caught by the spam filter in the future. One of the reasons why I use a statistical filter like bogofilter is so that there is a well-defined way for me to do that through filter training.

However, in some cases, it may not be possible to fix the problem. The person may have run afoul of some rule that Stanford had to apply site-wide for some reason (such as to prevent DoS attacks on our mail servers), or they may be triggering some rule with such a high weight and such a proven track record at catching spam that the single data point is not statistically significant. In such cases, I would explain to the other person that I consider the false positive to be a flaw in my spam filtering method, but I can't find a way to fix the flaw that wouldn't cause worse problems for me.

In other words, I think that spam filtering false positives are bugs, but some bugs are wontfix.

On the specific case of sending mail directly from dialup IP addresses, I would strongly recommend against ever doing this currently because, by doing so, one is putting oneself in a statistical bucket that is *overwhelmingly* spam. To a first approximation, all mail direct from dialups is spam. I personally prefer scoring filters at every level, but I know some people who have simply banned all mail from dialups, and when they show me the statistics they're dealing with, I can't help but admit that it makes sense for them to do what they do. This is particularly the case for people who receive orders of magnitude more spam than my paltry few thousand a day (and such people most certainly exist).

Being cautious about what statistical bucket one puts oneself into when communicating has been standard advice on the Internet for decades.
Even long before the advent of spam, it's always been the case that certain ways of expressing oneself made it far more likely that people would ignore one's messages (like writing in all caps, in a language the other person didn't understand, or to inappropriate addresses). In an ideal world, everyone would listen to all communication in direct proportion to the amount of useful content in that communication. Alas, in the real world, we all have limited time and have to optimize expenditure of that time, and we tend to do that statistically by dropping whole classes of communication that have a very bad signal-to-noise track record.

> So part of what's going on in the shift from by-hand false positives to
> automated false positives is a little bit of the old "blame it on the
> computer".

I do agree, and I think that's unfortunate. I don't think that viewing this as a question of fault is particularly useful.

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>