On 08 Nov 2003, at 06:46, Terry Milnes wrote:
Some of us though are system administrators and need a solution to offer to the end users. The typical end user wants to open their email and see no spam, period.
Since the definition of spam varies from person to person that is simply not possible without customized tweaking.
Incorrect; it is simply not possible, period. Unless, of course, you mean for a day or two, or on a virgin account, or if you disregard the false positives.
Presently without the tweaks and training all we can do is reduce his spam by about 50 - 60%.
Much much much better than that.
# SUMMARY for threshold 5.0:
# Correctly non-spam: 16785 30.50% (99.84% of non-spam corpus)
# Correctly spam: 37347 67.87% (97.73% of spam corpus)
# False positives: 27 0.05% (0.16% of nonspam, 3617 weighted)
# False negatives: 869 1.58% (2.27% of spam, 2912 weighted)
# TCR: 38.063745 SpamRecall: 97.726% SpamPrec: 99.928% FP: 0.05% FN: 1.58%
That means that 98.37% of email processed with SA at a score level of 5.0 is correctly marked as spam/ham, and 97.73% of spams will be tagged correctly.
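For anyone who wants to check the percentages against the raw counts, here is a minimal Python sketch. The four counts are copied from the summary above. The TCR formula (spam count divided by lambda times false positives plus false negatives) and the weight lambda = 5 are my assumptions; they are not stated in the summary, although they do reproduce the reported 38.063745.

```python
# Counts copied from the "SUMMARY for threshold 5.0" block.
ham_ok = 16785   # correctly non-spam
spam_ok = 37347  # correctly spam
fp = 27          # false positives (ham tagged as spam)
fn = 869         # false negatives (spam not tagged)

total = ham_ok + spam_ok + fp + fn
n_spam = spam_ok + fn   # size of the spam corpus
n_ham = ham_ok + fp     # size of the non-spam corpus

accuracy = (ham_ok + spam_ok) / total   # share of all mail classified correctly
recall = spam_ok / n_spam               # "SpamRecall"
precision = spam_ok / (spam_ok + fp)    # "SpamPrec"
fp_rate = fp / n_ham                    # false positives as a share of non-spam

# Assumption: TCR = n_spam / (LAMBDA * fp + fn), with LAMBDA = 5.
LAMBDA = 5
tcr = n_spam / (LAMBDA * fp + fn)

print(f"accuracy  {accuracy:.2%}")
print(f"recall    {recall:.3%}")
print(f"precision {precision:.3%}")
print(f"fp rate   {fp_rate:.2%} of non-spam")
print(f"TCR       {tcr:.6f}")
```

Overall accuracy comes out at about 98.37%, and recall, precision and the false positive rate match the summary line. Note that the false positive share is measured against the non-spam corpus, which is the number that matters to a user who only cares about lost mail.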
No, this means that with a threshold of 5 you had 27 false positives. You omitted to quote my original response, "Settings have to be left at conservative in order not to get the phone calls complaining about false positives".
Threshold at 8 (no bayes) on 3 established test accounts, 31 day period
Total Mail 10,228
Total Spam 4,008
Correctly identified Spam 2,325
Unidentified Spam 1,683
False Positives 1
Threshold at 4 (+bayes and comprehensive white/black lists) on 3 accounts 31 days (Oct)
Total Mail 12,524
Correctly identified Spam 4,712
Unidentified Spam 13
False Positives 2
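To put the two test runs side by side, here is a rough Python sketch that turns the counts above into catch and false positive rates. The helper name catch_stats is hypothetical. It assumes that total spam is caught plus missed (no "Total Spam" figure is listed for the threshold 4 run) and that everything else in "Total Mail" is legitimate mail.

```python
def catch_stats(total_mail, caught, missed, false_pos):
    """Return catch rate and false positive rate for one test run.

    Assumption: total spam = caught + missed, and the rest of the
    mail is legitimate (false positives are counted among it).
    """
    total_spam = caught + missed
    ham = total_mail - total_spam
    return {
        "total_spam": total_spam,
        "catch_rate": caught / total_spam,
        "fp_rate": false_pos / ham,
    }

# Threshold 8, no bayes
conservative = catch_stats(10228, 2325, 1683, 1)
# Threshold 4, bayes plus white/black lists
tuned = catch_stats(12524, 4712, 13, 2)

for name, s in (("threshold 8", conservative), ("threshold 4", tuned)):
    print(f"{name}: caught {s['catch_rate']:.1%} of spam, "
          f"false positives {s['fp_rate']:.3%} of legitimate mail")
```

On these counts the conservative setup catches roughly 58% of spam and the tuned setup well over 99%, with one and two false positives respectively over the 31 days.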
When a user signs up for this service we tell him that we can reduce his spam by over 50%, and that there is a minute chance that a mail could be incorrectly identified as spam. He has a UI that he can log into to customize his settings and increase the catch ratio.
He is also advised on how to filter his mail and to check his spam folder constantly to make sure that nothing has been misidentified.
The typical user is capable of making toast in his electric toaster, but when it comes to the overwhelming complexities involved in operating a computer he is totally lost. He will become extremely agitated when he loses the *REALLY IMPORTANT* email that was tagged as spam, put into his spam mailbox, and subsequently deleted because he didn't pay close enough attention to what *HE* was doing. That instantly becomes our fault!
That same user doesn't comprehend filtering. He looks at a message, says it's spam and deletes it. He thinks computers are smarter than people, so why can't the computer do the same thing? And he thinks a program that tries and screws up is useless.....
So to avoid the loss of those *REALLY IMPORTANT* emails we leave the settings at a conservative level.
When someone posts to this list asking how he can improve the hit ratio for his customers/users, and cites examples or ideas that may improve the success ratio for his situation (multiple users, many morons), the last thing he wants to hear about is how good SpamAssassin is without any of his kind of modification, and that if he uses Bayes or spends a little time tweaking he can see results like these.... He is probably already aware of that.
I use SpamAssassin and think it's the greatest thing since sliced bread, but I also spend considerable time keeping my settings up to date, monitoring, and carefully checking what could be misidentified. The typical end user will NOT go to this effort and needs us to hold his hand.
Terry
_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk