Now that I have a lot of data, I have written a script that tallies it all up and, rather than making me pick a spam-threshold score, lets me simply state the false-positive or false-negative rate that I prefer; the script then figures out what score I need.
The idea is that I would indicate my false-positive or false-negative preference, and then the script could run once a week, for example, and adjust my spam-threshold in my SA user preferences.
Since I'm considering putting this into a complete "auto-tuning" kit for SA, I'm interested in hearing some suggestions.
Right now, my idea is that it would be used through some user-configuration web page. As such, the user would need to be presented with some scenarios. For that purpose, the script can show scenarios for a few false-positive and false-negative rates, as in the sample output below. The first three lines aim for false-positive rates of 1-in-10, 1-in-100, and 1-in-1000, while the next three aim for the same false-negative rates:
Spam-Threshold:  0.3   Ham messages lost: 1 in every    10.02   Spam messages allowed: 1 in every 241.92
Spam-Threshold:  8.2   Ham messages lost: 1 in every   118.20   Spam messages allowed: 1 in every  29.58
Spam-Threshold: 15     Ham messages lost: 1 in every 99999.00   Spam messages allowed: 1 in every   2.44
Spam-Threshold: 10     Ham messages lost: 1 in every   147.75   Spam messages allowed: 1 in every  10.32
Spam-Threshold:  5.7   Ham messages lost: 1 in every    45.46   Spam messages allowed: 1 in every  87.74
Spam-Threshold: -5.8   Ham messages lost: 1 in every     1.04   Spam messages allowed: 1 in every 266.18
Now, so that this data would be easy for a CGI script to use in a web page, the script also outputs it in comma-separated form, one scenario per line, like so:
"score,<one-FP-in-every-X-messages>,<one-FN-in-every-X-messages>"
0.3,10,241
8.2,118,29
15.0,99999,2
10.0,147,10
5.7,45,87
-5.8,1,266
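For anyone wondering how a CGI script might consume those lines, here is a minimal sketch in Python. It is only an illustration, not part of my script: the names parse_scenarios and describe are made up for this example, and it assumes the data rows arrive without the quoted format line above them.

```python
import csv
import io

# The six data rows from the sample output above (format line omitted).
SAMPLE = """\
0.3,10,241
8.2,118,29
15.0,99999,2
10.0,147,10
5.7,45,87
-5.8,1,266
"""

def parse_scenarios(text):
    """Turn 'score,FP,FN' lines into (float, int, int) tuples."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        score, fp_one_in, fn_one_in = record
        rows.append((float(score), int(fp_one_in), int(fn_one_in)))
    return rows

def describe(row):
    """Build a one-line description a web page could show for a scenario."""
    score, fp_one_in, fn_one_in = row
    return (f"Threshold {score:g}: lose 1 ham in {fp_one_in}, "
            f"allow 1 spam in {fn_one_in}")

if __name__ == "__main__":
    for scenario in parse_scenarios(SAMPLE):
        print(describe(scenario))
```

A web page could then list each description next to a radio button and store the chosen score.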
Now, to get a spam-threshold for, say, one FP in every 500 messages, you might pass it a command-line argument of "FP:500" and it would just spit you back a single number. Same would go for a false-negative... passing something like "FN:500".
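To make the "FP:500" idea concrete, here is a rough sketch in Python of how such a lookup could work. It is not my actual script. It assumes you already have two lists of scores from hand-sorted mail (ham and spam), that a message is flagged as spam when its score is greater than or equal to the threshold, and that thresholds are searched in steps of 0.1. The names parse_target and threshold_for are made up for the example.

```python
import math

def parse_target(arg):
    """Split an argument such as "FP:500" into ("FP", 500.0)."""
    kind, _, value = arg.partition(":")
    kind = kind.upper()
    if kind not in ("FP", "FN") or not value:
        raise ValueError(f"expected FP:<n> or FN:<n>, got {arg!r}")
    one_in = float(value)
    if one_in < 1:
        raise ValueError("the 1-in-N value must be at least 1")
    return kind, one_in

def threshold_for(kind, one_in, ham_scores, spam_scores):
    """Return a threshold whose error rate is at most 1 in `one_in`.

    Assumption: a message is treated as spam when score >= threshold.
    For "FP" the lowest qualifying threshold is returned (catches the
    most spam); for "FN" the highest one (loses the least ham).
    """
    all_scores = list(ham_scores) + list(spam_scores)
    low = math.floor(min(all_scores)) * 10        # in tenths of a point
    high = (math.ceil(max(all_scores)) + 1) * 10  # above every score
    if kind == "FP":
        for tenths in range(low, high + 1):
            t = tenths / 10
            lost = sum(1 for s in ham_scores if s >= t)
            if lost * one_in <= len(ham_scores):
                return t
    else:
        for tenths in range(high, low - 1, -1):
            t = tenths / 10
            allowed = sum(1 for s in spam_scores if s < t)
            if allowed * one_in <= len(spam_scores):
                return t
    return None  # not reached for non-empty score lists

# Tiny made-up data set, for illustration only.
HAM = [0.1, 0.5, 1.2, 2.0, 3.1, 4.0, 4.9, 5.2, 6.0, 8.3]
SPAM = [4.5, 5.0, 7.7, 9.0, 12.4, 15.0, 18.2, 20.1, 25.0, 30.3]

if __name__ == "__main__":
    kind, one_in = parse_target("FP:10")
    print(threshold_for(kind, one_in, HAM, SPAM))
```

With only ten messages of each kind, a request like "FP:1000" can only be met by a threshold that loses no ham at all, which is why a large corpus matters for the stricter targets.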
Does anybody else out there envision other ways to use this script? Are there any other features it should have?
- Joe