On my IMAP server, I have two different trash folders: one for ham, one for spam. Nothing new there.
However, every hour a script runs sa-learn on them and records three things for each message:
- The overall spam score
- The BAYES_XX number
- Whether the user marked it as spam or ham
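For anyone wanting to reproduce the per-message record, here is a minimal sketch of pulling those three values out of a message. It assumes SpamAssassin's default X-Spam-Status header is present (older versions write "hits=" instead of "score="), and that the verdict comes from which trash folder the message sits in. `Record`, `parse_status` and `bayes_prob` are hypothetical names, not part of sa-learn or SpamAssassin.

```python
import re
from email import message_from_string
from email.message import Message
from typing import NamedTuple, Optional


class Record(NamedTuple):
    # Hypothetical record layout: one row per learned message.
    score: Optional[float]      # overall spam score, None if header missing
    bayes_rule: Optional[str]   # e.g. "BAYES_99", None if no Bayes rule hit
    user_says_spam: bool        # True if found in the spam trash folder


SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")
TESTS_RE = re.compile(r"\btests=(\S+)")


def bayes_prob(rule: str) -> float:
    # BAYES_05 -> 0.05, BAYES_99 -> 0.99, BAYES_999 -> 0.999
    return float("0." + rule.split("_", 1)[1])


def parse_status(msg: Message, user_says_spam: bool) -> Record:
    raw = str(msg.get("X-Spam-Status", ""))
    # The tests list is often folded after commas; rejoin it, then
    # collapse any remaining whitespace (including folding) to single spaces.
    status = " ".join(re.sub(r",\s+", ",", raw).split())
    m = SCORE_RE.search(status)
    score = float(m.group(1)) if m else None
    t = TESTS_RE.search(status)
    tests = t.group(1).split(",") if t else []
    bayes = [name for name in tests if re.fullmatch(r"BAYES_\d+", name)]
    # If several Bayes rules fire (e.g. BAYES_99 and BAYES_999), keep the strongest.
    rule = max(bayes, key=bayes_prob) if bayes else None
    return Record(score, rule, user_says_spam)
```

Feeding each message from the spam folder with `user_says_spam=True` and each one from the ham folder with `False` gives one row per message.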
Originally, I used this to fine-tune my spam threshold. However, I've been training my Bayes DB for over a year now, and it has become very accurate.
What I want now is a script that can:
A) Find an "optimum" spam threshold based on the false-positive (FP) or false-negative (FN) rate. (I've already got that.)
B) Compare this with the BAYES_XX hits for the various spams/hams and, if the Bayes results correlate more strongly with what the *user* considers spam/ham, suggest different scores for the BAYES_XX rules.
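Part A can be done with a simple sweep over the observed scores. This is only a sketch of one way to do it, not the script I'm running: it assumes a message is flagged when its score is at or above the threshold, takes (score, user_says_spam) pairs, and picks the lowest threshold whose FP rate stays under a cap, which also gives the lowest FN rate for that cap. `pick_threshold` and `max_fp_rate` are hypothetical names.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    samples: List[Tuple[float, bool]], max_fp_rate: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, fp_rate, fn_rate), or None if no observed score works."""
    hams = [score for score, is_spam in samples if not is_spam]
    spams = [score for score, is_spam in samples if is_spam]
    # FP/FN counts only change at observed scores, so those are the candidates.
    # Ascending order: the first threshold meeting the FP cap has the fewest FNs.
    for t in sorted({score for score, _ in samples}):
        fp_rate = sum(s >= t for s in hams) / len(hams) if hams else 0.0
        if fp_rate <= max_fp_rate:
            fn_rate = sum(s < t for s in spams) / len(spams) if spams else 0.0
            return t, fp_rate, fn_rate
    return None
```

Swapping the roles (cap the FN rate, minimize FPs) is the same loop run in descending order.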
In other words, I want a script that auto-tunes not just a user's spam threshold but also the Bayes scoring, as the Bayes DB gets better and better.
Anybody done something like this?
- Joe