Before I actually write this, I'll ask whether someone has already done it.

On my IMAP server, I've got two different trash folders: one for ham, one for spam. Nothing new there.

However, on the hour, I've got a script that runs sa-learn on them and records three things for each message:
- The overall spam score
- The BAYES_XX number
- Whether the user marked it as spam or ham
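For illustration, here is a minimal sketch of how those three values could be pulled out of a message. It assumes each message still carries SpamAssassin's X-Spam-Status header (with "score=" or the older "hits=") and that the ham/spam label comes from whichever trash folder the message was found in. The function name extract_record and the record layout are hypothetical, not taken from an existing script.

```python
import re
from email import message_from_string

# "score=" is the current SpamAssassin spelling; older releases wrote "hits=".
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")
# Matches BAYES_00 ... BAYES_99 and BAYES_999 in the tests= list.
BAYES_RE = re.compile(r"\bBAYES_(\d{2,3})\b")


def extract_record(raw_message, user_says_spam):
    """Return the score, the BAYES_XX rule and the user's verdict for one message.

    user_says_spam is assumed to come from the folder the message sat in.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header is often folded across lines; collapse the whitespace.
    status = " ".join(status.split())
    score_match = SCORE_RE.search(status)
    bayes_match = BAYES_RE.search(status)
    return {
        "score": float(score_match.group(1)) if score_match else None,
        "bayes": "BAYES_" + bayes_match.group(1) if bayes_match else None,
        "user_spam": user_says_spam,
    }
```

Messages without the header simply yield None for the score and the Bayes rule, so they can be skipped later.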

Originally, I was using this to fine-tune my spam-threshold. However, since I've been building my Bayes db for over a year now, it has become very accurate.

What I want now is a script that can:
A) Find some "optimum" spam-threshold based on FP or FN rate. (I've already got that)
B) Compare this with the BAYES_XX values for the various spams/hams and, if the Bayes values have a higher correlation with what the *user* considers spam/ham, suggest different scoring values for the BAYES_XX hits.
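To make A) and B) concrete, here is a rough sketch of what such a script might compute. It assumes a list of records shaped like {"score": float, "bayes": "BAYES_99", "user_spam": True}. Every function name is hypothetical, and the mapping from a bucket's spam fraction to a suggested score in suggest_scores is an arbitrary linear heuristic chosen only for illustration, not something SpamAssassin provides or recommends.

```python
from collections import defaultdict


def best_threshold(records, max_fp_rate=0.01):
    """A) Lowest threshold whose false-positive rate stays within max_fp_rate.

    A message counts as spam when score >= threshold. Returns None if no
    observed score satisfies the cap.
    """
    hams = [r["score"] for r in records if not r["user_spam"]]
    for t in sorted({r["score"] for r in records}):
        false_positives = sum(1 for s in hams if s >= t)
        if not hams or false_positives / len(hams) <= max_fp_rate:
            return t
    return None


def threshold_accuracy(records, threshold):
    """Fraction of messages where the threshold agrees with the user."""
    correct = sum(
        1 for r in records if (r["score"] >= threshold) == r["user_spam"]
    )
    return correct / len(records)


def bayes_spam_fraction(records):
    """B) For each BAYES_XX rule, the fraction of hits the user called spam."""
    counts = defaultdict(lambda: [0, 0])  # rule -> [spam count, total count]
    for r in records:
        if r["bayes"] is None:
            continue
        counts[r["bayes"]][1] += 1
        if r["user_spam"]:
            counts[r["bayes"]][0] += 1
    return {rule: spam / total for rule, (spam, total) in counts.items()}


def bayes_accuracy(records, fractions):
    """Accuracy of a Bayes-only verdict: spam if the bucket is mostly spam."""
    scored = [r for r in records if r["bayes"] is not None]
    if not scored:
        return 0.0
    correct = sum(
        1 for r in scored if (fractions[r["bayes"]] > 0.5) == r["user_spam"]
    )
    return correct / len(scored)


def suggest_scores(fractions, max_abs=3.5):
    """Illustrative heuristic only: map spam fraction 0..1 to -max_abs..+max_abs."""
    return {
        rule: round((2 * p - 1) * max_abs, 2) for rule, p in fractions.items()
    }
```

If bayes_accuracy comes out higher than threshold_accuracy at the chosen threshold, that hints the Bayes rules deserve more weight. The accuracies here are measured on the same messages used for tuning, so they will look somewhat optimistic.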

In other words, I want a script that auto-tunes not just a user's spam-threshold but the Bayes scoring as well, as the Bayes db gets better and better.

Anybody done something like this?

- Joe
