I am interested in building a training suite for SpamAssassin, based on past learning. The trouble is, I don't have my mail sorted exactly into spam and ham. What I do have is an accurate bayes-seen database created by SpamAssassin and corrected by scrupulous use of sa-learn. I also have a log of all my incoming mail (spam+ham).
So here's my question. Is there some easy way I can do this:

    unfiltered-incoming-mail + bayes-seen ==> ham + spam + unclassified

Conceptually it is easy: read each mail, look it up in bayes-seen, and send it to the appropriate file. At one point I naively thought I might modify sa-learn to do this, but it became apparent that understanding sa-learn well enough to do so would take more time than I have available.

So my questions are these:

- Is there an easy way to do this without modifying sa-learn?
- Is there an easy way to add an option to sa-learn to do this? (Even if I had something like "sa-learn --query" that reported spam/ham/unknown, the remaining infrastructure would be easy enough.)

I think a solution to this problem would be of general interest. It is often not convenient to maintain separate vetted ham/spam files.

For example, my setup is as follows. I filter my mail on a central machine and then distribute the ham to various mail clients (laptop, home and work desktops, ...). The alleged spam is saved on the central server. On the clients I report misclassified messages back to the central machine, and in the alleged spam file I check for misclassified messages, so the bayes database is accurate. It is easy enough for the central machine to keep every mail message, but I have no sensible way for it to separate ham from spam, or to update the separate files in response to error reports from the various clients.

If I had a solution to this problem I'd be in a position to contribute to the release statistics, and to run other experiments.

Suggestions appreciated.
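To make the conceptual step concrete (read each mail, look it up, send it to the appropriate file), here is a minimal sketch in Python. It is only an illustration under stated assumptions: the incoming mail is assumed to be in mbox format, and `seen` is a hypothetical dictionary mapping a Message-ID to 's' (spam) or 'h' (ham). How to extract such a mapping from the real bayes-seen database, and whether its keys match the Message-ID header exactly, is not covered here. The names `classify` and `split_mbox` are made up for this example.

```python
import mailbox


def classify(message, seen):
    """Return 'spam', 'ham', or 'unclassified' for one message.

    `seen` is a hypothetical dict: Message-ID -> 's' or 'h'.
    The real bayes-seen key format may differ (assumption).
    """
    msgid = str(message.get("Message-ID", "")).strip()
    flag = seen.get(msgid)
    if flag == "s":
        return "spam"
    if flag == "h":
        return "ham"
    return "unclassified"


def split_mbox(incoming_path, seen, out_paths):
    """Copy each message from incoming_path into one of three mbox files.

    out_paths maps 'spam', 'ham', and 'unclassified' to output file paths.
    Returns the number of messages written under each label.
    """
    counts = {"spam": 0, "ham": 0, "unclassified": 0}
    incoming = mailbox.mbox(incoming_path, create=False)
    outputs = {label: mailbox.mbox(path) for label, path in out_paths.items()}
    try:
        for message in incoming:
            label = classify(message, seen)
            outputs[label].add(message)
            counts[label] += 1
    finally:
        # close() flushes pending writes to disk.
        for box in outputs.values():
            box.close()
        incoming.close()
    return counts
```

With something like this, the missing piece is exactly the lookup: a way to ask the bayes-seen database whether a given message was learned as spam, as ham, or not at all.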
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk