I am interested in building a training suite for spamassassin, based
on past learning.  The trouble is, I don't have my mail sorted exactly
into spam and ham.  What I do have is an accurate bayes-seen database
created by spamassassin and corrected by scrupulous use of sa-learn.
I also have a log of all my incoming mail (spam+ham).

So here's my question.  Is there some easy way I can do this:

   unfiltered-incoming-mail + bayes-seen ==> ham + spam + unclassified

Conceptually it is easy:  read each mail, look it up in bayes-seen,
and send it to the appropriate file.
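The lookup-and-dispatch step can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the incoming log is a single mbox file, and the bayes-seen data has already been exported into a dictionary mapping each message's Message-ID to 's' (learned as spam) or 'h' (learned as ham). SpamAssassin may key its seen database differently (for example, a generated ID for messages without a Message-ID header), and how to export it is not shown here. The helper names `classify` and `split_mailbox` are hypothetical and not part of SpamAssassin.

```python
import mailbox

# Assumption: exported seen flags use 's' for spam and 'h' for ham.
LABELS = {"s": "spam", "h": "ham"}


def classify(msg, seen):
    """Return 'spam', 'ham' or 'unclassified' for one message.

    Assumption: `seen` is keyed by the raw Message-ID header value.
    Messages without a Message-ID, or not found in `seen`, are
    reported as 'unclassified'.
    """
    msg_id = (msg.get("Message-ID") or "").strip()
    return LABELS.get(seen.get(msg_id), "unclassified")


def split_mailbox(src_path, seen, out_paths):
    """Copy each message in the mbox at src_path into one of three mboxes.

    out_paths maps 'spam', 'ham' and 'unclassified' to output paths
    (files are created if missing). Returns per-category counts.
    """
    outputs = {name: mailbox.mbox(path) for name, path in out_paths.items()}
    counts = {name: 0 for name in out_paths}
    src = mailbox.mbox(src_path)
    try:
        for msg in src:
            category = classify(msg, seen)
            outputs[category].add(msg)  # appends a copy of the message
            counts[category] += 1
    finally:
        src.close()
        for box in outputs.values():
            box.close()  # close() also flushes pending writes
    return counts
```

Usage would look like `split_mailbox("incoming.mbox", seen, {"spam": "spam.mbox", "ham": "ham.mbox", "unclassified": "unknown.mbox"})`. Because the split is regenerated from the full log and the current seen data, rerunning it after sa-learn corrections would refresh the ham and spam files without maintaining them by hand.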

At one point I naively thought I might modify sa-learn to do this,
but it soon became apparent that understanding sa-learn well enough
to do so would take more time than I have available.

So my questions are these:
   
  - is there an easy way to do this without modifying sa-learn?
    
  - is there an easy way to add an option to sa-learn to do this?
    (if I even had something like "sa-learn --query" that reported
     spam/ham/unknown, the remaining infrastructure would be easy
     enough)

I think a solution to this problem would be of general interest.
It is often not convenient to maintain separate vetted ham/spam files.
For example, my setup is as follows.  I filter my mail on a central
machine and then distribute the ham to various mail clients (laptop,
home & work desktops ...).  The alleged spam is saved on the central
server.  On the clients I report misclassified stuff back to the
central machine, and in the alleged spam file I check for misclassified
stuff, so the bayes database is accurate.  It is easy enough for the
central machine to keep every mail message, but I have no sensible
way for it to separate ham/spam, and to update the separate files
in response to error reports from the various clients.

If I had a solution to this problem I'd be in a position to contribute
to the release statistics, and to run other experiments.

Suggestions appreciated.



_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
