Kevin A. McGrail created COMDEV-260:
---------------------------------------

             Summary: SpamAssassin Bayes Token ID
                 Key: COMDEV-260
                 URL: https://issues.apache.org/jira/browse/COMDEV-260
             Project: Community Development
          Issue Type: Project
            Reporter: Kevin A. McGrail


From DFS idea, used with permission:

We tokenize inbound messages and store the tokens on the server. In each 
message, we add links for doing training. When you click on a training link, 
the system trains the message based on the tokens stored on the server. In that 
way, you are training using exactly the tokens that the Bayes code saw. 

For SA, the key point is a framework that stores the Bayesian tokens from the 
email before the email is delivered, so that a "this is spam"/"this is ham" 
mechanism can later take advantage of that information without having the 
entire email.

Adding a header that carries the message id under which the tokens are stored 
makes it easier to build a train-as-spam/train-as-ham framework.
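
As a rough illustration of the store-then-tag step described above, here is a 
minimal Python sketch. Everything in it is an assumption made for the example: 
the header name X-Bayes-Token-ID, the in-memory dict used as the store, and the 
toy tokenizer are all hypothetical and are not part of SpamAssassin, whose 
Bayes tokenizer is considerably more involved.

```python
import uuid
from email import message_from_string

# Hypothetical header name, chosen for this sketch only.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"


def toy_tokenize(text):
    """Stand-in tokenizer: unique lowercased whitespace-separated words."""
    return sorted(set(text.lower().split()))


def store_tokens_and_tag(raw_message, store):
    """Tokenize a message, save the tokens, and tag the message with their id.

    `store` is any dict-like mapping from id to token list (an assumption;
    a real deployment would need a persistent store with expiry).
    Only simple, non-multipart messages are handled here.
    """
    msg = message_from_string(raw_message)
    body = msg.get_payload()  # a str for non-multipart messages
    token_id = uuid.uuid4().hex
    store[token_id] = toy_tokenize(body)
    msg[TOKEN_ID_HEADER] = token_id  # added before delivery
    return msg.as_string(), token_id
```

After delivery, only the id in the header is needed to find the tokens again; 
the message body itself never has to be kept on the server.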

The issues you are pointing to have more to do with the implementation of the 
"this is spam"/"this is ham" mechanism.

Storing just the tokens takes less space and mitigates privacy and legal 
concerns.

sa-learn would then be extended to accept the message id and learn the stored 
tokens as spam/ham, instead of being fed the entire message.
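
A minimal sketch of what learning by id could look like, in Python for 
brevity. This is not how sa-learn works today and none of these names exist in 
SpamAssassin: ToyBayesCounts, learn_by_id, and the dict mapping an id to its 
token list are hypothetical stand-ins for the Bayes database and the token 
store.

```python
from collections import Counter


class ToyBayesCounts:
    """Stand-in for a Bayes database: per-token and per-message counts."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0


def learn_by_id(token_id, is_spam, store, counts):
    """Train on previously stored tokens; the original message is not needed.

    Returns False if no tokens are stored under `token_id` (for example
    because they expired), True otherwise.
    """
    tokens = store.get(token_id)
    if tokens is None:
        return False
    if is_spam:
        counts.spam.update(tokens)
        counts.nspam += 1
    else:
        counts.ham.update(tokens)
        counts.nham += 1
    return True
```

A training link in the delivered message would only need to carry the id and 
the spam/ham verdict.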



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
