[ https://issues.apache.org/jira/browse/COMDEV-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin A. McGrail updated COMDEV-260:
------------------------------------
    Summary: GSOC 2018 SpamAssassin Bayes Token ID  (was: SpamAssassin Bayes Token ID)

> GSOC 2018 SpamAssassin Bayes Token ID
> -------------------------------------
>
>                 Key: COMDEV-260
>                 URL: https://issues.apache.org/jira/browse/COMDEV-260
>             Project: Community Development
>          Issue Type: Project
>            Reporter: Kevin A. McGrail
>            Priority: Major
>
> From DFS idea used with permission:
> We tokenize inbound messages and store the tokens on the server. In each
> message, we add links for training. When you click on a training link, the
> system trains on the message using the tokens stored on the server. That
> way, you train on exactly the tokens that the Bayes code saw.
> For SA, the key point is a framework that stores the Bayes tokens from an
> email before delivery, so that a later "this is spam" / "this is ham"
> mechanism can use them without needing the entire email.
> Adding a header carrying the ID under which the tokens are stored allows a
> "train as spam" / "train as ham" framework to be built more readily.
> The issues you are pointing to deal more with the implementation of the
> "this is spam" / "this is ham" mechanism.
> Storing just the tokens takes less space and mitigates privacy and legal
> concerns.
> sa-learn would then be extended to take the message ID and learn it as spam
> or ham, instead of being fed the entire message.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
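A minimal Python sketch of the store-then-train flow the issue describes, under stated assumptions: SpamAssassin itself is written in Perl, and none of the names here come from it. The header name `X-Bayes-Token-ID`, the `TokenStore` class, the toy `tokenize` function, and the in-memory dictionaries are all hypothetical stand-ins for server-side storage and for the real Bayes tokenizer and database. The sketch tokenizes a message at delivery time, stores hashed tokens under a fresh ID, stamps that ID into a header, and later trains from the stored tokens alone.

```python
import hashlib
import re
import uuid
from collections import Counter
from dataclasses import dataclass, field
from email import message_from_string

# Hypothetical header name chosen for this sketch; not a SpamAssassin header.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"

# Toy word pattern: runs of 3+ lowercase letters, digits, '$' or apostrophes.
WORD_RE = re.compile(r"[a-z0-9$']{3,}")


def tokenize(raw: str) -> set[str]:
    """Toy tokenizer: lowercase words from the Subject and text/plain parts.
    SpamAssassin's real Bayes tokenizer is far more elaborate."""
    msg = message_from_string(raw)
    text = msg.get("Subject") or ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            text += " " + payload.decode("utf-8", errors="replace")
    return set(WORD_RE.findall(text.lower()))


def hash_token(token: str) -> str:
    # Store a short digest instead of the raw word (the issue's privacy point).
    return hashlib.sha1(token.encode("utf-8")).hexdigest()[:10]


@dataclass
class TokenStore:
    """Hypothetical server-side store plus a toy spam/ham token database."""
    pending: dict[str, set[str]] = field(default_factory=dict)
    spam: Counter = field(default_factory=Counter)
    ham: Counter = field(default_factory=Counter)
    nspam: int = 0
    nham: int = 0

    def scan_and_stamp(self, raw: str) -> str:
        """At delivery: tokenize, store hashed tokens under a new ID, add header."""
        token_id = uuid.uuid4().hex
        self.pending[token_id] = {hash_token(t) for t in tokenize(raw)}
        msg = message_from_string(raw)
        msg[TOKEN_ID_HEADER] = token_id
        return msg.as_string()

    def train(self, token_id: str, is_spam: bool) -> bool:
        """Later: learn from the stored tokens alone, without the message body.
        Returns False for an unknown or already-trained ID."""
        tokens = self.pending.pop(token_id, None)
        if tokens is None:
            return False
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1
        return True


if __name__ == "__main__":
    store = TokenStore()
    stamped = store.scan_and_stamp("Subject: Cheap meds\n\nBuy cheap meds now\n")
    tid = message_from_string(stamped)[TOKEN_ID_HEADER]
    print(store.train(tid, is_spam=True))   # True
    print(store.train(tid, is_spam=True))   # False: tokens discarded after training
```

In this sketch the stored tokens are discarded after a single training, which keeps storage small; a real deployment would also need to expire entries that are never trained and to let a user undo a mistaken "this is spam" click, neither of which is modeled here.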