[ https://issues.apache.org/jira/browse/COMDEV-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Thomas updated COMDEV-260:
-------------------------------
    Component/s: GSoC/Mentoring ideas

> GSOC 2018 SpamAssassin Bayes Token ID
> -------------------------------------
>
>                 Key: COMDEV-260
>                 URL: https://issues.apache.org/jira/browse/COMDEV-260
>             Project: Community Development
>          Issue Type: Project
>          Components: GSoC/Mentoring ideas
>            Reporter: Kevin A. McGrail
>            Priority: Major
>
> From an idea by Diane F Skoll (used with permission):
> We tokenize inbound messages and store the tokens on the server. In each
> message, we add links for training. When you click on a training link, the
> system trains on the message using the tokens stored on the server. That
> way, you are training on exactly the tokens that the Bayes code saw.
> For SA, the key point is a framework that stores the Bayesian tokens from an
> email before the email is delivered, so that a later "this is spam" / "this
> is ham" mechanism can use that information without needing the entire email.
> Adding a header with the message ID under which the tokens are stored makes
> it easier to build a "train as spam" / "train as ham" framework.
> The issues you are pointing to have more to do with the implementation of
> the "this is spam" / "this is ham" mechanism itself.
> Storing only the tokens uses less space and mitigates privacy and legal
> concerns.
> sa-learn would then be extended to take the message ID and learn the stored
> tokens as spam or ham, instead of being fed the entire message.
>
>
> Apache SpamAssassin is a mail filter that identifies spam. It is an
> intelligent email filter that uses a diverse range of tests to identify
> unsolicited bulk email, more commonly known as spam. These tests are applied
> to email headers and content to classify email using advanced statistical
> methods.
> In addition, SpamAssassin has a modular architecture that allows other
> technologies to be quickly wielded against spam, and it is designed for easy
> integration into virtually any email system.
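The store-tokens-then-train-by-ID workflow described in the idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not SpamAssassin code. The header name `X-Bayes-Token-ID`, the toy tokenizer, the in-memory `TokenStore` and `BayesDB` classes, and the helpers `scan` and `train_by_id` are all hypothetical. Real SpamAssassin is written in Perl and has its own tokenizer and Bayes storage backends. Hashing the tokens before storing them is also an assumption of the sketch, chosen as one way to keep raw message text off the server.

```python
from __future__ import annotations

import hashlib
import re
import uuid
from collections import Counter

# Hypothetical header name for this sketch; not an existing SpamAssassin header.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"


def tokenize(body: str) -> frozenset[str]:
    """Toy tokenizer (assumption): lowercase words of 3+ characters,
    hashed so the raw words are not stored on the server."""
    words = re.findall(r"[a-z0-9$']{3,}", body.lower())
    return frozenset(hashlib.sha1(w.encode()).hexdigest()[:10] for w in words)


class TokenStore:
    """Hypothetical in-memory stand-in for the server-side token store."""

    def __init__(self) -> None:
        self._tokens: dict[str, frozenset[str]] = {}

    def save(self, tokens: frozenset[str]) -> str:
        token_id = uuid.uuid4().hex  # ID that will go into the message header
        self._tokens[token_id] = tokens
        return token_id

    def get(self, token_id: str) -> frozenset[str]:
        return self._tokens[token_id]


class BayesDB:
    """Hypothetical per-token spam/ham counts, as a Bayes learner would keep."""

    def __init__(self) -> None:
        self.spam_counts: Counter[str] = Counter()
        self.ham_counts: Counter[str] = Counter()
        self.learned: dict[str, bool] = {}  # token_id -> learned as spam?

    def learn(self, token_id: str, tokens: frozenset[str], is_spam: bool) -> bool:
        if token_id in self.learned:
            return False  # already trained on this message; do not double-count
        (self.spam_counts if is_spam else self.ham_counts).update(tokens)
        self.learned[token_id] = is_spam
        return True


def scan(body: str, store: TokenStore) -> str:
    """Before delivery: tokenize, store the tokens, and return the header
    line to add to the message."""
    token_id = store.save(tokenize(body))
    return f"{TOKEN_ID_HEADER}: {token_id}"


def train_by_id(token_id: str, is_spam: bool, store: TokenStore, db: BayesDB) -> bool:
    """Later, from a "this is spam" / "this is ham" link: learn from the
    stored tokens without needing the original message."""
    return db.learn(token_id, store.get(token_id), is_spam)
```

For example, `scan("Cheap watches, buy now!", store)` returns a header line carrying a fresh ID, and passing that ID to `train_by_id(..., True, store, db)` trains on the four stored token hashes. A second call with the same ID is ignored. In the project itself, the `train_by_id` step would correspond to extending sa-learn to accept the stored ID. The issue does not say how that option would be named or invoked.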
> It is primarily written in Perl, with a few bits in C and shell scripts for
> system integration.
> The compendium at
> https://raptor.pccc.com/raptor.cgim?template=email_spam_compendium is helpful
> for understanding some of the concepts behind SpamAssassin.
> An understanding of SMTP will help a student on this project, but a
> willingness to learn and to set up your own mail server with SpamAssassin on
> a Linux distribution, for a personal test domain, is highly desirable.
> Assistance will be provided to get the basic framework of a learning sandbox
> in place.
> As email becomes increasingly commoditized by major providers, knowledge of
> email systems and their security is dwindling. This opportunity can provide
> real-world experience with an email security product that is used by
> countless commercial systems around the world.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)