Greetings,

I am currently running a multi-user system in which mail is filtered using a centralized database of tokens. I realize this is not the ideal solution for filtering, and I am in the process of implementing a system that will allow users to submit Spam/Ham samples to their own separate databases. For some users this is the ideal solution; for others it adds a level of complication that they would rather not deal with.
My question concerns recent reports of Spam messages that have large blocks of unrelated text appended to them. Some of this text consists of movie reviews or other random articles which, I fear, may 'pollute' our token database in a way that makes it less effective. I am looking for recommendations on how to handle these messages: if I allow them to be learned from, will there be any negative impact?

Admittedly, I do not have a firm grasp of how classification works, in particular how the Bayesian component of SA decides which words are the most "interesting". However, this may not be relevant to the problem. Will simply submitting enough Ham resolve this issue?

While I continue to search the Web for an answer, I encourage anyone interested in this topic to comment. Thank you for helping me understand this situation better.

Yours truly,
Adam J. Henry

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk