[SAtalk] Rules Modification /Content Filtering Results

Arlo Gilbert Mon, 01 Dec 2003 13:18:07 -0800

Hi Everybody,

This is a long post, but I would REALLY appreciate your input, criticism, flaming, and whatnot.

I'm working on integrating SpamAssassin into our own spam-filtering mechanism. Currently, at a score of 5 or greater we modify the subject line to indicate the message's spamminess, and at a score of 10 or greater we delete the email automatically and do not deliver it to the user, because in my experience a score of 10+ is undoubtedly spam. We use network, local and Bayesian tests.
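For anyone following along, the two-tier policy above (tag at 5 or more, discard at 10 or more) boils down to a small decision function. This is a hypothetical Python sketch of our wrapper's logic, not SpamAssassin code; the function name, the action labels and the subject-prefix format are my own assumptions.

```python
def route_message(score: float, subject: str,
                  tag_threshold: float = 5.0,
                  delete_threshold: float = 10.0) -> tuple[str, str]:
    """Return (action, subject) for a message given its SpamAssassin score.

    Hypothetical policy mirroring this post: discard at >= delete_threshold,
    tag the subject at >= tag_threshold, otherwise deliver untouched.
    """
    if score >= delete_threshold:
        # Never delivered to the user at all.
        return ("delete", subject)
    if score >= tag_threshold:
        # Prefix format is an assumption; it just puts the score in the subject.
        return ("tag", f"[SPAM {score:.1f}] {subject}")
    return ("deliver", subject)


if __name__ == "__main__":
    for s in (2.3, 6.8, 12.0):
        print(s, route_message(s, "Hello"))
```

Note that the delete check comes first, so a message at 10+ is dropped rather than merely tagged.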

One of the concepts I'm working on is a simple user preference. I'll explain it briefly and would REALLY like your input on the concept and on any unexpected results I might encounter :)

The concept is pretty simple: SpamAssassin tries to identify UBE (unsolicited bulk email), but it is "content neutral" and doesn't know the end user. Here is the best example:

My grandmother receives email only from family. If the standard SpamAssassin ruleset were in place and somebody sent her UBE about free porn that did not score enough points to be flagged, the email would get through. That is SpamAssassin doing its job appropriately.

However, what *I* know is that *ANY* message SpamAssassin detects as even possibly porn, regardless of how many points it gets, is spam, simply because my grandmother has no interest in any pornographic material.

The same goes for me and mortgages: I don't have one and don't need one, so anything about mortgages, home financing, etc. is definitely spam for me.

Obviously, adjusting the user threshold is one way of doing it. However, I *DO* receive pornographic email and I like it (not really, but for the sake of example I do), and I subscribe to several mailing lists that promote hardcore pornography. I just don't want unsolicited porn, so lowering my overall threshold would also start flagging the list mail I actually want.

My idea....

My initial solution is to allow per-user content filtering by doubling the points for a chosen set of tests. In the grandmother example, I have a list of about 20 tests, and I double the points earned for each of them, so any porn will be tagged/flagged as spam while all other kinds of email are still filtered using the default rules.
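To make the doubling concrete, here is a hypothetical Python sketch of how a per-user multiplier changes a message's total. The rule names and default scores are invented for illustration (real SpamAssassin rule names and scores vary by version), and SpamAssassin itself has no multiplier setting; per user you would override each rule's value with the `score` directive in user_prefs. The multiplier table is just my way of modelling "double these 20 tests".

```python
# Hypothetical rule names and default scores, for illustration only.
DEFAULT_SCORES = {
    "ADULT_RULE_1": 1.5,
    "ADULT_RULE_2": 1.0,
    "MORTGAGE_RULE_1": 2.0,
    "GENERIC_UBE_RULE": 1.5,
}


def user_score(hits, multipliers):
    """Sum the default scores of the rules that hit, scaled per user.

    Rules missing from `multipliers` keep their default weight (factor 1.0).
    """
    return sum(DEFAULT_SCORES[rule] * multipliers.get(rule, 1.0) for rule in hits)


# Grandmother's profile: double every adult-content rule, leave the rest alone.
GRANDMOTHER = {rule: 2.0 for rule in DEFAULT_SCORES if rule.startswith("ADULT_")}

hits = ["ADULT_RULE_1", "ADULT_RULE_2", "GENERIC_UBE_RULE"]
print(user_score(hits, {}))           # default scoring: 4.0, under a threshold of 5
print(user_score(hits, GRANDMOTHER))  # grandmother's scoring: 6.5, gets tagged
```

The same message slips under a 5-point threshold with default scores but is tagged for the grandmother, while mail that trips none of the adult rules scores exactly as before.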

For myself, I double the points for all mortgage- and loan-related tests but set all porn tests to 0. That way I get all the porn email I want and none of the mortgage email. All the non-porn UBE tests still run against porn email, so legitimate porn gets through and porn that is UBE still gets tagged as such.
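Here is a hypothetical Python sketch of my own profile (porn rules at 0, mortgage rules doubled), showing that content-neutral UBE tests still catch unsolicited porn. All rule names and scores are invented for illustration. The last function renders the changed rules as SpamAssassin-style `score RULE value` lines of the kind a user_prefs file accepts; the rule names in that output are placeholders, not real SpamAssassin tests.

```python
# Hypothetical rule names and default scores, for illustration only.
DEFAULT_SCORES = {
    "ADULT_RULE_1": 1.5,
    "ADULT_RULE_2": 1.0,
    "MORTGAGE_RULE_1": 2.0,
    "MORTGAGE_RULE_2": 1.5,
    "FORGED_HEADERS_RULE": 3.0,     # content-neutral UBE test
    "BLACKLISTED_RELAY_RULE": 2.5,  # content-neutral UBE test
}


def apply_profile(defaults, prefix_factors):
    """Multiply each rule's default score by the factor of the first name
    prefix it matches; rules matching no prefix keep their default."""
    adjusted = {}
    for rule, score in defaults.items():
        factor = next((f for p, f in prefix_factors.items() if rule.startswith(p)), 1.0)
        adjusted[rule] = score * factor
    return adjusted


def total(scores, hits):
    """Total score of a message that triggered the rules in `hits`."""
    return sum(scores[rule] for rule in hits)


def as_user_prefs(defaults, adjusted):
    """Emit `score RULE value` lines for rules whose score changed."""
    return [f"score {rule} {adjusted[rule]:g}"
            for rule in sorted(adjusted) if adjusted[rule] != defaults[rule]]


MY_PROFILE = {"ADULT_": 0.0, "MORTGAGE_": 2.0}
mine = apply_profile(DEFAULT_SCORES, MY_PROFILE)

list_mail = ["ADULT_RULE_1", "ADULT_RULE_2"]  # porn list I subscribed to
porn_ube = list_mail + ["FORGED_HEADERS_RULE", "BLACKLISTED_RELAY_RULE"]
mortgage_spam = ["MORTGAGE_RULE_1", "MORTGAGE_RULE_2"]

print(total(mine, list_mail))                # 0.0: delivered
print(total(mine, porn_ube))                 # 5.5: still tagged
print(total(DEFAULT_SCORES, mortgage_spam))  # 3.5 with default scores
print(total(mine, mortgage_spam))            # 7.0 with my profile: tagged
print("\n".join(as_user_prefs(DEFAULT_SCORES, mine)))
```

The design point is that zeroing a content category removes its contribution without touching the delivery-pattern tests, so unsolicited porn is judged on the same signals as any other UBE.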

My Question/Concerns....

Does anybody have any idea how this would affect the Bayesian learning system? If I suddenly changed my mind about liking porn one day and decided to block all porn, would I need to clear out my Bayesian database so that it would begin to learn that I don't like porn?

Is there anything fundamentally flawed with my idea? Would this really mess up the filtering mechanism?

Thanks in advance for your thoughts!

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk