Re: New Bayes like paradigm

2011-10-14 Thread darxus
On 10/13, Adam Katz wrote: > PS: As an SA Committer, do I have access to those logs? Don't think so, but you can just ask for a regular masscheck account if you don't already have one, and with that account do: rsync --exclude '*~' -vaz "rsync.spamassassin.org::corpus" ./ -- "I'd rather be hap

Re: New Bayes like paradigm

2011-10-13 Thread Adam Katz
> On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: >> You definitely have a good point that it would only be necessary to >> track the combinations that actually show up in emails, however >> 1024 is only the possible combinations from one set of 10 rules. >> The number of combinations in the ac

Re: New Bayes like paradigm

2011-10-13 Thread Marc Perkel
On 10/10/2011 9:16 AM, dar...@chaosreigns.com wrote: On 10/10, Marc Perkel wrote: On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be 102

Re: New Bayes like paradigm

2011-10-10 Thread darxus
On 10/10, Marc Perkel wrote: > On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: > >On 09/28, Marc Perkel wrote: > >>You would only have to test the rule combinations that the message > >>actually triggered. So if it hit 10 rules then it would be 1024 > >>combinations. Seems not to be unreasonabl

Re: New Bayes like paradigm

2011-10-10 Thread Marc Perkel
On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be 1024 combinations. Seems not to be unreasonable to me. You definitely have a good poin

Re: New Bayes like paradigm

2011-09-28 Thread darxus
On 09/28, dar...@chaosreigns.com wrote: > On 09/28, Marc Perkel wrote: > > You would only have to test the rule combinations that the message > > actually triggered. So if it hit 10 rules then it would be 1024 > > combinations. Seems not to be unreasonable to me. > combinations in the actual corpo

Re: New Bayes like paradigm

2011-09-28 Thread darxus
On 09/28, Marc Perkel wrote: > You would only have to test the rule combinations that the message > actually triggered. So if it hit 10 rules then it would be 1024 > combinations. Seems not to be unreasonable to me. You definitely have a good point that it would only be necessary to track the comb
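
The 1024 figure in this exchange is the number of subsets of 10 triggered rules (2^10, counting the empty subset). A minimal Python sketch of that enumeration follows; the rule names are placeholders invented for illustration, and nothing here is SpamAssassin code.

```python
from itertools import combinations


def rule_combinations(hit_rules):
    """Return every subset of the rules a message triggered, as sorted tuples."""
    rules = sorted(set(hit_rules))
    subsets = []
    for size in range(len(rules) + 1):  # size 0 yields the empty combination
        subsets.extend(combinations(rules, size))
    return subsets


# Hypothetical rule names, used only for illustration.
ten_rules = [f"RULE_{i}" for i in range(10)]
all_subsets = rule_combinations(ten_rules)
print(len(all_subsets))  # number of subsets of 10 rules, empty one included
```

Only the rules a message actually hit are expanded, so the per-message cost depends on the hit count rather than on the size of the whole rule set.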

Re: New Bayes like paradigm

2011-09-28 Thread Marc Perkel
On 9/27/2011 9:25 PM, dar...@chaosreigns.com wrote: On 09/27, Marc Perkel wrote: Here's the kind of thing I'm seeing. Spam talks about money - low score. Spam talks about Jesus - low score. Spam talks about money and Jesus and throw in a dear someone and it's spam. I'm hoping to detect combina

Re: New Bayes like paradigm

2011-09-27 Thread darxus
Another possibility would be to generate meta rules from random sets of three rules. Some (actually random) examples: meta RANDOM_3_A = (MPART_ALT_DIFF && GAPPY_SUBJECT && URI_UNSUBSCRIBE) meta RANDOM_3_B = (RCVD_IN_MAPS_OPS && WEIRD_PORT && FSL_FAKE_GMAIL_RCVD) meta RANDOM_3_C = (FB_CAN_LONGER &
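
One way such random three-rule metas could be generated is sketched below in Python. The function name, the seed parameter and the A-Z suffix scheme are assumptions made for this illustration; the rule names are the ones quoted in the message above.

```python
import random


def random_meta_rules(rule_names, count, seed=None):
    """Build `count` meta-rule lines, each AND-ing three distinct random rules."""
    if count > 26:
        raise ValueError("this sketch only names rules RANDOM_3_A .. RANDOM_3_Z")
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    lines = []
    for index in range(count):
        picked = rng.sample(rule_names, 3)  # three distinct rule names
        suffix = chr(ord("A") + index)
        lines.append(f"meta RANDOM_3_{suffix} = ({' && '.join(picked)})")
    return lines


# Rule names taken from the examples in the message above.
names = [
    "MPART_ALT_DIFF", "GAPPY_SUBJECT", "URI_UNSUBSCRIBE",
    "RCVD_IN_MAPS_OPS", "WEIRD_PORT", "FSL_FAKE_GMAIL_RCVD",
]
for line in random_meta_rules(names, 3, seed=1):
    print(line)
```

The generated lines would then need to be scored against a corpus like any other rule; this sketch only produces the rule text.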

Re: New Bayes like paradigm

2011-09-27 Thread darxus
On 09/27, Marc Perkel wrote: > Here's the kind of thing I'm seeing. Spam talks about money - low > score. Spam talks about Jesus - low score. Spam talks about money > and Jesus and throw in a dear someone and it's spam. I'm hoping to > detect combinations automatically. You're not really talking ab

Re: New Bayes like paradigm

2011-09-27 Thread Marc Perkel
On 9/25/2011 5:37 PM, RW wrote: On Sun, 25 Sep 2011 09:28:32 -0700 Marc Perkel wrote: Here's what I'd like to be able to do. I'd like a program of some sort where I could take word tokens - like the names of rules that were triggered - and look for rule combinations that indicate spam or ham. For e

Re: New Bayes like paradigm

2011-09-25 Thread RW
On Sun, 25 Sep 2011 09:28:32 -0700 Marc Perkel wrote: > Here's what I'd like to be able to do. I'd like a program of some > sort where I could take word tokens - like the names of rules that were > triggered - and look for rule combinations that indicate spam or ham. > For example, a message triggers 4

Re: New Bayes like paradigm

2011-09-25 Thread Benny Pedersen
On Sun, 25 Sep 2011 09:28:32 -0700, Marc Perkel wrote: Hope you all understand what I'm saying here. How would someone do something like that? meta foo ((__a + __b + __c + __d) > x) where x is the number of rule hits that must be exceeded; then define __a, __b, __c and __d as body, header, or whatever rules you like to scan for

Re: New Bayes like paradigm

2011-09-25 Thread David F. Skoll
On Sun, 25 Sep 2011 09:28:32 -0700 Marc Perkel wrote: > Each rule combo is then looked up for how often it occurs in spam and > how often it occurs in ham. Then the results are combined into some > sort of likelihood of being spam or ham. We looked at (and even implemented) some "meta-tokens" t
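
The lookup-and-combine idea quoted above can be sketched in Python as follows. The thread does not say how the results would be combined, so the choices here are assumptions made for illustration: naive-Bayes-style log-odds, add-one smoothing, equal priors, and combinations limited to single rules and pairs. The class name, method names and rule names A and B are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def combos(hit_rules, max_size=2):
    """Yield rule combinations of size 1..max_size as sorted tuples."""
    rules = sorted(set(hit_rules))
    for size in range(1, max_size + 1):
        yield from combinations(rules, size)


class ComboScorer:
    """Counts rule combinations in spam and ham, then combines the ratios."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    def train(self, hit_rules, is_spam):
        if is_spam:
            self.spam_counts.update(combos(hit_rules))
            self.spam_total += 1
        else:
            self.ham_counts.update(combos(hit_rules))
            self.ham_total += 1

    def spam_probability(self, hit_rules):
        log_odds = 0.0  # assumption: equal prior for spam and ham
        for combo in combos(hit_rules):
            # Add-one smoothing so unseen combinations never give zero.
            p_spam = (self.spam_counts[combo] + 1) / (self.spam_total + 2)
            p_ham = (self.ham_counts[combo] + 1) / (self.ham_total + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))


scorer = ComboScorer()
for _ in range(3):
    scorer.train(["A", "B"], is_spam=True)   # A and B together: spam
    scorer.train(["A"], is_spam=False)       # A alone: ham
    scorer.train(["B"], is_spam=False)       # B alone: ham

print(scorer.spam_probability(["A", "B"]))
print(scorer.spam_probability(["A"]))
```

With this toy training data the pair (A, B) scores well above either rule alone, which is the "money and Jesus" effect described elsewhere in the thread. Overlapping combinations are not independent, so the combined value is a ranking score rather than a calibrated probability.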

New Bayes like paradigm

2011-09-25 Thread Marc Perkel
Here's what I'd like to be able to do. I'd like a program of some sort where I could take word tokens - like the names of rules that were triggered - and look for rule combinations that indicate spam or ham. For example, a message triggers 4 rules: A, B, C and D. These rules are combined as follows: A