Re: LONGWORDS not hitting?

2013-06-30 Thread Amir 'CG' Caspi
At 11:23 PM +0200 06/30/2013, Benny Pedersen wrote:
> does it continue if one msg is learned as spam, does it still after say bayes_50 ?

No, it has BAYES_99 if I learn the message. That is, running SA on the SAME message will give BAYES_99 after it's learned. It's not a ham problem. you sho

Re: LONGWORDS not hitting?

2013-06-30 Thread RW
On Sun, 30 Jun 2013 23:01:10 +0200
Benny Pedersen wrote:

> RW wrote on 2013-06-30 21:44:
>
> > I don't think Bayes tokenizes html. When I displayed it in claws
> > mail (with the dillo plugin) I just saw 4 links. Bayes is just
> > seeing the displayed texts from those links and some tokens from

Re: LONGWORDS not hitting?

2013-06-30 Thread Martin Gregorie
On Sun, 2013-06-30 at 20:44 +0100, RW wrote:
> On Sun, 30 Jun 2013 12:42:53 -0600
> Amir 'CG' Caspi wrote:
>
> > Hi all,
> >
> > Just got this spam:
> >
> > http://pastebin.com/KM5paaZ9
> >
>
> > (And yes, I know it only hit BAYES_50... I really think these
> > gibberish strings are confu

Re: LONGWORDS not hitting?

2013-06-30 Thread Benny Pedersen
Amir 'CG' Caspi wrote on 2013-06-30 23:09: very well. The actual spammy content is only 5% of the message (maybe less) and therefore doesn't "weigh" much in the Bayes analysis. very well it could be because it reduces the efficacy of learning these messages, per the description above. d

Re: LONGWORDS not hitting?

2013-06-30 Thread Amir 'CG' Caspi
At 8:57 PM +0200 06/30/2013, Benny Pedersen wrote:
> well it might confuse bayes yes, but it cant confuse you to run sa-learn --spam on it ?

I've been running "sa-learn --spam" on these messages for a month straight. Some get picked up, others don't. I'm still getting a lot of BAYES_50 on the

Re: LONGWORDS not hitting?

2013-06-30 Thread Benny Pedersen
RW wrote on 2013-06-30 21:44:
> I don't think Bayes tokenizes html. When I displayed it in claws mail (with the dillo plugin) I just saw 4 links. Bayes is just seeing the displayed texts from those links and some tokens from the URIs.

bayes digest it all, its just body that only see html part w

Re: LONGWORDS not hitting?

2013-06-30 Thread RW
On Sun, 30 Jun 2013 12:42:53 -0600
Amir 'CG' Caspi wrote:

> Hi all,
>
> Just got this spam:
>
> http://pastebin.com/KM5paaZ9
>
> (And yes, I know it only hit BAYES_50... I really think these
> gibberish strings are confusing Bayes.

I don't think Bayes tokenizes html. When I displayed

Re: LONGWORDS not hitting?

2013-06-30 Thread Benny Pedersen
Amir 'CG' Caspi wrote on 2013-06-30 20:42:
> (And yes, I know it only hit BAYES_50... I really think these gibberish strings are confusing Bayes. This is also another example of where an HTML COMMENT GIBBERISH rule would help. ;-) )

well it might confuse bayes yes, but it cant confuse you to r

LONGWORDS not hitting?

2013-06-30 Thread Amir 'CG' Caspi
Hi all,

Just got this spam:

http://pastebin.com/KM5paaZ9

To me, it looks like LONGWORDS should have hit... but it didn't. I ran it manually through spamassassin and spamc, and LONGWORDS still didn't hit, so it seems to just not be hitting that rule. But, to my eye, it looks like it