Re: New spam rule for specific content

2013-08-11 Thread Amir 'CG' Caspi
At 8:23 PM -0700 08/11/2013, John Hardin wrote: However, I may be taking too-conservative a stance here. It's possible that, while HTML comments can appear in ham, *long* HTML comments won't, and the fact that we're looking for long blocks of comment text is enough safety. That's my feeling.

Re: New spam rule for specific content

2013-08-11 Thread John Hardin
On Sun, 11 Aug 2013, Amir 'CG' Caspi wrote: At 7:20 PM -0700 08/11/2013, John Hardin wrote: Yuck. Can you pastebin spamples, if you still have them? Here's one that comes to mind: http://pastebin.com/zVEH2h02 That's going to be problematic as the comment isn't gibberish, it's a bunch of pr

Re: New spam rule for specific content

2013-08-11 Thread Amir 'CG' Caspi
At 7:20 PM -0700 08/11/2013, John Hardin wrote: The unbounded matches you're using probably caused the RE engine to get stuck backing off and retrying. That's what I figured. That's why I changed things to the current version, which is "bounded" by the end-tag of the comment. My current ver
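
To illustrate the point about unbounded matches, here is a minimal Python sketch (not the SpamAssassin rule from this thread, and Python's `re` is not the Perl engine SA uses). It gives every quantifier an explicit upper limit so the engine has a capped amount of work to retry. The limits and the names `BOUNDED_RE` and `bounded_match` are hypothetical, chosen only for illustration. This sketch only shows bounding; it makes no attempt to stop at the comment-end token.

```python
import re

# Hypothetical limits, picked for illustration only.
MAX_WORD_LEN = 40   # longest run of non-whitespace treated as one "word"
MAX_GAP = 10        # longest run of whitespace between words
MIN_WORDS = 100     # how many words make a comment "long"
MAX_WORDS = 400     # stop counting here so the repeat is bounded

# Every repeat has an upper bound, and \S / \s never overlap,
# so there is only one way to split the text into words.
BOUNDED_RE = re.compile(
    r"<!--\s{0,%d}(?:\S{1,%d}\s{1,%d}){%d,%d}"
    % (MAX_GAP, MAX_WORD_LEN, MAX_GAP, MIN_WORDS, MAX_WORDS)
)

def bounded_match(text: str) -> bool:
    """Return True if a comment opener is followed by a long run of words."""
    return BOUNDED_RE.search(text) is not None
```

A token longer than `MAX_WORD_LEN` simply ends the run in this sketch, which may or may not be the desired behavior for a real rule.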

Re: New spam rule for specific content

2013-08-11 Thread John Hardin
On Sun, 11 Aug 2013, Amir 'CG' Caspi wrote: At 6:56 PM -0700 08/11/2013, John Hardin wrote: I'm also going to make FP-avoidance changes that should also help. Care to share? =) Everything is publicly visible in my sandbox: http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhar

Re: New spam rule for specific content

2013-08-11 Thread John Hardin
On Sun, 11 Aug 2013, Amir 'CG' Caspi wrote: At 9:31 PM -0400 08/11/2013, Alex wrote: Are you using sqlgrey? If not, it's incredible and you should try it. I have not implemented any sort of greylisting yet. I can't use sqlgrey because I don't use postfix... my server runs sendmail. I'm sur

Re: New spam rule for specific content

2013-08-11 Thread Amir 'CG' Caspi
At 6:56 PM -0700 08/11/2013, John Hardin wrote: I'm also going to make FP-avoidance changes that should also help. Care to share? =) Just make sure that the rule does not match the --> comment-end token I tried doing that and it caused SA to hang... couldn't figure out why the regex wasn't
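
One common way to keep a pattern from running past the `-->` comment-end token is a negative lookahead on each character of a "word". The following Python sketch shows that idea under stated assumptions: it is not the rule discussed in this thread, Python's `re` stands in for the Perl engine, and the names `make_comment_run_re` and `has_word_run_in_comment` are hypothetical.

```python
import re

def make_comment_run_re(min_words: int) -> re.Pattern:
    """Build a pattern for min_words or more words after a comment opener."""
    # A "word" is a run of non-whitespace characters, none of which
    # may begin the --> comment-end token.
    word = r"(?:(?!-->)\S)+"
    # Words must be followed by whitespace, so word and gap never overlap.
    return re.compile(r"<!--\s*(?:%s\s+){%d,}" % (word, min_words))

def has_word_run_in_comment(html: str, min_words: int = 100) -> bool:
    """Return True if some comment holds at least min_words words."""
    return make_comment_run_re(min_words).search(html) is not None
```

Because the lookahead fails as soon as `-->` is reached, text after the end of the comment is never counted toward the word run.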

Re: New spam rule for specific content

2013-08-11 Thread John Hardin
On Sun, 11 Aug 2013, Amir 'CG' Caspi wrote: At 2:22 AM -0600 08/11/2013, Amir 'CG' Caspi wrote: My regex is valid and appropriate for those comments... I tested it at regexpal.com, which shows that all three comments match just fine (all three get highlighted). So... why is SA hitting only o

Re: New spam rule for specific content

2013-08-11 Thread Amir 'CG' Caspi
At 9:31 PM -0400 08/11/2013, Alex wrote: Can you post this rule again so we can investigate?

# HTML comment gibberish
# Looks for sequence of 100 or more "words" (alphanum + punct separated by whitespace) within HTML comment
rawbody HTML_COMMENT_GIBBERISH //im
describe HTML_COMMENT_GIBBERISH
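
The rule's own pattern is not preserved in this snippet, so the following is only a Python sketch of the check its comment describes: find each HTML comment and count whitespace-separated "words" inside it. The names `COMMENT_RE` and `has_gibberish_comment` and the default threshold parameter are assumptions for illustration, not part of SpamAssassin.

```python
import re

# Non-greedy match of one HTML comment body; DOTALL lets it span lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

def has_gibberish_comment(html: str, min_words: int = 100) -> bool:
    """Return True if any single HTML comment holds at least min_words tokens."""
    for match in COMMENT_RE.finditer(html):
        # split() with no argument splits on any run of whitespace.
        if len(match.group(1).split()) >= min_words:
            return True
    return False
```

Each comment is counted on its own here, so several short comments in one message do not add up to a hit.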

Re: New spam rule for specific content

2013-08-11 Thread Alex
Hi,
> Further confusion. Received another of these types of spam today:
>
> http://pastebin.com/YywcFkui
>
> My new HTML_COMMENT_GIBBERISH rule didn't hit on this one at all. Running
Can you post this rule again so we can investigate? How do you find the SPAMMY_URI_PATTERNS rule is performing?

Re: New spam rule for specific content

2013-08-11 Thread Amir 'CG' Caspi
At 2:22 AM -0600 08/11/2013, Amir 'CG' Caspi wrote: My regex is valid and appropriate for those comments... I tested it at regexpal.com, which shows that all three comments match just fine (all three get highlighted). So... why is SA hitting only on the final comment, and ignoring the first tw

Re: New spam rule for specific content

2013-08-11 Thread Amir Caspi
On Aug 11, 2013, at 9:10 AM, Benny Pedersen wrote:
> i created MSG_ID_INSTAFILE_BIZ and HTML_ERROR_TAGS_X_HTML , but even without
> this rules its spam
It is NOW, it was not when it was originally processed, as you can see from the SA headers included in the pastebin. If you read the messages

Re: New spam rule for specific content

2013-08-11 Thread Benny Pedersen
Amir 'CG' Caspi wrote on 2013-08-11 10:22: http://pastebin.com/VCtvzjzV

Content analysis details: (10.9 points, 5.0 required)

 pts rule name          description
---- ------------------ ----------------
-0.0 RCVD_IN_MSPIKE_H3  RBL: Good repu

Re: New spam rule for specific content

2013-08-11 Thread Amir 'CG' Caspi
At 1:41 PM -0600 08/10/2013, Amir 'CG' Caspi wrote: (The HTML comment gibberish rule would be a big step here, since that's one of the few things that would distinguish this from ham... unlikely that a real person would embed tens of KB of comment gibberish.) OK, I'm trying to test an HTML co