On Mon, 2014-09-08 at 11:35 -0600, Amir Caspi wrote:
> One of my spammy URI template rules is, for some reason, not hitting
> any more.  Spample here:
> 
> http://pastebin.com/jy6WZhWW
> 
> In my local.cf sandbox I have the following:
> 
> uri __AC_STOPRANDDOM_URI1 /(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b/
> 
> This is part of my AC_SPAMMY_URI_PATTERNS meta rule, which hits just
> fine on other emails (including others of this particular format).
> 
> Debug output shows this subrule didn't hit anything (that is, the rule
> isn't mentioned at all in the debug output), but regexpal.com says it
> should have hit just fine.

Works for me.

Pulled the sample from pastebin and fed to spamassassin -D with your
custom rule added as additional configuration. That rule hits.


> Could the problem be with the \b delimiter at the end?

No. The word boundary \b matches not only between a word character (\w)
and a non-word character (\W), but also at the beginning or end of the
string, provided the adjacent character is a word character.
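
A quick way to convince yourself of this outside SpamAssassin is the
sketch below. It uses Python's re module, whose \b behaves like Perl's
for these ASCII cases. The hostname is made up for illustration (it is
not the one from the pastebin sample), and the names RULE and hits are
mine.

```python
import re

# The regex from the rule above, copied verbatim.
RULE = re.compile(
    r"(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)"
    r"\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b"
)

def hits(uri: str) -> bool:
    """Return True if the rule's regex matches anywhere in the string."""
    return RULE.search(uri) is not None

# Hypothetical URIs, for illustration only.
print(hits("http://stop.example-host-1234.com"))        # TLD at end of string
print(hits("http://stop.example-host-1234.com/"))       # TLD followed by a slash
print(hits("http://stop.example-host-1234.community"))  # word char follows "com"
```

Only the last case should fail, because the "m" after "com" is a word
character and so there is no boundary there.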

> I've noticed that sometimes can cause issues in failing to hit, but
> usually only when a URI ends with a slash...

That, too, would be unrelated to the \b word-boundary.

What bothers me is that "sometimes" qualification. For a given input, the
rule either matches or it doesn't. If it matches only sometimes, something
as yet unnoticed is having a severe impact.


Did you grep the -D debug output for the hostname? Also try grepping for
URIHOSTS (SA 3.4, when not running in -L local-only mode), which lists
all hostnames found in the message.


> and this same rule hits other matching URIs in other spams.  However,
> this isn't the first time I've noticed a failure to match... so any
> idea why it's not hitting?  Per the regex rules, it SHOULD be hitting
> fine unless it's the \b...
> 
> Any ideas?

The URI is at the very end of a line with a CRLF delimiter following and
the next line beginning with a word character. If you inject a space
after the URI, does that make the rule match? (That should not be the
issue, just trying to rule out conversion problems.)
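
To show why the line ending should not matter as far as the regex
itself goes, here is a small sketch in Python. It only exercises the
regex on raw text and says nothing about how SpamAssassin extracts URIs
before running uri rules. The hostname and surrounding text are invented
for illustration.

```python
import re

# Same regex as the rule, copied verbatim.
RULE = re.compile(
    r"(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)"
    r"\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b"
)

# Hypothetical body text: URI at end of line, CRLF, next line starts
# with a word character.
crlf_text = "Visit http://stop.example-host-1234.com\r\nUnsubscribe here"
# Same text with a space injected after the URI.
spaced_text = "Visit http://stop.example-host-1234.com \r\nUnsubscribe here"

crlf_match = RULE.search(crlf_text)
spaced_match = RULE.search(spaced_text)

# \r is a non-word character, so \b is satisfied right after "com"
# in both variants.
print(crlf_match.group(0) if crlf_match else None)
print(spaced_match.group(0) if spaced_match else None)
```

If the two variants behave differently on your system, the difference
comes from somewhere other than the \b in the regex.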

Also I noticed the headers are CRLF delimited, too. How did you get that
sample? Any chance it has been modified or re-formatted by a text editor
and does not equal the raw, original message?
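
If you want to check a saved message for this, a rough sketch like the
following counts the line-ending styles in the raw bytes. The function
name is mine and the sample bytes are invented. In practice you would
read the file in binary mode, e.g. open(path, "rb").read(), so that
nothing converts the line endings on the way in.

```python
def line_ending_counts(raw: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw message bytes."""
    crlf = raw.count(b"\r\n")
    lf = raw.count(b"\n") - crlf   # LFs that are not part of a CRLF
    cr = raw.count(b"\r") - crlf   # CRs that are not part of a CRLF
    return {"crlf": crlf, "lf": lf, "cr": cr}

# Invented sample: CRLF-delimited headers followed by an LF-delimited body line.
sample = b"From: a@example.org\r\nSubject: test\r\n\r\nbody line\n"
print(line_ending_counts(sample))
```

A mix of styles in one file is a hint that something re-formatted the
message along the way.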

Does the file as uploaded to pastebin still fail to trigger the rule for
you?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
