On Mon, 2014-09-08 at 11:35 -0600, Amir Caspi wrote:
> One of my spammy URI template rules is, for some reason, not hitting
> any more. Spample here:
>
> http://pastebin.com/jy6WZhWW
>
> In my local.cf sandbox I have the following:
>
> uri __AC_STOPRANDDOM_URI1
> /(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b/
>
> This is part of my AC_SPAMMY_URI_PATTERNS meta rule, which hits just
> fine on other emails (including others of this particular format).
>
> Debug output shows this subrule didn't hit anything (that is, the rule
> isn't mentioned at all in the debug output), but regexpal.com says it
> should have hit just fine.

Works for me. Pulled the sample from pastebin and fed it to
spamassassin -D with your custom rule added as additional configuration.
That rule hits.

> Could the problem be with the \b delimiter at the end?

No. The word-boundary \b matches not only between a word \w and a
non-word \W char, but also at the beginning or end of the string, if the
adjacent char is a word char.

> I've noticed that sometimes can cause issues in failing to hit, but
> usually only when a URI ends with a slash...

That, too, would be unrelated to the \b word-boundary.

What bothers me is that "sometimes" qualification. Either it matches or
it doesn't. If it matches only sometimes, something as yet unnoticed has
a severe impact.

Did you grep the -D debug output for the hostname? Also try grepping for
URIHOSTS (SA 3.4, without -L local only mode), which lists all hostnames
found in the message.

> and this same rule hits other matching URIs in other spams. However,
> this isn't the first time I've noticed a failure to match... so any
> idea why it's not hitting? Per the regex rules, it SHOULD be hitting
> fine unless it's the \b...
>
> Any ideas?

The URI is at the very end of a line, with a CRLF delimiter following
and the next line beginning with a word character. If you inject a space
after the URI, does that make the rule match? (That should not be the
issue, just trying to rule out conversion problems.)

Also, I noticed the headers are CRLF delimited, too. How did you get that
sample? Any chance it has been modified or re-formatted by a text editor
and does not equal the raw, original message?

Does the file uploaded to pastebin still not trigger the rule for you?

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
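
The \b behaviour described above can be checked outside SpamAssassin
with a few lines of Python. This is only a sketch of the regex itself: it
assumes Python's re module treats \b like Perl does for these plain ASCII
cases, the hostname is made up for illustration (it is not the one from
the pastebin sample), and it says nothing about how SpamAssassin
extracts URIs from a message before the rule is applied.

```python
import re

# Pattern copied from the quoted uri rule, without the surrounding slashes.
PATTERN = re.compile(
    r"(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)"
    r"\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b"
)

# Hypothetical hostname, invented for this test only.
HOST = "stop.examplespamdomain.com"

# What follows the TLD in each case decides whether \b can match.
cases = {
    "end of string": "http://" + HOST,
    "trailing slash": "http://" + HOST + "/",
    "CRLF then word char": "http://" + HOST + "\r\nNext line",
    "word char follows": "http://" + HOST + "x",
}

results = {name: bool(PATTERN.search(text)) for name, text in cases.items()}

for name, hit in results.items():
    print(f"{name}: {'match' if hit else 'no match'}")
```

Only the last case fails to match, because there a word character
directly follows the TLD. End of string, a trailing slash, and a CRLF
all leave a word boundary in place, so if the rule misses a message, the
cause is more likely in the input than in the \b.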