Amir Caspi <ceph...@3phase.com> wrote on 01/30/2014 11:39:51 AM:

> From: Amir Caspi <ceph...@3phase.com>
> To: "Kevin A. McGrail" <kmcgr...@pccc.com>, 
> Cc: Andy Jezierski <ajezier...@stepan.com>, 
> "users@spamassassin.apache.org" <users@spamassassin.apache.org>
> Date: 01/30/2014 11:40 AM
> Subject: Re: Help with a regex to catch spam with gibberish html tags
> 
> On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
> 
> > If you want to share the complete rule, I can throw it into my
> > sandbox and see what masscheck thinks as well.
> 
> The complete rule would be something like this, assuming Andy 
> implemented it as I wrote it:
> 
> rawbody HTML_NONSENSE_TAGS /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
> describe HTML_NONSENSE_TAGS Many consecutive multi-letter HTML tags, likely nonsense/spam
> score HTML_NONSENSE_TAGS 0.001
> 
> Score to be adjusted as needed, of course.
> 
> If one wants to be even more explicit, one could require that the 
> tags be prefaced with a <style> tag, although that should, 
> hopefully, get picked up by John Hardin's modifications to 
> STYLE_GIBBERISH sometime in the near future.
> 
> Cheers.
> 
> --- Amir

That would indeed be the rule I used.
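A quick way to see what the pattern does and does not catch is to run it against a few made-up bodies. The sketch below is only an illustration. It uses Python's `re` as a stand-in for Perl's regex engine, because the constructs in this particular pattern behave the same in both. The sample strings are invented. `STYLE_THEN_TAGS` is a hypothetical take on the `<style>`-prefaced idea, not a rule anyone posted. Each sample is also matched as a single string, which may differ from how SpamAssassin presents rawbody text to a rule.

```python
import re

# HTML_NONSENSE_TAGS pattern, copied verbatim from the rule above.
# (?:...), {4,}, \s* and {10,} behave the same in Python's re as in Perl.
NONSENSE_TAGS = re.compile(r"(?:<[A-Za-z0-9]{4,}>\s*){10,}")

# Hypothetical stricter variant (not a posted rule): the run of tags must
# come directly after an opening <style> tag, matched case-insensitively.
STYLE_THEN_TAGS = re.compile(r"<style>\s*(?:<[A-Za-z0-9]{4,}>\s*){10,}", re.IGNORECASE)


def fires(pattern: re.Pattern, body: str) -> bool:
    """Return True if the pattern matches anywhere in body, i.e. the rule would hit."""
    return pattern.search(body) is not None


if __name__ == "__main__":
    tags = "".join(f"<xq{i:02d}>" for i in range(12))  # 12 invented 4-character tags
    samples = {
        "style + gibberish": "<style>" + tags + "</style>",
        "gibberish only": "<p>Hi</p>" + tags,
        "nine tags": "<abcd>" * 9,                        # one short of {10,}
        "ten tags, newline separated": "\n".join(["<abcd>"] * 10),
        "closing tags": "</abcd>" * 12,                   # '/' is not in [A-Za-z0-9]
        "tags with attributes": '<abcd class="x">' * 12,  # space before '>' breaks it
        "normal html": "<html><head><title>Hi</title></head><body></body></html>",
    }
    for name, body in samples.items():
        print(name, fires(NONSENSE_TAGS, body), fires(STYLE_THEN_TAGS, body))
```

Note that closing tags and tags carrying attributes never match, so the rule only targets bare opening tags of four or more characters.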

Andy
