Amir Caspi <ceph...@3phase.com> wrote on 01/30/2014 11:39:51 AM:

> From: Amir Caspi <ceph...@3phase.com>
> To: "Kevin A. McGrail" <kmcgr...@pccc.com>,
> Cc: Andy Jezierski <ajezier...@stepan.com>,
>     "users@spamassassin.apache.org" <users@spamassassin.apache.org>
> Date: 01/30/2014 11:40 AM
> Subject: Re: Help with a regex to catch spam with gibberish html tags
>
> On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
>
> If you want to share the complete rule, I can throw it into my
> sandbox and see what masscheck thinks as well.
>
> The complete rule would be something like this, assuming Andy
> implemented it as I wrote it:
>
> rawbody   HTML_NONSENSE_TAGS  /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
> describe  HTML_NONSENSE_TAGS  Many consecutive multi-letter HTML tags, likely nonsense/spam
> score     HTML_NONSENSE_TAGS  0.001
>
> Score to be adjusted as needed, of course.
>
> If one wants to be even more explicit, one could require that the
> tags be prefaced with a <style> tag, although that should,
> hopefully, get picked up by John Hardin's modifications to
> STYLE_GIBBERISH sometime in the near future.
>
> Cheers.
>
> --- Amir

That would indeed be the rule I used.

Andy
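
For anyone who wants to poke at the pattern outside SpamAssassin before masscheck results come in, here is a rough Python sketch. It only exercises the regex itself: Python's `re` is not Perl, though this pattern uses only syntax the two share, and it does not reproduce how SpamAssassin feeds text to a `rawbody` rule. The helper name `has_nonsense_tags` and both sample strings are made up for illustration.

```python
import re

# Same pattern as the HTML_NONSENSE_TAGS rule quoted above:
# ten or more consecutive "tags" made of 4+ alphanumerics,
# separated by optional whitespace.
NONSENSE_TAGS = re.compile(r"(?:<[A-Za-z0-9]{4,}>\s*){10,}")


def has_nonsense_tags(raw_body: str) -> bool:
    """Return True if the text contains a run that the pattern matches."""
    return NONSENSE_TAGS.search(raw_body) is not None


# Hypothetical spam-like sample: a <style> tag followed by gibberish tags.
gibberish = ["qwer", "zxcvb", "asdfg", "poiuy", "lkjhg", "mnbvc",
             "tyuio", "ghjkl", "bnmqw", "rtyui", "plokm"]
spam_sample = "<style>" + " ".join(f"<{word}>" for word in gibberish)

# Hypothetical ordinary HTML: short tags, attributes, and closing tags
# all break the run, so the pattern should not fire.
ham_sample = '<html><body><p>Hello</p><div class="x">text</div></body></html>'

if __name__ == "__main__":
    print("spam sample matches:", has_nonsense_tags(spam_sample))
    print("ham sample matches:", has_nonsense_tags(ham_sample))
```

Closing tags and tags carrying attributes never count toward the run, because `/`, spaces, `=` and quotes are all outside the `[A-Za-z0-9]` class. That is presumably why ordinary markup rarely strings ten matches together, but only a masscheck run can say how it behaves on real mail.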