On 3 Aug 2016, at 6:07, Ruga wrote:

Hello,

We received a new type of spam, twice, and we are not willing to give them a third chance. The body includes a long HTML paragraph (<p>...</p>) of news headlines.

The following works at the command line:
perl -p0e 's/(<p>(?:(?!<\/p>).){999,}<\/p>)/-->$1<--/msig' example.eml
perl -n0e '/(<p>(?:(?!<\/p>).){999,}<\/p>)/msig and print "--->$1<---"' example.eml
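
For anyone who wants to experiment outside Perl, the pattern in the one-liners above can be reproduced with Python's re module. This is only a sketch for testing and is not anything SpamAssassin itself runs. The helper name find_long_paragraphs and the sample strings are hypothetical. The key piece is the tempered token (?:(?!</p>).), which consumes one character at a time as long as that character does not begin a '</p>'.

```python
import re

# Same pattern as the Perl one-liners: "<p>", then 999 or more characters,
# none of which starts "</p>" (a tempered greedy token), then "</p>".
# re.S lets "." match newlines (Perl /s); re.I matches Perl /i.
# Perl's /m only affects ^ and $, which this pattern does not use.
LONG_P = re.compile(r"<p>(?:(?!</p>).){999,}</p>", re.S | re.I)


def find_long_paragraphs(text):
    """Return every closed <p>...</p> whose content is 999+ characters."""
    return [m.group(0) for m in LONG_P.finditer(text)]


short = "<p>" + "x" * 500 + "</p>"
long_ = "<P>" + "headline, " * 120 + "</P>"  # 1200 characters of content

print(len(find_long_paragraphs(short)))  # 0
print(len(find_long_paragraphs(long_)))  # 1
```

Case-insensitivity also applies inside the lookahead, so an uppercase '</P>' ends the paragraph just as '</p>' does.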

The following SA rule, however, does not work at all:

rawbody __B_PLL /<p>(?:(?!<\/p>).){999,}<\/p>/msi

It will not hit an unclosed <p> tag.
It may also take a very long time to check some very long messages with pathological (but neither uncommon nor spam-only!) HTML, because of the open-ended {999,}.
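
The first point is easy to check with Python's re module, which treats this pattern the same way Perl does. The sample text below is hypothetical.

```python
import re

# The rule's pattern, unchanged apart from dropping Perl's "\/" escapes.
OPEN_ENDED = re.compile(r"<p>(?:(?!</p>).){999,}</p>", re.S | re.I)

# A long paragraph whose closing tag never appears, as in sloppy HTML.
unclosed = "<p>" + "breaking news " * 200 + "<div>footer</div>"

# With no "</p>" anywhere, the final "</p>" in the pattern cannot match.
# Before giving up, a backtracking engine runs the token to the end of the
# text and then backs off one character at a time, so each "<p>" costs work
# roughly proportional to the amount of text that follows it.
print(OPEN_ENDED.search(unclosed))  # None
print(OPEN_ENDED.search(unclosed + "</p>") is not None)  # True
```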

However, I happen to have a message that should match your rule. It matches in Perl directly, with the matched string being a 163301-byte HTML mess that is also a single line once decoded from quoted-printable. It does not match as an SA 'rawbody' pattern, though. It DOES match if the rule type is switched from 'rawbody' to 'full', and it is not clear to me why that change produces a match. I also constructed a message where the '</p>' came only 650 characters after the '<p>', reduced the minimum length from 999 to 300, and that message did match as a rawbody rule.

It seems that something breaks in SA when a 'rawbody' match is too long. I suspect a logic problem in how SA "chunks" a message for body and rawbody tests, but I haven't tracked down the details. It looks like an SA bug to me.
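
The sketch below shows how chunking could produce exactly these symptoms. It is a hypothesis and not SpamAssassin's code. The fixed 2048-character chunk size and the naive split are assumptions, and the real splitting logic may differ. If each chunk is tested separately, a paragraph longer than a chunk can never have both its '<p>' and its '</p>' in the same piece, while a 650-character paragraph fits easily. A 'full' rule is matched against the pristine message as a whole rather than in pieces, which would explain why it still hits.

```python
import re

CHUNK = 2048  # ASSUMPTION: illustrative chunk size, not SpamAssassin's value


def chunks(text, size=CHUNK):
    # Naive fixed-width split; a real implementation might break on whitespace.
    return [text[i:i + size] for i in range(0, len(text), size)]


def hits_whole(pattern, text):
    # How the Perl one-liners (and a 'full' rule) see the text: all at once.
    return re.search(pattern, text, re.S | re.I) is not None


def hits_chunked(pattern, text):
    # Hypothetical chunked matching: each piece is tested on its own.
    rx = re.compile(pattern, re.S | re.I)
    return any(rx.search(piece) for piece in chunks(text))


def rule(minimum):
    # The open-ended rule with a configurable minimum length.
    return r"<p>(?:(?!</p>).){%d,}</p>" % minimum


huge = "<p>" + "headline " * 18000 + "</p>"  # about 162 KB, one line
small = "<p>" + "y" * 650 + "</p>"

print(hits_whole(rule(999), huge))     # True
print(hits_chunked(rule(999), huge))   # False: <p> and </p> land in different chunks
print(hits_chunked(rule(300), small))  # True: the whole paragraph fits in one chunk
# A fixed-width pattern needs no "</p>", so it still hits in the first chunk.
print(hits_chunked(r"<p>(?:(?!</p>).){999}", huge))  # True
```

Under the same assumption, the fixed-width pattern could still miss a '<p>' that falls within 999 characters of the end of a chunk.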

For performance, for matching unclosed paragraphs, and to work around that bug, a better solution is:

rawbody __B_PLL /<p>(?:(?!<\/p>).){999}/msi

That will match the first 999 characters of a long HTML paragraph, no matter how long it is and whether or not it is ever closed. It also gets around whatever SA bug is blocking the very long matches.
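
The behavior described above can be checked with the same Python translation. The sample strings are hypothetical. Because the repetition count is fixed, each attempt does a bounded amount of work, and a match is never longer than 1002 characters: the 3-character '<p>' plus 999.

```python
import re

# The suggested replacement: "<p>" followed by exactly 999 characters that do
# not start "</p>". No closing tag is required, and the match stops early.
FIXED = re.compile(r"<p>(?:(?!</p>).){999}", re.S | re.I)

unclosed = "<p>" + "breaking news " * 200  # never closed
closed_long = "<p>" + "z" * 5000 + "</p>"
too_short = "<p>" + "z" * 998 + "</p>"

for name, text in [("unclosed", unclosed),
                   ("closed_long", closed_long),
                   ("too_short", too_short)]:
    m = FIXED.search(text)
    print(name, None if m is None else len(m.group(0)))
# unclosed 1002
# closed_long 1002
# too_short None
```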


tflags __B_PLL multiple maxhits=1

Pointless. Why set the "multiple" flag if you are going to cap it with "maxhits=1"?

meta B_PLL __B_PLL
describe B_PLL Body: Paragraph Length Limit
score B_PLL 1.0

I assume this is a placeholder for future combination with other rules, since the 'meta' as it stands is pointless and simply having a paragraph longer than 999 characters isn't inherently or heuristically spammy.
