On Tue, 3 Nov 2020, Loren Wilton wrote:
> I'm getting lots of spams that are about 100+K long. The spam body contains
> two blocks of random news text copied from fox news or msnbc or the like,
> enclosed in a zero-point font block. I'm trying to match this simple pattern
> to give some extra points, but I can't seem to get it to work. I'm wondering
> if there is some buffer limit in SA that is preventing the match from
> working.
There is.
> If I try
> rawbody LONG_HIDDEN m'<font style="font-size:0px">[^<]*<'s
> I don't get a match, even though I know there is a </font> about 50K into the
> message.
The closing tag is past the cutoff, so the final "<" in your pattern
never has anything to match.
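To make the cutoff effect concrete, here is a minimal Python sketch. It is only an illustration: the 2048-character WINDOW is an assumed figure, not SpamAssassin's actual limit, and Python's re module stands in for the Perl regex engine that SA uses.

```python
import re

OPEN_TAG = '<font style="font-size:0px">'
# Assumed window size, for illustration only; not SpamAssassin's real limit.
WINDOW = 2048

with_close = re.compile(OPEN_TAG + r'[^<]*<', re.S)
without_close = re.compile(OPEN_TAG + r'[^<]*', re.S)

# A hidden block of about 50K of filler text, closed by </font>.
message = '<p>Hello</p>' + OPEN_TAG + 'news text ' * 5000 + '</font>'

# Pretend the rule only ever sees the first WINDOW characters.
window = message[:WINDOW]

print(bool(with_close.search(message)))    # whole message visible
print(bool(with_close.search(window)))     # closing "<" has been cut off
print(bool(without_close.search(window)))  # nothing required after the text
```

Under these assumptions the pattern ending in "<" matches the full message but not the truncated window, while the pattern without it matches the window.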
> But if I try
> rawbody LONG_HIDDEN m'<font style="font-size:0px">[^<]*'s
> I do get a match. Note all I've done is remove the final "<" from the match
> text.
> If I try
> rawbody LONG_HIDDEN m'<font style="font-size:0px">[^<]{990,}'s
> I get a match.
That's what you should do. Don't try to cut it too close, though, as all
the spammer would need to do to bypass that is move the garbage block a
little further back in the message. I'd suggest {900} or even {500} - 500
characters of zero-point text in a message body is not plausibly
legitimate.
You don't need the "," either: it doesn't matter what comes after your
cutoff, so don't waste time matching it. Basic version:
rawbody LONG_HIDDEN m'<font style="font-size:0px">[^<]{500}'s
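A rough Python sketch of why the smaller count is more robust. The WINDOW size and the visible_window helper are hypothetical, chosen only to show the effect of the hidden block starting later in the visible text; they do not describe how SpamAssassin actually splits a message.

```python
import re

OPEN_TAG = '<font style="font-size:0px">'
WINDOW = 2048  # assumed for illustration, not SpamAssassin's real limit

tight = re.compile(OPEN_TAG + r'[^<]{990,}', re.S)
loose = re.compile(OPEN_TAG + r'[^<]{500}', re.S)

def visible_window(lead_in_len):
    """Return the first WINDOW characters of a message whose hidden
    block starts after lead_in_len characters of ordinary text."""
    message = 'x' * lead_in_len + OPEN_TAG + 'a' * 50000 + '</font>'
    return message[:WINDOW]

early = visible_window(100)   # plenty of hidden text is still visible
late = visible_window(1200)   # only 820 hidden characters are visible

print(bool(tight.search(early)), bool(loose.search(early)))
print(bool(tight.search(late)), bool(loose.search(late)))
# {500} stops as soon as it has seen 500 characters.
print(len(loose.search(early).group(0)) - len(OPEN_TAG))
```

In this sketch both patterns hit when the block starts early, but only the {500} version still hits once the block has been pushed later, and it stops scanning after exactly 500 characters.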
You may also want to stick optional whitespace in there to avoid trivial
bypass:
rawbody LONG_HIDDEN m'<font\s+style\s*=\s*"font-size:0px"\s*>[^<]{500}'s
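As a quick check of the whitespace point, the following Python sketch runs both patterns against a few hand-made tag variants. The variants are made up for illustration, not taken from real spam.

```python
import re

strict = re.compile(r'<font style="font-size:0px">[^<]{500}', re.S)
spaced = re.compile(
    r'<font\s+style\s*=\s*"font-size:0px"\s*>[^<]{500}', re.S)

filler = 'a' * 600
# Hypothetical variants a spammer might produce.
variants = [
    '<font style="font-size:0px">',     # exactly as in the original rule
    '<font  style="font-size:0px">',    # two spaces after "font"
    '<font style = "font-size:0px" >',  # spaces around "=" and before ">"
    '<font\nstyle="font-size:0px">',    # line break inside the tag
]

strict_hits = [bool(strict.search(tag + filler + '</font>')) for tag in variants]
spaced_hits = [bool(spaced.search(tag + filler + '</font>')) for tag in variants]

for tag, s, w in zip(variants, strict_hits, spaced_hits):
    print(repr(tag), 'strict:', s, 'spaced:', w)
```

Only the first variant gets past the strict pattern; the whitespace-tolerant one catches all four.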
A spammer could also add a typeface or other attributes to the <font>
tag, which would bypass your simple rule, and HTML tag and attribute
names are not case-sensitive. Also avoid unbounded * in complex patterns
when matching arbitrarily long text, as it can lead to runaway
backtracking and scan timeouts; use bounded counts instead:
rawbody LONG_HIDDEN m'<font\s[^>]{0,99}style\s*=\s*"font-size:0px"[^>]{0,99}>[^<]{500}'si
(Caveat: not tested, just off-the-cuff. There's room for improvement in
the style spec as well.)
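A small Python sketch that exercises this last pattern, on the assumption that Perl's /s and /i behave like re.S and re.I for these constructs. The sample tags are invented for illustration. The third one shows the kind of gap in the style spec that the caveat refers to.

```python
import re

final = re.compile(
    r'<font\s[^>]{0,99}style\s*=\s*"font-size:0px"[^>]{0,99}>[^<]{500}',
    re.S | re.I)

filler = 'a' * 600
# Hypothetical sample texts.
samples = {
    'plain tag':
        '<font style="font-size:0px">' + filler + '</font>',
    'extra attributes, upper case':
        '<FONT face="Arial" STYLE="font-size:0px" color="#ffffff">'
        + filler + '</font>',
    'extra style property':
        '<font style="font-size:0px;color:#fff">' + filler + '</font>',
    'short hidden text':
        '<font style="font-size:0px">' + 'a' * 20 + '</font>',
}

results = {name: bool(final.search(text)) for name, text in samples.items()}
for name, hit in results.items():
    print(name, hit)
```

The first two samples match. The third does not, because the pattern expects the closing quote directly after 0px, and the fourth does not because it has fewer than 500 characters of hidden text.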
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/