> On Jan 3, 2020, at 3:45 PM, Philip Prindeville
> <philipp_s...@redfish-solutions.com> wrote:
>
>
>
>> On Jan 2, 2020, at 4:08 PM, Philip Prindeville
>> <philipp_s...@redfish-solutions.com> wrote:
>>
>> I’m getting the following Spam.
>>
>> http://www.redfish-solutions.com/misc/bluechew.eml
>>
>> And this is notable for having:
>>
>> <meta http-equiv="Content-Type" content="text/html;
>> charset=iso-8859-1"><style>
>>
>> GUID1
>> GUID2
>> GUID3
>> GUID4
>> …
>> </style>
>
> One other question that occurs to me: why would we even need <meta
> http-equiv=“Content-Type” …> if we already have a Content-Type: header?
>
> Isn’t that the sign of a broken MUA doing the composition? Is that on its
> own Spamsign (with all respect to Frank Herbert)?
>
> -Philip
>
With that in mind, I’m trying out:

rawbody __L_UNNEEDED_META_CT /^<meta http-equiv="Content-Type" /m
meta T_BLOCK_MISC47 __L_UNNEEDED_META_CT
describe T_BLOCK_MISC47 Why do this when a Content-Type: header works?
score T_BLOCK_MISC47 20.0

And that seems to work.

I tried:

rawbody __L_STYLE_W_GUIDS m/<style>\n\n([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{3}-[0-9a-f]{3}-[0-9a-f]{12}\n){10,1000}<\/style>\n/s

and couldn’t get that to match… not sure why. A way to dump the buffer that
the pattern is being tested against would be handy.

But this does match:

rawbody __L_STYLE_W_GUIDS m/<style>\n\n([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{3}-[0-9a-f]{3}-[0-9a-f]{12}\n){10,1000}/s

So it’s hard to say how much of the message is being matched, or why adding
the closing tag makes the rule fail. I’m not sure whether the 2-4KB chunking
is coming into play, since this is clearly longer than that.
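
One way to probe the chunking theory outside SpamAssassin is a small Python
sketch like the one below. Everything in it is an assumption made for
illustration: the GUID-like sample line, the 200-line body, and the fixed
4096-byte chunks() helper, which is a hypothetical stand-in and not
SpamAssassin's actual splitting logic. It only shows how the two patterns
behave when the same text is tested whole versus in pieces.

```python
import re

# Made-up stand-in for the spam's <style> block: 200 identical GUID-like
# lines in the 8-4-3-3-12 shape that the rule expects (not the real message).
GUID_LINE = "0123abcd-4567-89a-bcd-ef0123456789\n"
body = "<style>\n\n" + GUID_LINE * 200 + "</style>\n"

# The two rule patterns from this message, translated to Python's re syntax.
GUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{3}-[0-9a-f]{3}-[0-9a-f]{12}\n"
open_only = re.compile(r"<style>\n\n(?:" + GUID_RE + r"){10,1000}", re.S)
with_close = re.compile(r"<style>\n\n(?:" + GUID_RE + r"){10,1000}</style>\n", re.S)


def chunks(text, size=4096):
    """Hypothetical chunker: fixed-size slices, not SpamAssassin's real logic."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def hits(pattern, parts):
    """Return the text matched in each part where the pattern hits."""
    found = []
    for part in parts:
        m = pattern.search(part)
        if m:
            found.append(m.group(0))
    return found


# Compare both patterns against the whole body and against the chunked body.
for label, parts in (("whole body", [body]), ("chunked", chunks(body))):
    for name, pattern in (("open_only", open_only), ("with_close", with_close)):
        matched = hits(pattern, parts)
        print(f"{label:10} {name:10} hits={len(matched)} "
              f"lengths={[len(t) for t in matched]}")
```

On this synthetic input the pattern without the closing tag still hits inside
the first chunk, while the pattern with the closing tag hits only when the
body is tested whole. That is consistent with the behaviour described above,
but it does not prove that chunking is the cause in the real message.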

Any suggestions for debugging a rule to see what’s matching, what’s failing
to match, and maybe why?

-Philip