Yes, this is known for that one case. But the changes to the rule that I mentioned below do help it catch a lot more spam (or at least shorten the regex).

Justin Mason wrote:
Ah.  Jo, sounds like your rule will need a little more work in
that case. ;)

--j.

Theo Van Dinter writes:
FWIW, I responded a few days ago with an explanation of why the rule isn't
hitting.  It has nothing to do with content-type headers and everything to do
with the fact that the message body isn't empty, there's HTML content.
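
To illustrate the point above, here is a minimal Python sketch (not SpamAssassin code) that applies the same \S{4} pattern used by __TVD_BODY to two body samples. It assumes, as the explanation above says, that the HTML part counts as body content for the rawbody rule. The sample strings and the helper name body_is_nonempty are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as the __TVD_BODY rawbody rule quoted in this thread:
# four consecutive non-whitespace characters.
TVD_BODY = re.compile(r"\S{4}")


def body_is_nonempty(text: str) -> bool:
    """Return True when the pattern matches, i.e. __TVD_BODY would be true."""
    return TVD_BODY.search(text) is not None


# Hypothetical samples: a body holding only whitespace, and an HTML part
# similar to the one in the message under discussion.
blank_body = "\n\n   \n"
html_body = (
    "<HTML><HEAD></HEAD>\n"
    "<BODY bgColor=#ffffff>\n"
    "<DIV>&nbsp;</DIV></BODY></HTML>\n"
)

for name, text in (("blank", blank_body), ("html", html_body)):
    tvd_body = body_is_nonempty(text)
    # The meta rule requires !__TVD_BODY, so a non-empty body blocks it.
    print(f"{name}: __TVD_BODY={tvd_body}, !__TVD_BODY={not tvd_body}")
```

Even an HTML part with no visible text contains runs such as "<HTML><HEAD>", so the pattern matches and the !__TVD_BODY term of the meta rule is false.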


On Thu, Aug 16, 2007 at 10:15:03AM +0100, Justin Mason wrote:
Jo --

I've checked that in as 'TVD_PDF_FINGER01_JO'.  You can track its progress
at http://ruleqa.SpamAssassin.org .

By the way -- it's pretty easy for you to test your own rules in your own
environment, actually, and I recommend you try it out.  These are the
tools we use:

  http://wiki.apache.org/spamassassin/MassCheck
  http://wiki.apache.org/spamassassin/HitFrequencies

They are bundled with SpamAssassin in the "masses" folder.  All the
documentation is there on the wiki.

--j.

Jo Rhett writes:
Since nobody is paying attention, let me clarify. The current rule is wrong:

mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i

meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY

This evaluates to exactly the same as this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY

I believe that the original rule's intent was this:

meta TVD_PDF_FINGER01  __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY

Can someone with commit rights please test and commit this change?
Thank you.

Jo Rhett wrote:
Well actually I think the rule has a bug. Why OR the two mime types as a new meta, and then require one of the two in the final meta? The net effect is that if ATT_TP is true it matches, but if ATT_AOPDF is true it will never match.

I believe that the following will work better: it works in every situation where it worked before, and does not fail when the MIME type is octet-stream:

meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY

Would someone kindly evaluate this change and possibly fix the rule? Thanks.
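
The boolean argument in this message can be checked by enumerating every truth assignment. The Python sketch below does that under one loudly stated assumption: it follows the reading given above, where the extra required term is one of the two OR'ed sub-rules (modelled here as the application/pdf test). The thread never shows the definition of __TVD_MIME_ATT_TP, so if that sub-rule is an independent test the equivalence does not hold. The function names are hypothetical.

```python
from itertools import product


def att(ap: bool, aopdf: bool) -> bool:
    # meta __TVD_MIME_ATT  __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
    return ap or aopdf


def current_rule(ct_mm: bool, ap: bool, aopdf: bool, body: bool) -> bool:
    # ASSUMPTION: the extra required term is treated as the application/pdf
    # sub-rule, which is how the message above reads the rule.
    return ct_mm and ap and att(ap, aopdf) and not body


def simplified_rule(ct_mm: bool, ap: bool, aopdf: bool, body: bool) -> bool:
    # The same rule with the redundant __TVD_MIME_ATT term dropped.
    return ct_mm and ap and not body


def proposed_rule(ct_mm: bool, ap: bool, aopdf: bool, body: bool) -> bool:
    # meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY
    return ct_mm and att(ap, aopdf) and not body


# All 16 combinations of (ct_mm, ap, aopdf, body).
cases = list(product([False, True], repeat=4))

same = all(current_rule(*c) == simplified_rule(*c) for c in cases)
extra = [c for c in cases if proposed_rule(*c) and not current_rule(*c)]

print("current == simplified:", same)
print("only the proposed rule fires for (ct_mm, ap, aopdf, body):", extra)
```

Under that assumption the two forms agree on every input, and the proposed rule additionally fires when only the octet-stream sub-rule is true.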

On Aug 14, 2007, at 10:41 PM, Loren Wilton wrote:
rawbody __TVD_BODY              /\S{4}/
true

header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
true

mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
false

mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i
maybe true, maybe not. I would hope newlines were translated to spaces by the mimehdr plugin, but maybe they weren't. Try /is instead of /i and see if it helps.

meta __TVD_MIME_ATT __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
maybe true

meta TVD_PDF_FINGER01
   __TVD_MIME_CT_MM
true
   && __TVD_MIME_ATT_TP
undefined here, can't say
   && __TVD_MIME_ATT
maybe true
   && !__TVD_BODY
true

So, not knowing what is in __TVD_MIME_ATT_TP, I haven't a clue if it will fire, since that is part of an 'and'. If I assume it to be true then I'm still not sure because of the multiline possibility in __TVD_MIME_ATT.

       Loren
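
The /is suggestion above can be illustrated with a short Python sketch. Python's re.DOTALL plays the role of Perl's /s flag: without it, "." does not match a newline. Whether SpamAssassin hands the rule a folded or an unfolded header value is not established in this thread, so both forms are tried; the unfolded string and the helper name matches are hypothetical.

```python
import re

# Same expression as __TVD_MIME_ATT_AOPDF, written for Python's re module.
PATTERN = r"^application/octet-stream.*\.pdf"

# Content-Type value from the sample message, folded across two lines,
# and (hypothetically) unfolded into one line by the header parser.
folded = "application/octet-stream;\n    name=marketing-jrhett.pdf"
unfolded = "application/octet-stream; name=marketing-jrhett.pdf"


def matches(value: str, flags: int) -> bool:
    """Return True when PATTERN matches value under the given flags."""
    return re.search(PATTERN, value, flags) is not None


i_only = re.IGNORECASE               # comparable to Perl /i
i_and_s = re.IGNORECASE | re.DOTALL  # comparable to Perl /is

print("folded,   /i :", matches(folded, i_only))
print("folded,   /is:", matches(folded, i_and_s))
print("unfolded, /i :", matches(unfolded, i_only))
```

If the value reaches the rule still folded, only the /is form can match; if it is unfolded first, /i is enough.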

describe TVD_PDF_FINGER01 Mail matches standard pdf spam fingerprint

----- Original Message -----
From: "Jo Rhett" <[EMAIL PROTECTED]>
To: "SpamAssassin Users" <users@spamassassin.apache.org>
Sent: Tuesday, August 14, 2007 10:16 PM
Subject: Re: PDF rule not matching -- split line content type?


Can someone clue me in on why this rule isn't matching?

Jo Rhett wrote:
So I've been getting a metric ton of PDF spam. Investigating the rule that is supposed to match this, I see:

rawbody __TVD_BODY              /\S{4}/
header __TVD_MIME_CT_MM         Content-Type =~ /^multipart\/mixed/i
meta __TVD_MIME_ATT __TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF
meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY
describe TVD_PDF_FINGER01 Mail matches standard pdf spam fingerprint

mimeheader __TVD_MIME_ATT_AP    Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i

The following message appears to match this perfectly, except perhaps that the content type is spread across two lines. I haven't checked the code, but would this matter?

Return-Path: <[EMAIL PROTECTED]>
Received: from mail.netconsonance.com ([unix socket])
     by triceratops.netconsonance.com (Cyrus v2.3.8) with LMTPA;
     Tue, 14 Aug 2007 06:27:16 -0700
Received: from [84.21.29.58] ([84.21.29.58])
    by mail.netconsonance.com (8.14.1/8.14.1) with ESMTP id l7EDR4UU095951
    for <[EMAIL PROTECTED]>; Tue, 14 Aug 2007 06:27:08 -0700 (PDT)
    (envelope-from [EMAIL PROTECTED])
X-Virus-Scanned: amavisd-new at netconsonance.com
X-Spam-Score: 2.033
X-Spam-Level: **
X-Spam-Status: No, score=2.033 tagged_above=-999 required=4
    tests=[DK_POLICY_SIGNSOME=0.001, HTML_MESSAGE=0.001,
    MIME_HTML_MOSTLY=0.699, RCVD_IN_BL_SPAMCOP_NET=1.332]
Received: from x-6of7ca27m39al ([158.187.61.7]) by [84.21.29.58] with Microsoft SMTPSVC(6.0.3790.1830);
    Tue, 14 Aug 2007 15:27:01 +0200
Message-ID: <[EMAIL PROTECTED]>
From: "Yohann michels" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: bill-jrhett
Date: Tue, 14 Aug 2007 15:26:28 +0200
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="----=_NextPart_000_000E_01C7DE87.7C1E24D0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138


------=_NextPart_000_000E_01C7DE87.7C1E24D0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_001_000F_01C7DE87.7C1E24D0"


------=_NextPart_001_000F_01C7DE87.7C1E24D0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
    charset=windows-1250


------=_NextPart_001_000F_01C7DE87.7C1E24D0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
    charset=windows-1250

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dwindows-1250">
<META content=3D"MSHTML 6.00.2900.3132" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV></BODY></HTML>

------=_NextPart_001_000F_01C7DE87.7C1E24D0--

------=_NextPart_000_000E_01C7DE87.7C1E24D0
Content-Transfer-Encoding: base64
Content-Type: application/octet-stream;
    name=marketing-jrhett.pdf
Content-Disposition: attachment;
    filename=marketing-jrhett.pdf

JVBERi0xLjUNJeLjz9MNCjIyIDAgb2JqPDwvSFs0MzYgMTQ4XS9MaW5lYXJpemVkIDEvRSAxNjU5 L0wgMTM1NzYvTiAxMC9PIDI2L1QgMTMwNzQ+Pg1lbmRvYmoNICAgICAgICAgICAgICAgICAgICAg *snip*



--
Jo Rhett
Net Consonance ... net philanthropy, open source and other randomness


--
Randomly Selected Tagline:
"Premature optimisation is the root of all evil." - Knuth


