Matt Kettler <[EMAIL PROTECTED]> writes:

> Greg Troxel wrote:
>> I see your point (the mail is malformed), but
>>
>>    mail is multipart/alternative but only has text/html
>>
>> and
>>
>>   mail is multipart/alternative and text/plain and text/html don't match
>>
>> are different conditions, and it might be useful to have different rules
>> that could have different scores.
>>   
> Well, they're different, but are they different enough to be worth
> separate rules in a spam filtering context?
>
> i.e., would the scores generated be significantly different, or are both
> rules roughly the same strength of spam indicator, ending up with
> more-or-less the same score?
>
> I suppose it could be tested, but this isn't a terribly common false
> positive in the real world. If it were pervasive, I'd be jumping at
> getting it fixed, but I've not seen it FP very often. The set3 S/O for
> this rule was 0.969, which isn't perfect but it's really quite solid.

OK, I see your point - I realize there is finite time to write rules,
and hadn't thought about the notion of directing that energy to rules
with higher FP rates.  I had been drinking the SA Kool-Aid a bit too
much about lots of little rules and letting the score-assignment process
sort them out...
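
As a rough illustration of what the quoted S/O of 0.969 means, here is a
small sketch. It assumes S/O is simply the fraction of a rule's hits that
land on spam; SpamAssassin's own tooling may normalize by corpus size, so
treat this as an approximation. The function name and the hit counts are
hypothetical, picked only to reproduce the quoted ratio.

```python
def s_over_o(spam_hits: int, ham_hits: int) -> float:
    """Fraction of a rule's hits that landed on spam (assumed S/O definition)."""
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("rule never fired; S/O is undefined")
    return spam_hits / total


# Hypothetical counts, not real corpus data.
ratio = s_over_o(spam_hits=969, ham_hits=31)
print(f"S/O = {ratio:.3f}, ham share of hits = {1 - ratio:.3f}")
```

Under that assumption, the complement of S/O is the share of the rule's
hits that fall on ham, which is not quite the same thing as the share of
all ham that the rule hits.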

> In general it strikes me as a problem with email generated by some
> homebrew tool that isn't working properly. If it was generated by a
> common tool, we'd be seeing a lot of these, but a 3.1% false positive
> rate for the whole rule (not just this case) is pretty low.
>
> I know it's a bit harsh, but I'm not terribly inclined to jump to change
> SA to address rare-case false positives resulting from broken tools that
> aren't in mainstream use and don't have a significant impact on SA's
> global false positive rate.

No worries, that's totally fair.  My splitting suggestion was somewhat
driven by curiosity - I have no idea which direction the scores of the
split rules would move.

> Unless someone knows of a mainstream FP case here, I'd be inclined to
> suggest either fixing the generator (best option, as some mail clients
> may barf on that output anyway), or locally zeroing the rule if you're the
> 1-in-a-million person who gets badly malformed email on a regular basis.

I'm not the one with the problem - I was just commenting that I thought
that MPART_ALT_DIFF should not have fired on the message in question
because the description says

  BODY: HTML and text parts are different

rather than

  BODY: HTML and text parts are different or text part is missing

It's certainly reasonable to consider a missing text part in
multipart/alternative as non-matching.
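
To make the two conditions concrete, here is a minimal sketch using
Python's standard email package. It is not how SpamAssassin implements
MPART_ALT_DIFF; the function name and the return labels are made up for
illustration, and it only checks which parts are present, without
comparing the text and HTML content.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def classify_alternative(msg: Message) -> str:
    """Label a message by which alternative parts it carries (hypothetical labels)."""
    if msg.get_content_type() != "multipart/alternative" or not msg.is_multipart():
        return "not-alternative"
    # Only look at the direct children of the multipart/alternative container.
    types = {part.get_content_type() for part in msg.get_payload()}
    has_text = "text/plain" in types
    has_html = "text/html" in types
    if has_html and not has_text:
        return "html-only"      # the malformed case: text part is missing
    if has_html and has_text:
        return "both-present"   # candidates for a text-vs-HTML comparison
    return "other"


# An alternative container with only an HTML part.
html_only = MIMEMultipart("alternative")
html_only.attach(MIMEText("<p>hello</p>", "html"))

# A well-formed alternative container with both parts.
both = MIMEMultipart("alternative")
both.attach(MIMEText("hello", "plain"))
both.attach(MIMEText("<p>hello</p>", "html"))

print(classify_alternative(html_only))
print(classify_alternative(both))
```

A split rule would fire on the first label alone, while a comparison rule
would only run on messages with the second label.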
