Matt Kettler <[EMAIL PROTECTED]> writes:

> Greg Troxel wrote:
>> I see your point (the mail is malformed), but
>>
>>   mail is multipart/alternative but only has text/html
>>
>> differs from
>>
>>   mail is multipart/alternative and text/plain and text/html don't match
>>
>> are different conditions, and it might be useful to have different rules
>> that could have different scores.
>>
> Well, they're different, but are they different enough to be worth
> separate rules in a spam filtering context.
>
> i.e.: would the scores generated be significantly different, or are both
> rules roughly the same strength of spam indicator, and end up with
> more-or-less the same score.
>
> I suppose it could be tested, but this isn't a terribly common false
> positive in the real world. If it were pervasive, I'd be jumping at
> getting it fixed, but I've not seen it FP very often. The set3 S/O for
> this rule was 0.969, which isn't perfect but it's really quite solid.

OK, I see your point - I realize there is finite time to write rules,
and hadn't thought about the notion of directing that energy to rules
with higher FP rates. I had been drinking the SA Kool-Aid a bit too
much about lots of little rules and letting the score-assignment
process sort them out...

> In general it strikes me as a problem with email generated by some
> homebrew tool that isn't working properly. If it was generated by a
> common tool, we'd be seeing a lot of these, but a 3.1% false positive
> rate for the whole rule (not just this case) is pretty low.
>
> I know it's a bit harsh, but I'm not terribly inclined to jump to change
> SA to address rare-case false positives resulting from broken tools that
> aren't in mainstream use and don't have a significant impact on SA's
> global false positive rate.

No worries, that's totally fair. My splitting suggestion was somewhat
driven by curiosity - I have no idea which direction the scores of the
split rules would move.

> Unless someone knows of a mainstream FP case here, I'd be inclined to
> suggest either fixing the generator (best option, as some mail clients
> may barf on that output anyway), or locally zero the rule if you're the
> 1 in a million people who gets badly malformed email on a regular basis.

I'm not the one with the problem - I was just commenting that I thought
MPART_ALT_DIFF should not have fired on the message in question,
because the description says

  BODY: HTML and text parts are different

rather than

  BODY: HTML and text parts are different or text part is missing

It's certainly reasonable to consider a missing text part in
multipart/alternative as non-matching.
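
To make the two conditions concrete, here is a minimal sketch using
Python's standard email package. It is not how SpamAssassin implements
MPART_ALT_DIFF; the function name classify_alternative and its return
labels are hypothetical, chosen only for illustration, and it only
checks which part types are present, not whether their contents match.

```python
from email import message_from_string


def classify_alternative(msg):
    """Label a message by which multipart/alternative parts it carries.

    Hypothetical helper for illustration; the labels are not SpamAssassin
    rule names.
    """
    if msg.get_content_type() != "multipart/alternative":
        return "not_alternative"
    # A malformed multipart may have a string payload instead of a part list.
    parts = msg.get_payload() if msg.is_multipart() else []
    types = {part.get_content_type() for part in parts}
    has_text = "text/plain" in types
    has_html = "text/html" in types
    if has_html and not has_text:
        return "text_missing"      # the case discussed in this thread
    if has_text and has_html:
        return "both_present"      # contents would still need comparing
    return "other"


# Sample messages (made up for this sketch).
HTML_ONLY = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--b--\n"
)

BOTH_PARTS = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--b--\n"
)

if __name__ == "__main__":
    print(classify_alternative(message_from_string(HTML_ONLY)))
    print(classify_alternative(message_from_string(BOTH_PARTS)))
```

A split rule, if anyone wanted to test one, would key on the
"text_missing" case separately from a content comparison of the two
parts.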