Vivek Khera <[EMAIL PROTECTED]> writes:

> I've had exactly 1 message in three years for which this missing
> boundary was not a SPAM (which seems to be some mailing list software
> that did that botching on an attachment).  There doesn't seem to be a
> correlation with X-Mailer header, either.  Does this sound like a
> good test to add?  I'm not sure how one would
> add it without modifying the MIME parser to detect it.

Sounds like a good test.  I also get that error in VM, so I'd be happy
if they were filtered out.  I was already playing with another "MIME
quality" test, so I tried your suggestion too.

Test set: 1789 total messages, 465 of those are spam.

  test                   description
  MIME_SUSPECT_NAME      MIME filename does not match MIME content type
  MIME_MISSING_BOUNDARY  missing final boundary

  test                   matches      spam    not-spam
  MIME_SUSPECT_NAME            8         7           1 (reply to a virus email)
  MIME_MISSING_BOUNDARY       36        36           0

That's pretty good.
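
For illustration, here is a minimal Python sketch of what a
MIME_MISSING_BOUNDARY style check could look like when done on the raw
text, without touching a MIME parser.  It is not the code that produced
the numbers above; the function name missing_final_boundary is
hypothetical, and the sketch only looks at the top-level multipart
boundary, not at nested ones.

```python
import re


def missing_final_boundary(raw_message: str) -> bool:
    """Return True if a multipart message never emits its closing boundary.

    Assumption: only the top-level Content-Type header is inspected.
    """
    # Split headers from body at the first blank line.
    headers, _, body = raw_message.replace("\r\n", "\n").partition("\n\n")
    # Unfold continuation lines so a wrapped Content-Type becomes one line.
    headers = re.sub(r"\n[ \t]+", " ", headers)
    ctype = re.search(r"^content-type:\s*multipart/[^\n]*", headers,
                      re.IGNORECASE | re.MULTILINE)
    if ctype is None:
        return False  # not multipart, nothing to check
    param = re.search(r'boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))',
                      ctype.group(0), re.IGNORECASE)
    if param is None:
        return False  # broken in some other way; out of scope here
    boundary = param.group(1) or param.group(2)
    closing = "--" + boundary + "--"
    # The closing delimiter must sit on its own line; trailing
    # whitespace after it is tolerated.
    return not any(line.rstrip() == closing for line in body.split("\n"))
```

A message whose body ends with "--XYZ--" for boundary "XYZ" returns
False; the same message with that last line cut off returns True.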

Interestingly, almost all of the matched messages also match several
other rules:

  NO_REAL_NAME: 35 of 36
  MIME_ODD_CASE: 34 of 36
  BASE64_ENC_TEXT: 34 of 36

NO_REAL_NAME is not too meaningful since it matches 423 messages, of
which only 244 are spam, but MIME_ODD_CASE matches 65 messages, all 65
of them spam, so it looks as though a small number of spam software
packages are generating these particular spam messages.
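
To compare the rules on equal footing, the figures quoted in this
message can be turned into "fraction of matches that are spam" and set
against the base rate of the test set (465 spam out of 1789).  The
short Python sketch below does only that arithmetic; the names RULES
and spam_fraction are made up for the example.

```python
# (matches, spam) per rule, copied from the figures quoted in this message.
RULES = {
    "MIME_SUSPECT_NAME": (8, 7),
    "MIME_MISSING_BOUNDARY": (36, 36),
    "NO_REAL_NAME": (423, 244),
    "MIME_ODD_CASE": (65, 65),
}
TOTAL, TOTAL_SPAM = 1789, 465


def spam_fraction(matches: int, spam: int) -> float:
    """Fraction of a rule's matches that were spam."""
    return spam / matches


# Fraction of spam in the whole test set, for comparison.
base_rate = TOTAL_SPAM / TOTAL

for name, (matches, spam) in RULES.items():
    frac = spam_fraction(matches, spam)
    print(f"{name:<22} {frac:6.1%} spam among matches "
          f"(base rate {base_rate:.1%})")
```

NO_REAL_NAME comes out at roughly 58% against a base rate of about
26%, while the two MIME rules with no false positives sit at 100%.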

I think the false positive for MIME_SUSPECT_NAME would be eliminated
with better MIME parsing code than my own.
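
As a rough idea of what a parser-based MIME_SUSPECT_NAME check could
look like, here is a Python sketch using the standard email and
mimetypes modules.  It is an assumption about how such a test might be
written, not the code used above: suspect_names and GENERIC are
hypothetical names, and "does not match" is taken to mean that the
type guessed from the filename extension differs from the declared
Content-Type.  Because it only reads the headers of real MIME parts,
attachment headers that are merely quoted in a reply's body text would
not be picked up.

```python
import mimetypes
from email import message_from_string

# Declared types too generic to contradict any filename (assumption).
GENERIC = {"application/octet-stream"}


def suspect_names(raw_message: str):
    """Yield (filename, declared_type, guessed_type) for mismatched parts."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue  # part carries no filename, nothing to compare
        declared = part.get_content_type()
        guessed, _encoding = mimetypes.guess_type(filename)
        if guessed is None or declared in GENERIC:
            continue  # unknown extension or generic type: no verdict
        if guessed != declared:
            yield filename, declared, guessed
```

The extension table comes from the local mimetypes database, so results
for unusual extensions can differ between systems.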

Incidentally, "rawbody" seems to be misnamed since you don't seem to
get the raw body (meaning, 100% uncooked).  I had to use get_body() to
get the original unmodified body.

Another idea: what about a negative score for emails containing RFC
934 encapsulated messages?

  $ egrep -hi '^--* end.* -*-$' *[0-9] | count
  7       ------- End of forwarded message -------
  8       ------- end -------

  $ egrep -hi '^--* start.* -*-$' *[0-9] | count
  7       ------- Start of forwarded message -------
  8       ------- start of forwarded message (RFC 934 encapsulation) -------

Maybe not very common these days, but could be useful for digested
mailing lists.
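
The two egrep patterns above translate directly into a small Python
sketch that tallies marker lines in a message body.  The function name
encapsulation_markers is hypothetical, and matching on the words
"start" and "end" is the same heuristic as the egrep commands, not a
full RFC 934 parser.

```python
import re
from collections import Counter

# Same patterns as the egrep commands, matched case-insensitively.
START = re.compile(r"^--* start.* -*-$", re.IGNORECASE)
END = re.compile(r"^--* end.* -*-$", re.IGNORECASE)


def encapsulation_markers(text: str) -> Counter:
    """Count lines that look like forwarded-message start/end markers."""
    counts = Counter()
    for line in text.splitlines():
        if START.match(line) or END.match(line):
            counts[line] += 1
    return counts
```

A rule built on this would presumably want both a start and an end
marker in the same message before giving a negative score.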



Spamassassin-talk mailing list
