Vivek Khera <[EMAIL PROTECTED]> writes:

> I've had exactly 1 message in three years for which this missing
> boundary was not a SPAM (which seems to be some mailing list software
> that did that botching on an attachment).  There doesn't seem to be a
> correlation with X-Mailer header, either.
> 
> Does this sound like a good test to add?  I'm not sure how one would
> add it without modifying the MIME parser to detect it.

Sounds like a good test.  I also get that error in VM, so I'd be happy
if they were filtered out.  I was already playing with another "MIME
quality" test, so I tried your suggestion too.

Test set: 1789 total messages, 465 of those are spam.

  test                   description
  MIME_SUSPECT_NAME      MIME filename does not match MIME content type
  MIME_MISSING_BOUNDARY  missing final boundary

  test                   matches      spam    not-spam
  MIME_SUSPECT_NAME            8         7           1 (reply to a virus email)
  MIME_MISSING_BOUNDARY       36        36           0

That's pretty good.
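For anyone who wants to try the missing-boundary check outside the MIME parser, here is a rough sketch in Python. It is not the rule I ran and not SpamAssassin code. It reads the boundary from the top-level Content-Type header and then looks for the closing "--boundary--" delimiter line in the raw bytes. The function name is made up, and it deliberately ignores nested multiparts.

```python
import email
import re


def missing_final_boundary(raw: bytes) -> bool:
    """Return True if a top-level multipart message has no closing delimiter.

    Hypothetical helper: only the outermost boundary is checked, and nested
    multipart parts are not examined.
    """
    msg = email.message_from_bytes(raw)
    if msg.get_content_maintype() != "multipart":
        return False
    boundary = msg.get_boundary()
    if boundary is None:
        # No boundary parameter at all; a different kind of breakage.
        return False
    closing = b"--" + boundary.encode("ascii", "replace") + b"--"
    # The close delimiter must begin a line; allow trailing spaces/tabs
    # (transport padding) and an optional CR before the LF.
    pattern = re.compile(rb"^" + re.escape(closing) + rb"[ \t]*\r?$", re.MULTILINE)
    return pattern.search(raw) is None
```

Scanning the raw bytes keeps the check independent of how leniently the parser recovers from a broken message, which is where this test would otherwise need changes.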

Interestingly, almost all of the matched messages also match several
other rules:

  NO_REAL_NAME: 35 of 36
  MIME_ODD_CASE: 34 of 36
  BASE64_ENC_TEXT: 34 of 36

NO_REAL_NAME is not too meaningful, since it matches 423 messages and
only 244 of those are spam.  MIME_ODD_CASE, on the other hand, matches
65 messages and all 65 are spam, so it almost feels like a small number
of spam software packages are generating these particular messages.
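To put those hit counts side by side, here is a quick calculation using only the numbers in this post. The first figure for each rule is the fraction of its hits that were spam. The second is the fraction of all 465 spams it caught. NO_REAL_NAME's hits are about 58% spam against a baseline of about 26% spam in the whole test set. MIME_ODD_CASE and MIME_MISSING_BOUNDARY are at 100%. The names below are just for the example.

```python
# Figures copied from the post: 1789 messages, 465 of them spam.
TOTAL_MESSAGES = 1789
TOTAL_SPAM = 465

# rule name -> (messages matched, matched messages that were spam)
RULE_HITS = {
    "MIME_SUSPECT_NAME": (8, 7),
    "MIME_MISSING_BOUNDARY": (36, 36),
    "NO_REAL_NAME": (423, 244),
    "MIME_ODD_CASE": (65, 65),
}


def rule_stats(hits_by_rule):
    """Map rule -> (fraction of its hits that were spam, fraction of all spam caught)."""
    return {
        rule: (spam / matched, spam / TOTAL_SPAM)
        for rule, (matched, spam) in hits_by_rule.items()
    }


if __name__ == "__main__":
    print(f"baseline spam rate: {TOTAL_SPAM / TOTAL_MESSAGES:.1%}")
    for rule, (spam_frac, caught) in rule_stats(RULE_HITS).items():
        print(f"{rule:22} {spam_frac:6.1%} of hits are spam, catches {caught:5.1%} of spam")
```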

I think the false positive for MIME_SUSPECT_NAME would be eliminated
with better MIME parsing code than my own.
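Roughly, the MIME_SUSPECT_NAME idea can be written with Python's standard email and mimetypes modules. This is a sketch of the idea, not the code I tested. It compares each attachment's declared Content-Type with the type guessed from its filename extension. Skipping generic application/octet-stream parts is my own assumption to cut down on noise, and the function name is hypothetical.

```python
import email
import mimetypes
from typing import List, Tuple

# Assumption: generic types say nothing about the real content, so skip them.
GENERIC_TYPES = {"application/octet-stream"}


def suspect_mime_names(raw: bytes) -> List[Tuple[str, str, str]]:
    """Return (filename, declared type, guessed type) for mismatched parts."""
    msg = email.message_from_bytes(raw)
    hits = []
    for part in msg.walk():
        # get_filename() checks Content-Disposition, then Content-Type "name".
        filename = part.get_filename()
        if not filename:
            continue
        declared = part.get_content_type()
        guessed, _encoding = mimetypes.guess_type(filename)
        if guessed is None or declared in GENERIC_TYPES:
            continue
        if guessed != declared:
            hits.append((filename, declared, guessed))
    return hits
```

The guessed type depends on the local mime.types tables, so any real rule would want its own fixed list of risky extensions.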

Incidentally, "rawbody" seems to be misnamed since you don't seem to
get the raw body (meaning, 100% uncooked).  I had to use get_body() to
get the original unmodified body.

Another idea: what about a negative score for emails containing RFC
934 encapsulated messages?

  $ egrep -hi '^--* end.* -*-$' *[0-9] | count
  7       ------- End of forwarded message -------
  8       ------- end -------

  $ egrep -hi '^--* start.* -*-$' *[0-9] | count
  7       ------- Start of forwarded message -------
  8       ------- start of forwarded message (RFC 934 encapsulation) -------

Maybe not very common these days, but could be useful for digested
mailing lists.
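Here is a quick Python version of those egrep patterns, as a starting point for an encapsulation check. It requires a start line followed later by an end line. The patterns match the "start"/"end" marker lines shown above. They do not implement RFC 934's full encapsulation-boundary and character-stuffing rules. The function name is hypothetical, and how you would attach a negative score to it is left open.

```python
import re

# Same shape as the egrep patterns: dashes, a space, "start"/"end",
# anything, a space, then a run of trailing dashes. Case-insensitive.
START_RE = re.compile(r"^-+ start.* -+$", re.IGNORECASE | re.MULTILINE)
END_RE = re.compile(r"^-+ end.* -+$", re.IGNORECASE | re.MULTILINE)


def has_rfc934_encapsulation(body: str) -> bool:
    """True if a start-of-forward marker line is followed by an end marker line."""
    start = START_RE.search(body)
    if start is None:
        return False
    # Only look for the end marker after the start marker.
    return END_RE.search(body, start.end()) is not None
```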

Dan

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
