I've gotten lots of spam that consists of nothing but an attachment.  To
detect this, I've written two rawbody eval subroutines.  The first checks
whether the first part of a multipart mail contains any non-blank lines and
returns true if it contains none; this is meant to catch messages that are
solely attachments with no actual message text.  The second detects multipart
mails containing content types other than "text/*" or
"application/octet-stream"; a mail with such attachments (for example
"application/msword") is probably not spam, so this rule can compensate for
legitimate mails that consist only of a spreadsheet or the like.

Any critiques would be appreciated.

------------------------------

rawbody  ONLY_ATTACHMENTS       eval:check_for_only_attachments()
describe ONLY_ATTACHMENTS       Only attachments, no text

rawbody  GOOD_ATTACHMENTS       eval:check_for_non_spamish_attachments()
describe GOOD_ATTACHMENTS       Compensate for ONLY_ATTACHMENTS

score    ONLY_ATTACHMENTS       5.0
score    GOOD_ATTACHMENTS      -5.0

--------------------------

###########################################################################
# RAWBODY TESTS:
###########################################################################

sub check_for_non_spamish_attachments {
  my ($self, $body) = @_;

  my $ctype = $self->{msg}->get_header ('Content-Type');
  $ctype ||=  '';

  if ($ctype !~ /\bboundary=/i) {
    # No boundary parameter: not multipart, so no attachments
    return 0;
  }

  my @ctypes = grep(/^Content-Type:/i, @$body);

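  # Are there any content types other than "text/*" or
  # "application/octet-stream"?  Match case-insensitively, like the
  # grep above, since header names and MIME types are not case-sensitive.
  @ctypes = grep(!/^Content-Type:\s*(text\/|application\/octet-stream)/i,
                 @ctypes);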

  return (@ctypes > 0);
}

sub check_for_only_attachments {
  my ($self, $body) = @_;

  my $ctype = $self->{msg}->get_header ('Content-Type');
  $ctype ||=  '';

  my $multipart_boundary;
  if ($ctype =~ /boundary="([^"]*)"/i) {
    $multipart_boundary = "--$1\n";
  }
  else {
    # Not a multipart, no attachments
    return 0;
  }

  my $i;
  my $part_num = 0;

  for ($i = 0; $i < @$body; $i++) {
    my $line = $body->[$i];

    next if ($line =~ /^This is a multipart MIME message/);
    next if ($line =~ /^\s*$/);

    # \Q...\E quotes the boundary, which may contain regex metacharacters
    if ($line =~ /^\Q$multipart_boundary\E$/) {
      # First part is the non-attachment message.  If we reach the
      # second part without finding a non-blank line, then the
      # first part of the mail was blank.
      $part_num++;
      return 1 if ($part_num > 1);

      # Skip the part's Content-Type header and its folded continuation
      # lines, without reading past the end of the body.
      if ($i + 1 < @$body && $body->[$i + 1] =~ /^Content-Type:/i) {
        $i++;
        $i++ while ($i + 1 < @$body && $body->[$i + 1] =~ /^[ \t]/);
      }

      $i++ while ($i + 1 < @$body && $body->[$i + 1] =~
                  /^Content-(Transfer-Encoding|Disposition):/i);

      next;
    } # if ($line matches the multipart boundary)

    # Found a non-blank line before the second multipart boundary
    return 0;
  } # for ($i = 0; $i < @$body; $i++)

  # Should we ever even get here?
  return 0;
}
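
For trying the idea outside SpamAssassin, here is a rough standalone sketch of
the first-part check.  The name first_part_is_blank and the sample messages
are made up for illustration and are not part of SpamAssassin.  Unlike the sub
above, it compares the boundary as a plain string, accepts an unquoted
boundary parameter, skips the whole preamble, and treats everything up to the
first blank line of a part as that part's headers; those are my assumptions
about how the check could be made stricter, not tested behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the first MIME part of a multipart
# body contains no non-blank lines, 0 otherwise.
sub first_part_is_blank {
  my ($ctype, $body) = @_;

  # Accept a quoted or an unquoted boundary parameter.
  return 0 unless $ctype =~ /boundary=(?:"([^"]+)"|([^\s;]+))/i;
  my $delim = '--' . (defined $1 ? $1 : $2);

  my $seen       = 0;   # boundary delimiters seen so far
  my $in_headers = 0;   # true while reading a part's own headers

  for my $line (@$body) {
    (my $l = $line) =~ s/\r?\n\z//;   # work on a chomped copy

    if ($l eq $delim || $l eq "$delim--") {
      return 1 if ++$seen > 1;        # second delimiter: first part was empty
      $in_headers = 1;
      next;
    }
    next unless $seen;                # ignore the preamble

    if ($in_headers) {
      $in_headers = 0 if $l =~ /^\s*$/;   # blank line ends the part headers
      next;
    }
    return 0 if $l =~ /\S/;           # real text in the first part
  }
  return 0;
}

# Sample bodies (invented); the boundary contains a regex metacharacter.
my $ctype = 'multipart/mixed; boundary="XYZ+1"';
my @blank = ("This is a multipart MIME message.\n", "\n",
             "--XYZ+1\n", "Content-Type: text/plain\n", "\n", "\n",
             "--XYZ+1\n", "Content-Type: application/octet-stream\n", "\n",
             "AAAA\n", "--XYZ+1--\n");
my @text  = @blank;
$text[5]  = "Hello, see attached.\n";

print "blank: ", first_part_is_blank($ctype, \@blank), "\n";
print "text:  ", first_part_is_blank($ctype, \@text),  "\n";
```

Comparing with eq instead of a pattern match sidesteps the quoting problem
entirely, at the cost of not tolerating trailing whitespace on the boundary
line.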

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
