Brian White <[EMAIL PROTECTED]> writes:

> Wouldn't the Bayes tests be just the thing for these since it's already
> adaptive?

Yes, but there's a difference between one good token and a surefire rule
that catches a significant amount of spam.
 
> What I can see happening, though, is spammers start using a "salt" so
> that the entire string is effectively random.

I think salts are better suited to one-way hashes than to two-way
obfuscation techniques, which is what this really is, but spammers are
likely to keep shifting to more complicated techniques.

Here's a new revision of my eval test.  I merged some code I had lying
around to look for any rotation and also added initial tests for the
name/citation and reverse ciphers mentioned on Yorkshire Dave's page.

The rules still need some work and tuning.

Unfortunately, I get some FPs for rotxx right now, and reverse has
horrible results because reversed English text often forms natural
words.
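
To illustrate the reverse problem, here is a minimal sketch. The word
list and the helper name reverses_to_word are made up for this example;
a real check would need a full dictionary.

```perl
use strict;
use warnings;

# Tiny illustrative word list (an assumption, not a real dictionary).
my %is_word = map { $_ => 1 }
    qw(live evil draw ward stressed desserts part trap);

# Return 1 if the reversed string is itself a known word, else 0.
sub reverses_to_word {
    my ($s) = @_;
    my $rev = scalar reverse lc $s;   # force string (not list) reversal
    return exists $is_word{$rev} ? 1 : 0;
}
```

Any address whose user or host part behaves like "live" or "draw" here
will match ordinary text once reversed, which is one way these FPs can
arise.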

If you just want to search for rotated strings (rot13 included), this
command searches standard input for every rotation of a string (note
the string argument at the end of the command line).

$ cat recent-spam | perl -e '
    $s = lc($ARGV[0]);
    for (0..25) { $rot{$s} = $_; $s =~ tr/a-z/b-za/; }
    $rx = join("|", keys %rot);
    while (<STDIN>) { $c{lc($1)}++ if /($rx)/io }
    for (sort { $c{$b} <=> $c{$a} } keys %c) {
      printf "%10d rot-%02d %s\n", $c{$_}, $rot{$_}, $_;
    }' quinlan
     11446 rot-00 quinlan
       108 rot-13 dhvayna
        95 rot-01 rvjombo
         5 rot-03 txlqodq
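
For anyone who would rather read the one-liner as a script, here is a
rough equivalent under strict. The sub names rotations and
count_rotations are invented for this sketch; like the one-liner, it
counts only the first match on each line.

```perl
use strict;
use warnings;

# Map every Caesar rotation of $word (lowercased) to its shift, 0..25.
sub rotations {
    my ($word) = @_;
    my $s = lc $word;
    my %rot;
    for my $shift (0 .. 25) {
        $rot{$s} = $shift;
        $s =~ tr/a-z/b-za/;    # rotate every letter forward by one
    }
    return %rot;
}

# Count, per rotation, the lines whose first match is that rotation.
# Returns a hash of rotated string => [shift, count].
sub count_rotations {
    my ($word, @lines) = @_;
    my %rot = rotations($word);
    my $rx  = join '|', map { quotemeta } keys %rot;
    my %count;
    for my $line (@lines) {
        $count{ lc $1 }++ if $line =~ /($rx)/i;
    }
    return map { $_ => [ $rot{$_}, $count{$_} ] } keys %count;
}
```

Calling count_rotations('quinlan', <STDIN>) and sorting the result by
count should give the same kind of table as above.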

------- start of cut text --------------
body T_EMAIL_ROT13_USER         eval:check_for_email_transform('rot13', 'user', '4')
body T_EMAIL_ROT13_HOST         eval:check_for_email_transform('rot13', 'host', '4')
body T_EMAIL_ROT13_BOTH         eval:check_for_email_transform('rot13', 'full', '4')
body T_EMAIL_ROT13_LOOSE        eval:check_for_email_transform('rot13', 'loose', '4')

body T_EMAIL_ROTXX_USER         eval:check_for_email_transform('rotxx', 'user', '4')
body T_EMAIL_ROTXX_HOST         eval:check_for_email_transform('rotxx', 'host', '4')
body T_EMAIL_ROTXX_BOTH         eval:check_for_email_transform('rotxx', 'full', '4')
body T_EMAIL_ROTXX_LOOSE        eval:check_for_email_transform('rotxx', 'loose', '4')

body T_EMAIL_CITE_USER          eval:check_for_email_transform('cite', 'user', '4')
body T_EMAIL_CITE_HOST          eval:check_for_email_transform('cite', 'host', '4')
body T_EMAIL_CITE_BOTH          eval:check_for_email_transform('cite', 'full', '4')
body T_EMAIL_CITE_LOOSE         eval:check_for_email_transform('cite', 'loose', '4')

body T_EMAIL_REVERSE_USER       eval:check_for_email_transform('reverse', 'user', '4')
body T_EMAIL_REVERSE_HOST       eval:check_for_email_transform('reverse', 'host', '4')
body T_EMAIL_REVERSE_BOTH       eval:check_for_email_transform('reverse', 'full', '4')
body T_EMAIL_REVERSE_LOOSE      eval:check_for_email_transform('reverse', 'loose', '4')
------- end ----------------------------
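
The last argument of each rule ('4') is the minimum number of letters a
string needs before it is tested. The eval function raises that minimum
as the number of recipient addresses grows, and once more for rotxx
since it tries 25 rotations. A minimal sketch of just that arithmetic
(adjusted_minimum is a hypothetical helper, not part of the patch; in
the eval function the rotxx increment is applied at a later point than
the address adjustment):

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the two adjustments to $minimum.
sub adjusted_minimum {
    my ($minimum, $n_addresses, $transform) = @_;

    # One extra required letter per factor of 26 in the address count.
    $minimum += int(log($n_addresses) / log(26));

    # rotxx tries 25 rotations per string: another factor of 26.
    $minimum++ if $transform eq 'rotxx';

    return $minimum;
}
```

With one recipient and rot13 the minimum stays at 4; with 26 recipients
it becomes 5, and rotxx adds one on top of that.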

------- start of cut text --------------
sub check_for_email_transform {
  my ($self, $body, $transform, $target, $minimum) = @_;

  my %strings;
  my @addresses = $self->all_to_addrs();
  return 0 unless @addresses;

  # handle increased random chance due to lots of addresses
  $minimum += int(log(scalar @addresses) / log(26));

  my $expr;

  for my $to (@addresses) {
    my $user = $to;
    my $host = $to;
    $user =~ s/\@.*//;
    $host =~ s/.*\@//;

    if ($target eq 'user') {
      $expr = $user;
    }
    elsif ($target eq 'host') {
      $expr = $host;
    }
    elsif ($target eq 'full') {
      $expr = $to;
    }
    elsif ($target eq 'loose') {
      $expr = $to;
      if ($user) {
        my $count = () = ($user =~ m/([a-z])/gi);
        $expr = $user if $count >= $minimum;
      }
    }
    next unless $expr;
    my $count = () = ($expr =~ m/([a-z])/gi);
    next unless $count >= $minimum;

    $strings{$expr}++;
  }
  if (keys %strings) {
    my @strings = keys %strings;
    my $expr;

    if ($transform eq 'rot13') {
      @strings = map {
        s/[EMAIL PROTECTED]/.+/gs if /[A-Za-z]{$minimum}/;
        tr/a-mn-zA-MN-Z/n-za-mN-ZA-M/;
        quotemeta;
      } @strings;
    }
    elsif ($transform eq 'rotxx') {
      $minimum++;       # another factor of 26
      my @new;
      for my $string (@strings) {
        for (1..25) {
          $string =~ tr/a-zA-Z/b-zaB-ZA/;
          push(@new, $string);
        }
      }
      @strings = map {
        s/[EMAIL PROTECTED]/.+/gs if /[A-Za-z]{$minimum}/;
        quotemeta;
      } @new;
    }
    elsif ($transform eq 'cite') {
      @strings = map {
        tr/[EMAIL PROTECTED]/[EMAIL PROTECTED]/;
        quotemeta;
      } @strings;
    }
    elsif ($transform eq 'reverse') {
      @strings = grep { lc($_) ne lc(reverse $_); } @strings;
      @strings = map {
        s/[^A-Za-z]/./g;
        s/([EMAIL PROTECTED])/reverse $1/e;
        quotemeta;
      } @strings;
      @strings = grep { lc($_) ne lc(reverse $_); } @strings;
#      print STDERR join('%', @strings) . "\n";
    }

    $expr = join('|', @strings);
    return 1 if $self->get('ALL') =~ /(?:$expr)/i;
    for my $line (@$body) {
#      print STDERR "match $1\n" if $line =~ /($expr)/i;
      return 1 if $line =~ /(?:$expr)/i;
    }
  }
  return 0;
}
------- end ----------------------------
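
One thing that probably needs tuning: in the reverse branch the
non-letters are turned into "." before quotemeta runs, and quotemeta
escapes those dots again, so they end up matching only a literal dot.
Below is a sketch of one way to keep the wildcards, by escaping the
letter runs separately. reversed_pattern is a hypothetical helper, and
it reverses the whole string, which may not be exactly what the
substitution in the function above does.

```perl
use strict;
use warnings;

# Hypothetical helper: pattern for the reversed string in which every
# non-letter matches any single character.
sub reversed_pattern {
    my ($s) = @_;
    my $rev = scalar reverse $s;

    # Split into letter runs and non-letter runs (the capture keeps both).
    my @parts = grep { length } split /([^A-Za-z]+)/, $rev;

    # Escape letter runs; emit one unescaped "." per non-letter character.
    my $pat = join '',
        map { /^[A-Za-z]+$/ ? quotemeta($_) : '.' x length($_) } @parts;

    return qr/$pat/i;
}
```

For example, reversed_pattern('jo.smith') gives a pattern equivalent to
/htims.oj/i, so "HTIMS-OJ" matches while "htimsoj" does not.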

Daniel


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
