Brian White <[EMAIL PROTECTED]> writes:

> Wouldn't the Bayes tests be just the thing for these since it's already
> adaptive?

Yes, but there's a difference between one good token and a surefire rule
that catches a significant amount of spam.

> What I can see happening, though, is spammers start using a "salt" so
> that the entire string is effectively random.

I think salts are better used in one-way hashes, not two-way obfuscation
techniques, which is what this really is, but spammers are definitely
likely to continue shifting to more complicated techniques.

Here's a new revision of my eval test.  I merged some code I had lying
around to look for any rotation and also added initial tests for the
name/citation and reverse ciphers mentioned on Yorkshire Dave's page.

The rules still need some work and tuning.  I unfortunately get some FPs
for rotxx right now, and reverse has horrible results because reversing
English produces a lot of natural words.

If you just want to search for rot13 strings, this command will search
standard input for every rotation of a string (note the string argument
at the end of the command line).

  $ cat recent-spam | perl -e '$s = lc($ARGV[0]);
      for (0..25) { $rot{$s} = $_; $s =~ tr/a-z/b-za/; }
      $rx = join ("|", keys %rot);
      while(<STDIN>) { $c{lc($1)}++ if /($rx)/io }
      for (sort { $c{$b} <=> $c{$a} } keys %c) {
        printf "%10d rot-%02d %s\n", $c{$_}, $rot{$_}, $_;
      }' quinlan
       11446 rot-00 quinlan
         108 rot-13 dhvayna
          95 rot-01 rvjombo
           5 rot-03 txlqodq

------- start of cut text --------------
body T_EMAIL_ROT13_USER     eval:check_for_email_transform('rot13', 'user', '4')
body T_EMAIL_ROT13_HOST     eval:check_for_email_transform('rot13', 'host', '4')
body T_EMAIL_ROT13_BOTH     eval:check_for_email_transform('rot13', 'full', '4')
body T_EMAIL_ROT13_LOOSE    eval:check_for_email_transform('rot13', 'loose', '4')

body T_EMAIL_ROTXX_USER     eval:check_for_email_transform('rotxx', 'user', '4')
body T_EMAIL_ROTXX_HOST     eval:check_for_email_transform('rotxx', 'host', '4')
body T_EMAIL_ROTXX_BOTH     eval:check_for_email_transform('rotxx', 'full', '4')
body T_EMAIL_ROTXX_LOOSE    eval:check_for_email_transform('rotxx', 'loose', '4')

body T_EMAIL_CITE_USER      eval:check_for_email_transform('cite', 'user', '4')
body T_EMAIL_CITE_HOST      eval:check_for_email_transform('cite', 'host', '4')
body T_EMAIL_CITE_BOTH      eval:check_for_email_transform('cite', 'full', '4')
body T_EMAIL_CITE_LOOSE     eval:check_for_email_transform('cite', 'loose', '4')

body T_EMAIL_REVERSE_USER   eval:check_for_email_transform('reverse', 'user', '4')
body T_EMAIL_REVERSE_HOST   eval:check_for_email_transform('reverse', 'host', '4')
body T_EMAIL_REVERSE_BOTH   eval:check_for_email_transform('reverse', 'full', '4')
body T_EMAIL_REVERSE_LOOSE  eval:check_for_email_transform('reverse', 'loose', '4')
------- end ----------------------------

------- start of cut text --------------
sub check_for_email_transform {
  my ($self, $body, $transform, $target, $minimum) = @_;

  my %strings;
  my @addresses = $self->all_to_addrs();
  return 0 unless @addresses;

  # handle increased random chance due to lots of addresses
  $minimum += int(log(scalar @addresses) / log(26));

  # collect the part of each recipient address selected by $target
  my $expr;
  for my $to (@addresses) {
    my $user = $to;
    my $host = $to;
    $user =~ s/[EMAIL PROTECTED]//;
    $host =~ s/.*\@//;
    if ($target eq 'user') {
      $expr = $user;
    }
    elsif ($target eq 'host') {
      $expr = $host
    }
    elsif ($target eq 'full') {
      $expr = $to;
    }
    elsif ($target eq 'loose') {
      $expr = $to;
      if ($user) {
        my $count = () = ($user =~ m/([a-z])/gi);
        $expr = $user if $count >= $minimum;
      }
    }
    next unless $expr;
    # skip strings with fewer than $minimum letters
    my $count = () = ($expr =~ m/([a-z])/gi);
    next unless $count >= $minimum;
    $strings{$expr}++;
  }

  if (keys %strings) {
    my @strings = keys %strings;
    my $expr;

    if ($transform eq 'rot13') {
      @strings = map {
        s/[EMAIL PROTECTED]/.+/gs if /[A-Za-z]{$minimum}/;
        tr/a-mn-zA-MN-Z/n-za-mN-ZA-M/;
        quotemeta;
      } @strings;
    }
    if ($transform eq 'rotxx') {
      $minimum++;  # another factor of 26
      my @new;
      for my $string (@strings) {
        # rotate one step at a time, keeping each of the 25 results
        for (1..25) {
          $string =~ tr/a-zA-Z/b-zaB-ZA/;
          push(@new, $string);
        }
      }
      @strings = map {
        s/[EMAIL PROTECTED]/.+/gs if /[A-Za-z]{$minimum}/;
        quotemeta;
      } @new;
    }
    elsif ($transform eq 'cite') {
      @strings = map {
        tr/[EMAIL PROTECTED]/[EMAIL PROTECTED]/;
        quotemeta;
      } @strings;
    }
    elsif ($transform eq 'reverse') {
      # drop palindromes before and after the transform
      @strings = grep { lc($_) ne lc(reverse $_); } @strings;
      @strings = map {
        s/[^A-Za-z]/./g;
        s/([EMAIL PROTECTED])/reverse $1/e;
        quotemeta;
      } @strings;
      @strings = grep { lc($_) ne lc(reverse $_); } @strings;
      # print STDERR join('%', @strings) . "\n";
    }

    # match any transformed string against the headers, then the body
    $expr = join('|', @strings);
    return 1 if $self->get('ALL') =~ /(?:$expr)/i;
    for my $line (@$body) {
      # print STDERR "match $1\n" if $line =~ /($expr)/i;
      return 1 if $line =~ /(?:$expr)/i;
    }
  }
  return 0;
}
------- end ----------------------------

Daniel

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
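
The rotation search done by the one-liner in the message above may be easier to follow written out as a small script. This is only a sketch of the same idea, not the code from the message: the function names `rotations` and `count_rotations` are made up for illustration, and the input lines are invented sample data rather than real spam.

```perl
use strict;
use warnings;

# Map every Caesar rotation of $word (shifts 0..25) to its shift.
# Hypothetical helper, named here for illustration only.
sub rotations {
    my ($word) = @_;
    my $s = lc $word;
    my %rot;
    for my $shift (0 .. 25) {
        $rot{$s} = $shift unless exists $rot{$s};
        $s =~ tr/a-z/b-za/;    # advance every letter by one position
    }
    return %rot;
}

# Count which rotation shows up in each line (first match per line,
# as in the one-liner). Returns references to the counts and the
# rotation table.
sub count_rotations {
    my ($word, @lines) = @_;
    my %rot = rotations($word);
    my $rx  = join '|', map { quotemeta($_) } keys %rot;
    my %count;
    for my $line (@lines) {
        $count{ lc $1 }++ if $line =~ /($rx)/i;
    }
    return (\%count, \%rot);
}

# Invented sample input; in practice the lines would come from STDIN.
my @sample = ('mail dhvayna now', 'to quinlan', 'QUINLAN again');
my ($count, $rot) = count_rotations('quinlan', @sample);

for my $hit (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    printf "%10d rot-%02d %s\n", $count->{$hit}, $rot->{$hit}, $hit;
}
```

Keeping the rotation table separate from the counting makes it simple to check a single mapping, for example that `quinlan` rotated by 13 gives `dhvayna`, as in the output shown in the message.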