[EMAIL PROTECTED] (Justin Mason) writes:

> We (Dan and I) were thinking that picking up the envelope-to and/or To:
> addresses, and permuting those, would probably work pretty well to do
> that.

I had a rule I was testing for this, but it's been lingering since 2.60
froze.  I finished cleaning it up today.  Dave, if you want to do some
testing and work on it, here's my current version.  It's basically ready
for testing and further development, but waiting for the 2.60 branch to
happen.

History: (feel free to challenge any of these decisions as incorrect, of
course)

 - minimum length: 4 vs. 5 did not make much difference, although the way
   minimum sequences are computed and handled needs some work -- the goal
   is to match as many encodings as possible without random false
   positives
 - rawbody vs. body vs. "pristine" body: not much difference
 - .* replacement helps a bit, but results in some FPs; the current
   replacement tries to avoid the worst ones
 - should also try testing headers, I'll leave that to you

Random chance errors are a function of number of addresses, number of
letters, diminishing returns, etc.  A more accurate way to calculate the
actual random chance of a false positive for any given rot13 search
string would allow better skipping of insufficient strings.

I think I got the calculation for scaling minimum vs. number of addresses
correct, but it's been a while since I figured it out.
:-)

The tests:

body T_ROT13_USER   eval:check_for_rot13('user', '4')
body T_ROT13_HOST   eval:check_for_rot13('host', '4')
body T_ROT13_BOTH   eval:check_for_rot13('full', '4')
body T_ROT13_LOOSE  eval:check_for_rot13('loose', '4')

The code:

------- start of cut text --------------
sub check_for_rot13 {
  my ($self, $body, $type, $minimum) = @_;

  my %strings;
  my @addresses = $self->all_to_addrs();
  return 0 unless @addresses;

  # handle increased random chance due to lots of addresses
  $minimum += int(log(scalar @addresses) / log(26));

  my $expr;
  for my $to (@addresses) {
    # split each recipient into its user and host parts
    my $user = $to;
    my $host = $to;
    $user =~ s/[EMAIL PROTECTED]//;
    $host =~ s/.*\@//;

    # pick the string to search for, depending on the rule type
    if ($type eq 'user') {
      $expr = $user;
    }
    elsif ($type eq 'host') {
      $expr = $host;
    }
    elsif ($type eq 'full') {
      $expr = $to;
    }
    elsif ($type eq 'loose') {
      # prefer the user part alone when it has enough letters
      $expr = $to;
      if ($user) {
        my $count = () = ($user =~ m/([a-z])/gi);
        $expr = $user if $count >= $minimum;
      }
    }
    next unless $expr;

    # skip candidates with too few letters
    my $count = () = ($expr =~ m/([a-z])/gi);
    next unless $count >= $minimum;
    $strings{quotemeta(lc($expr))}++;
  }

  if (keys %strings) {
    # one alternation of all candidates, .* replacement, then rot13
    $expr = join('|', keys %strings);
    $expr =~ s/(?<=[^|]{3})[EMAIL PROTECTED](?=[^|]{3}|[^|]{2}\b)/.*/gs;
    $expr =~ tr/a-mn-zA-MN-Z/n-za-mN-ZA-M/;
    for my $line (@$body) {
      return 1 if $line =~ /(?:$expr)/i;
    }
  }
  return 0;
}
------- end ----------------------------

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/    source consulting (looking for new work)

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
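
For anyone who wants to poke at the rot13 permutation and the minimum scaling outside SpamAssassin, here is a minimal standalone sketch. It is not the rule itself: the helper names (rot13, scaled_minimum, letter_count) and the recipient address are made up for illustration, and it only covers the 'user' case without the .* replacement. The tr/// mapping and the log-base-26 formula are the ones from the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rot13 a string with the same tr/// mapping
# that check_for_rot13 applies to its search expression.
sub rot13 {
    my ($text) = @_;
    (my $out = $text) =~ tr/a-mn-zA-MN-Z/n-za-mN-ZA-M/;
    return $out;
}

# Hypothetical helper: minimum letter count after scaling for the number
# of recipient addresses (one extra letter per factor of 26 addresses).
sub scaled_minimum {
    my ($minimum, $num_addresses) = @_;
    return $minimum + int(log($num_addresses) / log(26));
}

# Hypothetical helper: count ASCII letters, as the posted code does
# before accepting a candidate string.
sub letter_count {
    my ($text) = @_;
    my $count = () = ($text =~ m/[a-z]/gi);
    return $count;
}

my @addresses = ('jsmith@example.com');    # made-up recipient
my $minimum   = scaled_minimum(4, scalar @addresses);

# Take the user part and only search for it if it has enough letters.
my ($user) = $addresses[0] =~ /^([^\@]*)\@/;
my $pattern = letter_count($user) >= $minimum
    ? rot13(quotemeta(lc $user))
    : undef;

my @body = ('Dear wfzvgu, you have won!', 'Hello there');
my @hits = defined $pattern ? grep { /$pattern/i } @body : ();

print "minimum=$minimum pattern=$pattern hits=", scalar(@hits), "\n";
```

With a single address the minimum stays at 4; it only rises to 5 once there are 26 or more recipients, which gives a feel for how slowly the scaling kicks in.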