[EMAIL PROTECTED] (Justin Mason) writes:

> We (Dan and I) were thinking that picking up the envelope-to and/or To:
> addresses, and permuting those, would probably work pretty well to do
> that.

I had a rule I was testing for this, but it's been lingering since 2.60
froze.  I finished cleaning it up today.  Dave, if you want to do some
testing and work on it, here's my current version.  It's basically ready
for testing and further development, but it's waiting for the 2.60
branch to happen.

History:

(feel free to challenge any of these decisions as incorrect, of course)

- minimum length: 4 vs. 5 did not make much difference, although the way
  minimum sequences are computed and handled needs some work -- the goal
  is to match as many encodings as possible without random false positives
- rawbody vs. body vs. "pristine" body: not much difference
- .* replacement helps a bit, but results in some FPs; the current
  replacement tries to avoid the worst ones
- should also try testing headers; I'll leave that to you

Random-chance errors are a function of the number of addresses, the
number of letters, diminishing returns, etc.  A more accurate way to
calculate the actual random chance of a false positive for any given
rot13 string would allow better skipping of insufficient strings.  I
think I got the calculation for scaling the minimum vs. the number of
addresses correct, but it's been a while since I figured it out.   :-)
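
As a rough illustration of that scaling, here is a minimal standalone
sketch.  scaled_minimum() mirrors the "$minimum += ..." line in the code
further down; naive_chance_matches() is NOT in the rule and is only an
assumption for illustration: it treats letters as uniform and
independent, which real text is not, so it should be read as an
order-of-magnitude guess at best.  Both function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: same scaling as in check_for_rot13, i.e. one extra
# required letter for each factor of 26 in the number of addresses.
sub scaled_minimum {
    my ($minimum, $num_addresses) = @_;
    return $minimum if $num_addresses < 1;
    return $minimum + int(log($num_addresses) / log(26));
}

# Assumption (not from the original code): letters are uniform and
# independent, so a fixed string of $letters letters starts at any given
# position with probability 26**-$letters.  Returns the expected number
# of chance matches, an upper bound on the chance of at least one.
sub naive_chance_matches {
    my ($letters, $body_letters, $num_addresses) = @_;
    return $num_addresses * $body_letters / (26 ** $letters);
}

for my $n (1, 25, 27, 700) {
    printf "%4d addresses -> minimum %d\n", $n, scaled_minimum(4, $n);
}
printf "4 letters, 1000-letter body: %.5f\n",
    naive_chance_matches(4, 1000, 1);
```

The point of the log base 26 is that each additional required letter
cuts the per-string chance by a factor of 26, which roughly offsets
having 26 times as many addresses to look for.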

The tests:

body T_ROT13_USER               eval:check_for_rot13('user', '4')
body T_ROT13_HOST               eval:check_for_rot13('host', '4')
body T_ROT13_BOTH               eval:check_for_rot13('full', '4')
body T_ROT13_LOOSE              eval:check_for_rot13('loose', '4')

The code:

------- start of cut text --------------
sub check_for_rot13 {
  my ($self, $body, $type, $minimum) = @_;

  my %strings;
  my @addresses = $self->all_to_addrs();
  return 0 unless @addresses;

  # handle increased random chance due to lots of addresses
  $minimum += int(log(scalar @addresses) / log(26));

  my $expr;

  for my $to (@addresses) {
    my $user = $to;
    my $host = $to;
    $user =~ s/\@.*//;
    $host =~ s/.*\@//;

    if ($type eq 'user') {
      $expr = $user;
    }
    elsif ($type eq 'host') {
      $expr = $host;
    }
    elsif ($type eq 'full') {
      $expr = $to;
    }
    elsif ($type eq 'loose') {
      $expr = $to;
      if ($user) {
        my $count = () = ($user =~ m/([a-z])/gi);
        $expr = $user if $count >= $minimum;
      }
    }
    next unless $expr;
    my $count = () = ($expr =~ m/([a-z])/gi);
    next unless $count >= $minimum;

    $strings{quotemeta(lc($expr))}++;
  }
  if (keys %strings) {
    $expr = join('|', keys %strings);

    $expr =~ s/(?<=[^|]{3})[EMAIL PROTECTED](?=[^|]{3}|[^|]{2}\b)/.*/gs;
    $expr =~ tr/a-mn-zA-MN-Z/n-za-mN-ZA-M/;

    for my $line (@$body) {
      return 1 if $line =~ /(?:$expr)/i;
    }
  }
  return 0;
}
------- end ----------------------------
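
For anyone who wants to try the core idea outside SpamAssassin, here is
a minimal standalone sketch.  It is not the rule above: body_has_rot13()
is a hypothetical helper that takes the addresses as arguments instead
of calling all_to_addrs(), always matches the full address, and leaves
out the .* replacement and the minimum scaling.  It only shows the
letter count, the quotemeta/tr step and the body match.

```perl
use strict;
use warnings;

# Hypothetical standalone helper, not part of SpamAssassin: returns 1 if
# any body line contains the ROT13 form of one of the given addresses.
sub body_has_rot13 {
    my ($body, $minimum, @addresses) = @_;

    my %strings;
    for my $addr (@addresses) {
        # Count letters only; digits and punctuation are unchanged by
        # ROT13, so they say nothing about whether a string was encoded.
        my $letters = () = $addr =~ /[a-z]/gi;
        next unless $letters >= $minimum;
        $strings{ quotemeta(lc $addr) }++;
    }
    return 0 unless %strings;

    # tr only touches letters, so the backslashes added by quotemeta
    # survive and the result is still a valid pattern.
    my $expr = join('|', keys %strings);
    $expr =~ tr/a-zA-Z/n-za-mN-ZA-M/;

    for my $line (@$body) {
        return 1 if $line =~ /(?:$expr)/i;
    }
    return 0;
}

my @body = ('Hello there,', 'Your tracking id is wqbr@rknzcyr.pbz');
print body_has_rot13(\@body, 4, 'jdoe@example.com') ? "hit\n" : "miss\n";
```

Applying quotemeta before tr matters: the escapes are added while the
string is still the plain address, and ROT13 then rewrites only the
letters between them.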

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
