From: "Chris Lear" <[EMAIL PROTECTED]>

I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
therefore skewing the scoring of some mail quite badly.
The weird thing is that the uris that spamassassin is complaining about
aren't uris at all. The mail in question is auto-created reports of cvs
diffs, so it's slightly unusual.
I've tried to condense the debug information. Here it is:

This is some of the output from spamassassin -D <false_positive

[16733] dbg: uri: parsed uri found, updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, updated.by=Mis
[16733] dbg: uri: parsed uri found, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: parsed uri found, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated
[16733] dbg: uri: parsed uri found, http://updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated

These "parsed uris" are not links in the e-mail. They are just text.

I've had a bit of a look at the regexps that spamassassin uses to work
out what is a uri, and it seems that "updated.by=Updated" is treated as
a uri because .by is a valid tld and spamassassin looks for "schemeless"
uris, then prepends http:// for the tests.

I'm running spamassassin 3.1.0 on perl 5.8.2.

Does anyone have any suggestions, apart from simply reducing the score
for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
guarantee that only real uris are parsed as such?

Before you drop the score precipitously check if there is some other
characteristic of the emails that trigger falsely which can be used to
apply a negative score. If there is such a characteristic then generate
the appropriate negative score. If not weigh how effective the rule is
for you. The version of "sa-stats.pl" that is on the SARE site helps
figure this out nicely.

That said it's close to a "50/50" rule that hits on very few messages
here so should have a low score. (It hit on 6 messages out of 75000.)
Cutting it out completely here seems like it would be effective TODAY.
That could change. At one time it was quite necessary. Spammer fads
change.)

{^_^}

Reply via email to