I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is therefore skewing the scoring of some mail quite badly. The weird thing is that the uris that spamassassin is complaining about aren't uris at all. The mail in question consists of auto-created reports of cvs diffs, so it's slightly unusual. I've tried to condense the debug information.
This is some of the output from spamassassin -D <false_positive

[16733] dbg: uri: parsed uri found, updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, updated.by=Mis
[16733] dbg: uri: parsed uri found, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: parsed uri found, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated
[16733] dbg: uri: parsed uri found, http://updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated

These "parsed uris" are not links in the e-mail. They are just text. I've had a bit of a look at the regexps that spamassassin uses to work out what is a uri, and it seems that "updated.by=Updated" is treated as a uri because .by is a valid tld and spamassassin looks for "schemeless" uris, then prepends http:// for the tests.

I'm running spamassassin 3.1.0 on perl 5.8.2.

Does anyone have any suggestions, apart from simply reducing the score for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to guarantee that only real uris are parsed as such?

Chris
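
To illustrate the behaviour described above, here is a minimal Python sketch of a schemeless-uri matcher. It is not spamassassin's actual regexp: the pattern, the tiny sample tld list, and the function name find_schemeless_uris are all assumptions made up for this example. It only shows why a "label.tld" pattern with no scheme check will pick up plain text such as "updated.by=Updated".

```python
import re

# Hypothetical, heavily simplified stand-in for a schemeless-uri matcher.
# This is NOT spamassassin's real regexp; the tld list is a small sample.
SAMPLE_TLDS = ("com", "net", "org", "by")

# One or more dot-terminated labels, then a known tld that is not followed
# by further label characters, then any trailing non-whitespace.
SCHEMELESS_RE = re.compile(
    r"\b((?:[a-z0-9-]+\.)+(?:%s))(?![a-z0-9-])(\S*)" % "|".join(SAMPLE_TLDS),
    re.IGNORECASE,
)


def find_schemeless_uris(text):
    """Return candidate uris with http:// prepended to each match."""
    return [
        "http://" + m.group(1) + m.group(2)
        for m in SCHEMELESS_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Plain text from a cvs diff report, not a link.
    print(find_schemeless_uris("updated.by=Updated"))
    # A token whose suffix is not in the sample tld list is left alone.
    print(find_schemeless_uris("updated.byte=1"))
```

Under these assumptions, any token of the form word.by followed by non-space characters is reported as a uri, which matches what the debug output shows. A matcher like this cannot tell a real link from text that merely looks like a hostname.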