* Loren Wilton wrote (24/12/2005 00:23):
>> Does anyone have any suggestions, apart from simply reducing the score
>> for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
>> guarantee that only real uris are parsed as such?
> 
> Several.

Hi. Thanks for the response. I'm replying rather late due to pressures
of Christmas.

> 
> 1.    Change your report generator to remove the extraneous dot between
> updated and by.  Or change it to the more common underscore, if you insist
> on these words being connected for some reason.
> 
> 2.    Put spaces around the equal sign.

These are fine suggestions, but sadly not practical. The e-mails are
auto-generated diffs from cvs commits. The files being committed are
java properties files. In particular, the "updated.by" property contains
internationalised versions of the phrase "Updated by". The "more common
underscore" would be unusual in the java properties file, and expecting
the developers to change the way they work to avoid SARE misfires is a
slightly overzealous reaction to the spam problem, I think. However, it
is possible if there's no sensible alternative.

The second suggestion is only a workaround, not a fix, anyway, because
spamassassin will still check http://updated.by as a uri.

> 
> 3.    If you are reluctant for the correct fix, drop the score on the
> uri_equals rule to 4 or maybe 3, depending on what else your report manages
> to hit.

I am reluctant to use the "correct fix". Actually I'm inclined to think
that the word "correct" is being misapplied here. I've changed the
scores appropriately, though.

> 
> 4.    You could submit a Bugzilla on the parsing of that phrase.  But
> frankly I consider the bug in the report generation, not SA's parsing of
> strange syntax.

The reason I didn't submit a bug was that I was not sure there was one -
hence the original query. And I'm still not going to submit a bug,
because I'm persuaded that there is not one. What bothered me (and still
does a bit) was that the string "updated.by=anything" matches a rule
that looks for uris of the form "http(s)://*=*". I.e. the http(s) is
conjured out of nowhere for schemeless uris. I can see the point, but I
thought it would be worth bringing a possible problem to light. It's a
possible problem, not a bug per se, and the subsequent discussion shows
that people take different views on the seriousness of this kind of
parsing issue. One thing that hasn't been mentioned in respect of this
is that if spamassassin is looking aggressively for schemeless uris, it
could in some cases create quite a lot of unwanted uri checking traffic.
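
To make the concern concrete, here is a minimal Python sketch of how a
schemeless-uri heuristic could turn a properties line into an http uri.
It is an illustration under stated assumptions, not spamassassin's actual
parser and not the actual SARE rule: the names KNOWN_TLDS, SCHEMELESS,
URI_EQUALS, extract_uris and hits_uri_equals are all hypothetical, and
the TLD list is deliberately tiny. The point is only that "by" is a real
country-code TLD (Belarus), so "updated.by" looks like a hostname.

```python
import re

# Hypothetical, deliberately tiny TLD list; "by" is the Belarus ccTLD.
KNOWN_TLDS = {"com", "net", "org", "by"}

# Hypothetical pattern: a bare "host.tld" token plus any trailing non-space text.
SCHEMELESS = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))(\S*)", re.IGNORECASE)

# Stand-in for a rule of the form http(s)://*=* (not the real SARE rule).
URI_EQUALS = re.compile(r"^https?://[^\s=]*=", re.IGNORECASE)


def extract_uris(text):
    """Return guessed uris, prepending http:// to schemeless host-like tokens."""
    uris = []
    for match in SCHEMELESS.finditer(text):
        host, tld, rest = match.group(1), match.group(2), match.group(3)
        if tld.lower() in KNOWN_TLDS:
            uris.append("http://" + host + rest)
    return uris


def hits_uri_equals(text):
    """Return the guessed uris that the stand-in equals rule would match."""
    return [uri for uri in extract_uris(text) if URI_EQUALS.search(uri)]


print(extract_uris("updated.by=Aktualisiert von"))
print(hits_uri_equals("updated.by=Aktualisiert von"))
print(extract_uris("updated.by = Aktualisiert von"))
print(extract_uris("updated_by=Aktualisiert von"))
```

In this sketch, putting spaces around the equals sign stops the equals
rule from matching but still leaves http://updated.by extracted as a uri,
while the underscore form is not treated as a hostname at all.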

I'm happy to stick with what I've got now. I've sent some examples off
as indicated so that the SARE corpus will contain my mail in future.

Chris
