* Loren Wilton wrote (24/12/2005 00:23):
>> Does anyone have any suggestions, apart from simply reducing the score
>> for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
>> guarantee that only real uris are parsed as such?
>
> Several.
Hi. Thanks for the response. I'm replying rather late due to the pressures of Christmas.

>
> 1. Change your report generator to remove the extraneous dot between
> updated and by. Or change it to the more common underscore, if you insist
> on these words being connected for some reason.
>
> 2. Put spaces around the equal sign.

These are fine suggestions, but sadly not practical. The e-mails are auto-generated diffs from CVS commits, and the files being committed are Java properties files. In particular, the "updated.by" property contains internationalised versions of the phrase "Updated by". The "more common underscore" would be unusual in a Java properties file, and expecting the developers to change the way they work to avoid SARE misfires is, I think, a slightly overzealous reaction to the spam problem. However, it is possible if there's no sensible alternative. The second suggestion is only a workaround, not a fix, because SpamAssassin will still check http://updated.by as a URI.

>
> 3. If you are reluctant for the correct fix, drop the score on the
> uri_equals rule to 4 or maybe 3, depending on what else your report manages
> to hit.

I am reluctant to use the "correct fix". Actually, I'm inclined to think that the word "correct" is being misapplied here. I've adjusted the scores accordingly, though.

>
> 4. You could submit a Bugzilla on the parsing of that phrase. But
> frankly I consider the bug in the report generation, not SA's parsing of
> strange syntax.

The reason I didn't submit a bug was that I was not sure there was one, hence the original query. And I'm still not going to submit one, because I'm persuaded that there isn't a bug. What bothered me (and still does a bit) was that the string "updated.by=anything" matches a rule that looks for URIs of the form "http(s)://*=*", i.e. the http(s) is conjured out of nowhere for schemeless URIs. I can see the point, but I thought it would be worth bringing a possible problem to light.
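To make the concern concrete, here is a minimal Python sketch. It is not SpamAssassin's actual parser (SpamAssassin is written in Perl). It assumes a naive, hypothetical heuristic that treats any dotted token ending in a known TLD as a schemeless URI and prepends http://, plus a hypothetical stand-in pattern for the "http(s)://*=*" rule. ".by" really is a country-code TLD (Belarus), which is why "updated.by" looks like a host name.

```python
import re

# Hypothetical heuristic, NOT SpamAssassin's real code: any bare token of the
# form label.label... whose last label is a known TLD is treated as a
# schemeless URI, and "http://" is prepended to it.
KNOWN_TLDS = {"com", "org", "net", "by"}  # assumption: tiny illustrative subset

# group(1) = dotted host-like token, group(2) = rest of the token up to whitespace
SCHEMELESS = re.compile(r"\b([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)(\S*)")

# Hypothetical stand-in for a rule matching URIs of the form "http(s)://*=*"
URI_EQUALS = re.compile(r"^https?://[^=]*=")


def extract_uris(text):
    """Return candidate URIs, with http:// conjured up for schemeless tokens."""
    uris = []
    for m in SCHEMELESS.finditer(text):
        host, rest = m.group(1), m.group(2)
        tld = host.rsplit(".", 1)[-1].lower()
        if tld in KNOWN_TLDS:
            uris.append("http://" + host + rest)
    return uris


def rule_hits(text):
    """Return the extracted URIs that the stand-in '=' rule would flag."""
    return [u for u in extract_uris(text) if URI_EQUALS.match(u)]


if __name__ == "__main__":
    diff_line = "+updated.by=Updated by"
    print(extract_uris(diff_line))  # the property key becomes a "URI"
    print(rule_hits(diff_line))     # and the "=" rule fires on it
```

With spaces around the equals sign ("+updated.by = Updated by") the stand-in rule no longer fires, but the sketch still extracts http://updated.by as a URI, which is why suggestion 2 above is only a workaround.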
It's a possible problem rather than a bug per se, and the subsequent discussion shows that people take different views on how serious this kind of parsing issue is. One thing that hasn't been mentioned is that if SpamAssassin looks aggressively for schemeless URIs, it could in some cases generate quite a lot of unwanted URI-checking traffic.

I'm happy to stick with what I've got now. I've sent some examples off as indicated so that the SARE corpus will contain my mail in future.

Chris