Wiebe Cazemier wrote:
> On Thursday 04 May 2006 16:00, Magnus Holmgren wrote:
>
> > > uri URI_NO_WWW_INFO_CGI /^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i
> >
> > Let's see if I can get this straight...
> >
> > (?:https?:\/\/)?  (optionally) "http://" or "https://" followed by
> > [^\/]+            one or more of any characters except forward slash /
> > (?<!\/www)        of which the last part is not "/www", followed by
> > \.[^.]{7,}        a dot and at least 7 characters that are not dots, and
> > \.info\/          ".info/"
> > (?=\S{15,})       (which is followed by at least 15 non-space characters
> > \S*\?             (which we match again here, up to the first question
> >                   mark which we add that there has to be.))
> >
> > So it should match e.g.
> > "foo.hellothere.info/forum/viewtopic.php?p=1 " as well as
> > "www.hellothere.info/forum/viewtopic.php?p=1 " and
> > "http://www.foo.hellothere.info/forum/viewtopic.php?p=1 "
> >
> > but not "http://www.hellothere.info/forum/viewtopic.php?p=1 ",
> > "foo.hello.info/forum/viewtopic.php?p=1 ",
> > "hellothere.info/forum/viewtopic.php?p=1 ",
> > or "foo.hellothere.info/bar.php?p=1 ".
>
> (The following is also in reply to Bowie Bailey's message. BTW,
> Bowie, your mailclient doesn't set a message reference, so threading
> is messed up.)

Yeah, I'm using an old version of Outlook which doesn't support threading.

> The real URLs are (why I didn't post them before, I don't know...):
>
> http://studentwebzone.tc-online.info/forum2/viewtopic.php?p=47#47
>
> http://studentwebzone.tc-online.info/forum2/viewtopic.php?t=17&unwatch=topic

That makes more sense. These URLs will match the rule.

> If all the rule does is check for uri's in a certain form, then I
> would say that this specific rule can backfire on completely
> legitimate mail.

This rule has a pretty high score, which indicates that there is very
little ham in the masscheck corpus that matches it. Maybe you should
submit some of your ham messages to the corpus. This will help bring the
score down the next time the default scores are recalculated.

> Also, I know I can lookup the rules (in /usr/share/spamassassin)
> myself, but I got very confused by all the regexps. I also didn't
> know what to do with the regexp result, but I know now it should
> simply check if it matches.

Right, I was just pointing it out. This particular regex is a bit hairy.

> > > I get false positive spam which have URI's in the .info TLD in
> > > it. Like:
> > >
> > > http://foo.hello.info/forum/viewtopic.php?p=1
> > >
> > > Does this rule mean that the webpage accessed by this URI is
> > > different than the one accessed by:
> > >
> > > http://far.hello.info/forum/viewtopic.php?p=1
> >
> > It just means that someone has seen much spam containing URI:s of
> > the previous form and that the mass-checks confirmed it.
>
> I can't connect that to the description "URI: CGI in .info TLD other
> than third-level "www"".

It basically means that there is a .info URL with CGI (indicated by the
question mark in the URL) which does not have "www." in the URL. At
least, that seems to be the idea. In practice, it will match on any URL
with a 7-character (or more) 2nd-level domain name and at least one more
label in front of it, unless the URL is of the form
"http://www.domain.info/...". It also requires at least 15 non-space
characters after ".info/", including a question mark.

> > You can always lower the score of any rule you feel misfires.
>
> I'm trying to help the forum owner to avoid his reply notifications
> from being marked as spam, so what I do to my config is irrelevant.

Unfortunately, the .info TLD appears to have gotten a bad reputation for
spammer sites, so you're fighting an uphill battle here.

-- 
Bowie
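
For anyone who wants to check the example URLs from this thread against the pattern, here is a minimal sketch in Python. It is only an illustration: SpamAssassin evaluates the rule in Perl against the URIs it extracts from a message, so this assumes Python's `re` module treats this particular pattern the same way, and it tests the raw strings exactly as typed. The helper name `rule_hits` and the two list names are made up for this example.

```python
import re

# The URI_NO_WWW_INFO_CGI pattern as quoted in this thread. The Perl
# /.../i delimiters are replaced by re.IGNORECASE (an assumption that the
# two engines agree on this pattern; it uses only constructs both support).
URI_NO_WWW_INFO_CGI = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)


def rule_hits(uri: str) -> bool:
    """Hypothetical helper: True if the pattern matches the URI string."""
    return URI_NO_WWW_INFO_CGI.search(uri) is not None


# URLs that the thread says should match, plus the two real forum URLs.
expected_hits = [
    "foo.hellothere.info/forum/viewtopic.php?p=1",
    "www.hellothere.info/forum/viewtopic.php?p=1",
    "http://www.foo.hellothere.info/forum/viewtopic.php?p=1",
    "http://studentwebzone.tc-online.info/forum2/viewtopic.php?p=47#47",
    "http://studentwebzone.tc-online.info/forum2/viewtopic.php?t=17&unwatch=topic",
]

# URLs that the thread says should not match.
expected_misses = [
    "http://www.hellothere.info/forum/viewtopic.php?p=1",  # "/www" directly before the domain
    "foo.hello.info/forum/viewtopic.php?p=1",              # "hello" is shorter than 7 characters
    "hellothere.info/forum/viewtopic.php?p=1",             # nothing in front of the domain
    "foo.hellothere.info/bar.php?p=1",                     # fewer than 15 characters after ".info/"
]

if __name__ == "__main__":
    for uri in expected_hits + expected_misses:
        print("hit " if rule_hits(uri) else "miss", uri)
```

Note that the schemeless "www.hellothere.info/..." string is a hit here because the lookbehind only rejects "www" when a slash comes right before it.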