RE: URI_NO_WWW_INFO_CGI rule

Bowie Bailey Thu, 04 May 2006 13:49:34 -0700

Wiebe Cazemier wrote:
> On Thursday 04 May 2006 16:00, Magnus Holmgren wrote:
> 
> > uri URI_NO_WWW_INFO_CGI /^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i
> > 
> > Let's see if I can get this straight...
> > 
> > (?:https?:\/\/)?   (optionally) "http://" or "https://" followed by
> > [^\/]+             one or more of any characters except forward slash /
> > (?<!\/www)         of which the last part is not "/www", followed by
> > \.[^.]{7,}         a dot and at least 7 characters that are not dots, and
> > \.info\/           ".info/"
> > (?=\S{15,})        (which is followed by at least 15 non-space characters
> > \S*\?              (which we match again here, up to the first question
> >                    mark which we add that there has to be.))
> > 
> > So it should match e.g.
> > "foo.hellothere.info/forum/viewtopic.php?p=1 " as well as
> > "www.hellothere.info/forum/viewtopic.php?p=1 " and
> > "http://www.foo.hellothere.info/forum/viewtopic.php?p=1 " 
> > 
> > but not "http://www.hellothere.info/forum/viewtopic.php?p=1 ",
> > "foo.hello.info/forum/viewtopic.php?p=1 ",
> > "hellothere.info/forum/viewtopic.php?p=1 ",
> > or "foo.hellothere.info/bar.php?p=1 ".
> 
> (The following is also in reply to Bowie Bailey's message. BTW,
> Bowie, your mailclient doesn't set a message reference, so threading
> is messed up.)


Yeah, I'm using an old version of Outlook which doesn't support
threading.

> The real URLs are (why I didn't post them before, I don't know...):
> 
> http://studentwebzone.tc-online.info/forum2/viewtopic.php?p=47#47
> 
> http://studentwebzone.tc-online.info/forum2/viewtopic.php?t=17&unwatch=topic

That makes more sense.  These URLs will match the rule.
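
For anyone who wants to check this without working through the regex by
hand, here is a minimal sketch in Python. It assumes Python's re module
treats this pattern the same way Perl does (it should for these
constructs, but it is not the engine SpamAssassin runs), and it ignores
whatever URI normalization SpamAssassin does before matching. The
names "hits" and "urls" are just made up for the example.

```python
import re

# The rule's pattern as quoted above, joined back into a single line.
URI_NO_WWW_INFO_CGI = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)

# The two real URLs from the forum notifications.
urls = [
    "http://studentwebzone.tc-online.info/forum2/viewtopic.php?p=47#47",
    "http://studentwebzone.tc-online.info/forum2/viewtopic.php?t=17&unwatch=topic",
]


def hits(url):
    """Return True if the pattern matches the given URL string."""
    return URI_NO_WWW_INFO_CGI.search(url) is not None


for url in urls:
    print(hits(url), url)
```

Both URLs have a third-level name ("studentwebzone") that is not "www",
a 9-character second-level name ("tc-online"), and more than 15
characters plus a question mark after ".info/".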

> If all the rule does is check for uri's in a certain form, then I
> would say that this specific rule can backfire on completely
> legitimate mail. 

This rule has a pretty high score.  This indicates that there is
very little ham in the masscheck corpus that matches this rule.  Maybe
you should submit some of your ham messages to the corpus.  This will
help bring the score down the next time the default scores are
recalculated.

> Also, I know I can look up the rules (in /usr/share/spamassassin)
> myself, but I got very confused by all the regexps. I also didn't
> know what to do with the regexp result, but I know now it should
> simply check if it matches. 

Right, I was just pointing it out.  This particular regex is a bit
hairy.

> > > I get false positives on mail containing URIs in the .info TLD.
> > > Like: 
> > > 
> > >         http://foo.hello.info/forum/viewtopic.php?p=1
> > > 
> > > Does this rule mean that the webpage accessed by this URI is
> > > different than the one accessed by: 
> > > 
> > >         http://far.hello.info/forum/viewtopic.php?p=1
> > 
> > It just means that someone has seen much spam containing URI:s of
> > the previous form and that the mass-checks confirmed it.
> 
> I can't connect that to the description "URI: CGI in .info TLD other
> than third-level "www"".

It basically means that there is a .info URL with CGI (indicated by
the question mark in the URL) which does not have "www." in the URL.
At least, that seems to be the idea.

In practice, it will match on any .info URL with a 7-character (or more) 2nd
level domain name and a hostname in front of it that is not just "www" (so not
of the form "www.domain.info").  It also requires at least 15 non-space
characters after ".info/", with a question mark somewhere among them.
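
To see which requirement knocks out which URL, the example URLs from
earlier in the thread can be run through the same pattern. This is only
a rough sketch in Python, assuming its re module agrees with Perl on
this regex. It tests the bare strings as written, not whatever
SpamAssassin extracts from a message, and "RULE" and "cases" are names
invented for the example.

```python
import re

# Same pattern as the rule, joined back into a single line.
RULE = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)

# URL -> whether the pattern is expected to match, per the thread.
cases = {
    # Third-level name other than "www", long 2nd-level name, long path.
    "foo.hellothere.info/forum/viewtopic.php?p=1": True,
    # No scheme, so there is no "/" in front of "www" for the lookbehind.
    "www.hellothere.info/forum/viewtopic.php?p=1": True,
    # "www" is followed by another label, so the part before the
    # 2nd-level name ends in "foo", not "/www".
    "http://www.foo.hellothere.info/forum/viewtopic.php?p=1": True,
    # Plain "http://www." directly before the 2nd-level name.
    "http://www.hellothere.info/forum/viewtopic.php?p=1": False,
    # 2nd-level name "hello" is shorter than 7 characters.
    "foo.hello.info/forum/viewtopic.php?p=1": False,
    # No third-level name at all.
    "hellothere.info/forum/viewtopic.php?p=1": False,
    # Fewer than 15 characters after ".info/".
    "foo.hellothere.info/bar.php?p=1": False,
}

results = {url: RULE.search(url) is not None for url in cases}

for url, matched in results.items():
    print("match   " if matched else "no match", url)
```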

> > You can always lower the score of any rule you feel misfires.
> 
> I'm trying to help the forum owner keep his reply notifications
> from being marked as spam, so what I do to my config is irrelevant.

Unfortunately, the .info TLD appears to have gotten a bad reputation
for spammer sites, so you're fighting an uphill battle here.

-- 
Bowie
