> -----Original Message-----
> From: jdow [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 22, 2006 7:41 AM
> To: users@spamassassin.apache.org
> Subject: Re: FP: URI_NOVOWEL
>
>
> From: "mouss" <[EMAIL PROTECTED]>
> > Chris Santerre wrote:
> >>
> >>> -----Original Message-----
> >>> From: mouss [mailto:[EMAIL PROTECTED]]
> >>> Sent: Wednesday, September 20, 2006 6:12 PM
> >>> To: SpamAssassin
> >>> Subject: Re: FP: URI_NOVOWEL
> >>>
> >>>
> >>> Theo Van Dinter wrote:
> >>>
> >>>> On Tue, Sep 19, 2006 at 10:58:46PM +0200, mouss wrote:
> >>>>
> >>>>
> >>>>> URI_NOVOWEL fires with things like href="" where id is a string that
> >>>>> starts with 7 "no-vowel" chars.
> >>>>>
> >>>>> uri URI_NOVOWEL m%^https?://[^/?]*[bcdfghjklmnpqrstvwxz]{7}%i
> >>>>>
> >>>>> uri URI_NOVOWEL m%^https?://[^/?\#]*[bcdfghjklmnpqrstvwxz]{7}%i
> >>>>>
> >>>>> is this correct?
> >>>>>
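Here is a minimal Python sketch of how the two patterns differ. It assumes Python's `re` handles these simple expressions the same way Perl does, with the `m%...%i` delimiters becoming a plain string plus `re.I`. Both test URIs are made up for illustration. One carries the consonant run only in its `#` fragment. The other carries it in the hostname.

```python
import re

# The two rule bodies above, translated from Perl m%...%i syntax to Python.
OLD = re.compile(r"^https?://[^/?]*[bcdfghjklmnpqrstvwxz]{7}", re.I)
NEW = re.compile(r"^https?://[^/?#]*[bcdfghjklmnpqrstvwxz]{7}", re.I)

# Hypothetical URIs: consonant run only in the #fragment vs. in the hostname.
fragment_uri = "http://www.example.com#bcdfghjk"
random_host = "http://xkcdqzrtw.example/"

for name, rx in (("old", OLD), ("new", NEW)):
    # Print whether each pattern fires on each URI.
    print(name, bool(rx.search(fragment_uri)), bool(rx.search(random_host)))
```

With the original class, `[^/?]*` runs past the `#` and into the fragment. Adding `#` to the excluded characters stops it at the fragment, while a random-looking hostname still matches.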
> >>
> >> Well I changed the RE a bit for testing:
> >>
> >> uri URI_NOVOWEL /https?\:\/\/[^\/?\#]*[bcdfghjklmnpqrstvwxz]{7}/i
> >> describe URI_NOVOWEL testing for MOUSS
> >> score URI_NOVOWEL 0.75
> >>
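Chris's test version also drops the leading `^`. Because of that, `https?://` may match anywhere inside the URI, not only at its start. The hedged sketch below looks only at the regex itself, not at how SpamAssassin extracts URIs. The redirect-style URI is hypothetical. It shows that the unanchored form also fires on a URL embedded in a query string.

```python
import re

# Python translations; SpamAssassin's delimiters and escaping differ.
anchored = re.compile(r"^https?://[^/?#]*[bcdfghjklmnpqrstvwxz]{7}", re.I)
unanchored = re.compile(r"https?://[^/?#]*[bcdfghjklmnpqrstvwxz]{7}", re.I)

# Hypothetical URI: clean host, random-looking host hidden in the query string.
redirect = "http://www.example.com/r?u=http://bcdfghjk.example"

print("anchored:  ", bool(anchored.search(redirect)))    # False
print("unanchored:", bool(unanchored.search(redirect)))  # True
```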
> >> Initial tests show a few problems (verified ham hits):
> >>
> >> http://www.phpwcms.....
> >> http://trkcnfrm......
> >> http://BlankBkgrd....
> >> http://SearchSQLS.....
> >> http://www.astdhpph.....
> >> http://libctxssl.......
> >> http://sccrmxc.......
> >> http://pluginsnppdf.......
> >>
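Checking the visible prefixes of these ham hits against Chris's rule shows why they fire. The match is case-insensitive and ignores word boundaries, so each prefix contains a qualifying run of seven letters from the class. That holds even for camel-case names such as `BlankBkgrd` (run `nkBkgrd`). The URLs are truncated in the original report, so this Python sketch uses only the visible parts.

```python
import re

# Chris's unanchored test rule, translated to Python re syntax.
rule = re.compile(r"https?://[^/?#]*[bcdfghjklmnpqrstvwxz]{7}", re.I)

# Visible prefixes of the reported ham hits (the rest of each URL was elided).
ham_prefixes = [
    "http://www.phpwcms",
    "http://trkcnfrm",
    "http://BlankBkgrd",
    "http://SearchSQLS",
    "http://www.astdhpph",
    "http://libctxssl",
    "http://sccrmxc",
    "http://pluginsnppdf",
]

for uri in ham_prefixes:
    m = rule.search(uri)
    # The last 7 characters of the match are the consonant run that fired.
    print(uri, "->", m.group(0)[-7:] if m else None)
```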
> >> It does, however, have some potential:
> >> Spam hits: 400 Ham hits: 3 S/O: 0.889
> >> Spam hits: 1747 Ham hits: 6 S/O: 0.987
> >> Spam hits: 2754 Ham hits: 4 S/O: 0.997
> >> Spam hits: 1976 Ham hits: 15 S/O: 0.975
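A note on reading the S/O column. Assume it is computed the way SpamAssassin's `hit-frequencies` script reports it: the rule's spam hit *rate* divided by the sum of its spam and ham hit rates. On that assumption, S/O normalizes for corpus size and cannot be recomputed from the hit counts alone. The corpus totals in this sketch are hypothetical, because the thread does not give them.

```python
def so_ratio(spam_hits, spam_total, ham_hits, ham_total):
    """S/O as a ratio of hit rates, not raw counts (assumed definition)."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # rule never fired; neutral value chosen for this sketch
    return spam_rate / (spam_rate + ham_rate)

# Same hit counts, hypothetical corpus sizes.
print(round(so_ratio(400, 10_000, 3, 10_000), 3))  # equal-sized corpora
print(round(so_ratio(400, 10_000, 3, 1_000), 3))   # ham corpus 10x smaller
```

The same 400/3 counts give about 0.993 with equal-sized corpora and about 0.930 when the ham corpus is ten times smaller. So S/O values from different corpora are only comparable if their sizes are taken into account.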
> >>
> >> Hope that helps a bit!
> >>
> >
> > Thanks, Chris, this is very helpful.
> >
> > So the test catches legitimate URIs. An immediate improvement is
> >
> > uri URI_NOVOWEL /https?\:\/\/[bcdfghjklmnpqrstvwxz]{7}/i
> >
> > because the goal is to catch URIs with random hostname parts. Of course,
> > they can still put that in the middle, as in www $dot random $dot
> > domain.example, but I haven't seen that yet. An alternative is to
> > disable the rule?
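mouss's tighter version requires the seven letters to come immediately after `://`, that is, at the very start of the hostname. This Python sketch runs it against the visible prefixes from Chris's ham list. It also tests a made-up `www.`-prefixed random host, an assumption standing in for the "random part in the middle" evasion mouss describes. Together they show what the tighter version still catches and what slips through.

```python
import re

# mouss's proposed rule: the consonant run must begin right after "://".
tight = re.compile(r"https?://[bcdfghjklmnpqrstvwxz]{7}", re.I)

# Visible prefixes of Chris's reported ham hits (rest of each URL elided).
ham_prefixes = [
    "http://www.phpwcms",
    "http://trkcnfrm",
    "http://BlankBkgrd",
    "http://SearchSQLS",
    "http://www.astdhpph",
    "http://libctxssl",
    "http://sccrmxc",
    "http://pluginsnppdf",
]

still_hit = [u for u in ham_prefixes if tight.search(u)]
print("ham prefixes still hit:", still_hit)

# Hypothetical evasion: random label placed after "www." instead of first.
print("evasion caught:", bool(tight.search("http://www.bcdfghjk.example/")))
```

On this small sample, two of the eight prefixes (`trkcnfrm` and `sccrmxc`) still match. The anchored form therefore reduces these FPs but does not eliminate them, and, as mouss notes, it misses a random label placed after `www.`.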
>
> Why disable it? Simply give it a reasonable score value. It looks good
> enough to give it at least a 2.
I agree, the rule is good. Interestingly, the same URL FPs came up across a wide variety of corpora, so I would score to taste. It looks as if a few tech sites would FP on it, but for the average user these rules should catch a lot!
--Chris