Good evening, Justin,

On Mon, 17 Nov 2003, Justin Mason wrote:

> >>> Are there ways to improve the performance of the checks?  I ask
> >>> because these URI rules are tripping on about 50-60% of my current
> >>> spam - much more than the corresponding source domain blacklist rules.
> 
> Quick speed tips:
> 
>       .* = slow
>       lookaheads or lookbehinds = very slow

        Neither is used - *phew*!

>       anchoring with \b = fast

        OK, cool.  As I'm doing full domains, I'll change:
uri      WLS_URI_1 /0-go.org/i
        to
uri      WLS_URI_1 /\b0-go\.org\b/i
        in the next version.
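
        A minimal sketch of what the \b anchors buy, using Python's re 
module as a stand-in for the Perl regexes SA actually runs (an assumption; 
the sample URIs are made up for illustration).  It also shows why the dot 
is worth escaping: a bare "." matches any character, not just a literal 
dot.

```python
import re

# Patterns from this thread, compiled with Python's re as a stand-in for Perl.
unbounded = re.compile(r"0-go.org", re.IGNORECASE)      # rule as it stands today
bounded = re.compile(r"\b0-go\.org\b", re.IGNORECASE)   # \b anchors, dot escaped

# Hypothetical sample URIs, for illustration only.
samples = [
    "http://www.0-go.org/offer",     # the listed domain
    "http://10-go.org/",             # a different domain with the same tail
    "http://0-goXorg.example.com/",  # only hits when '.' acts as a wildcard
]

def hits(pattern: re.Pattern, uri: str) -> bool:
    """Return True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None

for uri in samples:
    print(uri, hits(unbounded, uri), hits(bounded, uri))
```

        The unbounded pattern hits all three samples; the bounded one 
hits only the first.  Note that \b still treats "-" as a boundary, so a 
domain such as a-0-go.org would match as well.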

        Is there any way I could get SA to extract _just_ the host portion 
of the URI, unescape it, and lowercase it?  Then I could test my domains 
just against that, rather than the whole URI.  It would also mean I could 
remove the /i case insensitive search, which I'm sure isn't helping speed 
at all.  :-)
        Something like:
host     WLS_URI_1 /\b0-go\.org\b/

        Oh, shoot.  Yahoo redirectors would screw that up.  *sigh*
        Perhaps we just put in rules for the known redirectors.
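
        To make the idea concrete, here is a rough sketch in Python (not 
SA code, and not an existing SA feature as far as this message knows).  
The names extract_host, final_target and in_domain are hypothetical, and 
the redirector shape (the target URL embedded verbatim after the 
redirector's own path) is an assumption.

```python
from urllib.parse import unquote, urlsplit

def final_target(uri: str) -> str:
    """If another http(s) URL is embedded later in the URI (assumed
    redirector shape), return that embedded URL; otherwise return uri."""
    lowered = uri.lower()
    start = max(lowered.rfind("http://"), lowered.rfind("https://"))
    return uri[start:] if start > 0 else uri

def extract_host(uri: str) -> str:
    """Return just the host part, unescaped and lowercased ('' if none)."""
    host = urlsplit(uri).hostname or ""
    # Lowercase again: unquoting can produce uppercase letters (%4F -> 'O').
    return unquote(host).lower()

def in_domain(host: str, domain: str) -> bool:
    """Plain string test: host is the domain itself or a subdomain of it."""
    return host == domain or host.endswith("." + domain)

# Hypothetical sample: a redirector wrapping the real target.
uri = "http://rd.yahoo.com/abc/*http://WWW.0-go.org/x"
print(in_domain(extract_host(final_target(uri)), "0-go.org"))
```

        With the host isolated and lowercased, the domain test no longer 
needs /i or even a regex.  Redirectors that percent-encode the embedded 
URL are not handled by this sketch.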

>       anchoring with ^, $ = faster

        Tough to do in this case, although I know you were answering the 
general question of how to make regexes faster.

> >Possibility 2: bound the rules.  I noted that the URI for 16.com matched
> >significant ham.  Test for /\bdomain/ and maybe it'll run a trifle
> >faster.
> 
> yes.  If you can bound at the start of the URL it'll probably be
> faster still...

        As a general rule, I'm testing against domains rather than full 
hostnames; otherwise it's too easy for spammers to use random hostnames, 
even more often than they do already.  If I try to anchor at the start of 
the URL, won't I just end up with:

uri      WLS_URI_1 /^http:.*\b0-go\.org\b/i

        which puts us right back at the .* problem again?
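
        One possible way around that, sketched in Python's re as a 
stand-in for Perl: anchor at the start, but replace .* with a character 
class that cannot leave the host part of the URI.  The pattern below is 
hypothetical, assumes no user@ part in the URI, and has not been timed 
under SA, so any speed gain is untested.

```python
import re

# Hypothetical start-anchored pattern: scheme, optional subdomain labels,
# the listed domain, then the end of the host. [^/?#]* stops at the first
# '/', '?' or '#', so it never scans into the path or query string.
anchored = re.compile(
    r"^https?://(?:[^/?#]*\.)?0-go\.org(?:[:/?#]|$)",
    re.IGNORECASE,
)

def host_is_listed(uri: str) -> bool:
    """Return True if the URI's host is 0-go.org or a subdomain of it."""
    return anchored.match(uri) is not None

# Hypothetical sample URIs, for illustration only.
for uri in (
    "http://www.0-go.org/offer",
    "http://0-go.org",
    "http://10-go.org/",
    "http://example.com/?u=0-go.org",
):
    print(uri, host_is_listed(uri))
```

        The trade-off: a domain that appears only in the path or query 
(the redirector case) no longer matches, so those would need rules of 
their own.
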
        Cheers,
        - Bill

---------------------------------------------------------------------------
        "Give a man a fish and you feed him for a day; give him a
freshly-charged electric eel and chances are he won't bother you for
anything ever again."
(Courtesy of Thomas Harris <[EMAIL PROTECTED]>)
--------------------------------------------------------------------------
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
Linux articles at:                         http://www.opensourcedigest.com
--------------------------------------------------------------------------



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
