Thanks for the help...it explains why the syntax was (as far as I could
tell) OK yet it still didn't work properly. I should have diffed the
output of my test script rather than relying on a visual grep late at
night :-).

On 26 May 1997, Roderick Schertler wrote:

> On Mon, 26 May 1997 23:40:35 +1000 (EST), Craig Sanders <[EMAIL PROTECTED]> 
> said:
> >
> > The database is a .db file created with 'makemap hash redir <redir'
> > from [:space:]-delimited source input like the following:
> [...]
> > //.*riddler.com/Commonwealth/bin/statdeploy.* //www.taz.net.au/blank_ad.gif
> 
> The problem is that makemap downcases the keys by default so
> Commonwealth is commonwealth in the map. Use the -f flag when building
> the map to disable this behavior.

The Answer!  Thanks!
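
For anyone hitting the same thing, here is a minimal sketch of the failure.
The request URL is made up for illustration, rewrite() is a hypothetical
helper rather than my actual script, and lc() merely stands in for the key
folding that makemap does when -f is not given:

```perl
use strict;
use warnings;

# Hypothetical helper: apply one pattern/replacement pair to a URL.
sub rewrite {
    my ($url, $key, $record) = @_;
    $url =~ s/$key/$record/;
    return $url;
}

my $record   = '//www.taz.net.au/blank_ad.gif';
my $as_typed = '//.*riddler.com/Commonwealth/bin/statdeploy.*';
my $folded   = lc $as_typed;    # stands in for makemap without -f

# Made-up request URL, for illustration only.
my $url = 'http://www.riddler.com/Commonwealth/bin/statdeploy/ad1';

for my $key ($as_typed, $folded) {
    my $result = rewrite($url, $key, $record);
    print "$key\n  => ", ($result eq $url ? '(no match)' : $result), "\n";
}
```

Rebuilding the map with 'makemap -f hash redir <redir' keeps the keys as
typed. Matching with s/$key/$record/i should also work, at the cost of
making every pattern case-insensitive.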

> Since you're always scanning the db linearly, though, using a DB map
> isn't buying you anything.  I'd just read the patterns from the text
> file directly.

I'm using the hashed db for speed and convenience.

One advantage of the db file is that comments are stripped out by makemap,
which means I can have as many comments as I like in the source text and
it won't slow down the script at all. That is quite important when, on some
of my squid boxes, this script has to do 50000+ lookups per hour.

Also, unless the tie function (or similar) can work with text files as
well as db files, there will be the overhead of opening & closing the
file for every URL, plus the overhead of parsing each line into its two
fields...

e.g. with one hundred entries in the file on a moderately busy machine
like the one above, that would be 50,000 open & close operations per hour
plus up to 5,000,000 line parsing operations per hour (50,000 URLs x 100
lines). Most URLs scanned WON'T match any of the patterns, so the loop
will have to run to completion: very rough calculations(*) from my squid
log files indicate that only around 10% of URLs are banner advertisements.

I could just read the text file into an array but that would mean I was
back where I started - having to restart squid when I make a change to the
database.  Alternatively I could modify the script to respond to SIGHUP by
re-reading the text file.
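
A rough sketch of that last option, assuming the source file keeps its two
whitespace-delimited fields and '#' comment lines. The names load_rules,
redirect and run are hypothetical, as is the file name 'redir', and I'm
assuming squid hands the redirector one request per line with the URL as
the first field:

```perl
use strict;
use warnings;

my $reload = 1;                    # load the rules before the first lookup
$SIG{HUP} = sub { $reload = 1 };   # only set a flag; reload in the main loop

# Parse the source file once, skipping blank lines and comments.
sub load_rules {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my @rules;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;
        my ($pattern, $target) = split ' ', $line;
        next unless defined $target;          # ignore malformed lines
        push @rules, [ qr/$pattern/, $target ];
    }
    close $fh;
    return @rules;
}

# Apply the first matching rule; return the URL unchanged if none match.
sub redirect {
    my ($url, @rules) = @_;
    for my $rule (@rules) {
        my ($re, $target) = @$rule;
        return $url if $url =~ s/$re/$target/;
    }
    return $url;
}

# Main loop: one request per line on STDIN, one answer per line on STDOUT.
sub run {
    my ($path) = @_;
    local $| = 1;                  # unbuffered output
    my @rules;
    while (my $request = <STDIN>) {
        if ($reload) {
            @rules  = load_rules($path);
            $reload = 0;
        }
        my ($url) = split ' ', $request;
        $url = '' unless defined $url;
        print redirect($url, @rules), "\n";
    }
}

# run('redir');    # uncomment to use this as the redirector
```

The file is opened and parsed once per reload rather than once per URL, and
each pattern is compiled only once. After editing the file, sending the
redirector processes a HUP should pick up the change without restarting
squid.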

(*) 'wc -l access.log'   vs   'grep blank_ad.gif access.log | wc -l'
    
    about 10% of the entries in the access.log over the last month were
    advertising banners redirected to blank_ad.gif by my script. this
    is on my lightly-used squid box at home where i do most of my web
    browsing in non-commercial linux & 'weirdness' related areas. I
    don't block advertising on my big squid at work, but I would guess
    that the proportion would be much higher. To tell the truth, I
    didn't mind banner ads until they started using FLASHING animated
    gifs - whoever invented gif animations should be drawn and quartered
    very slowly over a hot fire.


> Gratuitous unsolicited style tip #1:  Don't put semicolons after a
> closing brace except for do and eval blocks, and sub ref constructors.

A bad habit, I know.  It's easier to just put them in after every } rather
than have to remember the exceptions where they're required.
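
For my own reference, the exceptions as I understand them: do BLOCK, eval
BLOCK and sub { ... } are expressions, so the statement containing them
still needs its terminating semicolon, whereas the block of an if, while or
for ends the statement by itself. A small sketch (the variable names are
just for illustration):

```perl
use strict;
use warnings;

my $sum    = do { 1 + 2 };               # do BLOCK: semicolon required
my $ok     = eval { die "oops\n"; 1 };   # eval BLOCK: semicolon required
my $double = sub { $_[0] * 2 };          # anonymous sub: semicolon required

if ($sum == 3) {
    print "sum is 3\n";
}                                        # compound statement: no semicolon

print defined $ok ? "eval succeeded\n" : "eval failed: $@";
print $double->(21), "\n";
```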


> Gratuitous unsolicited style tip #2:  This code would more idiomatically
> be
> 
>     print "$url==>" if $debug;
>     while (($key, $record) = each %redir_db) {
>       if ($url =~ s/$key/$record/) {
>           print $url;
>           last;
>       }
>     }
>     print "\n";

Yes, that's much better.  Thanks.  I knew there was a way of dropping out
of the loop quickly without using an ugly $found variable but couldn't
remember what it was.
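
One caveat I believe applies to that loop: each() keeps a single iterator
per hash, and leaving the loop early with 'last' does not reset it, so the
scan for the next URL would resume part-way through the map and could miss
a pattern. Calling keys on the hash in void context resets the iterator.
A sketch with a plain hash standing in for the tied db (rewrite_url, the
second rule and the test URLs are made up for illustration):

```perl
use strict;
use warnings;

# Plain hash standing in for the tied %redir_db; the second rule is invented.
my %redir_db = (
    '//.*riddler.com/Commonwealth/bin/statdeploy.*' => '//www.taz.net.au/blank_ad.gif',
    '//ads\.example\.com/.*'                        => '//www.taz.net.au/blank_ad.gif',
);

sub rewrite_url {
    my ($url) = @_;
    keys %redir_db;    # reset the each() iterator left over from the last call
    while (my ($key, $record) = each %redir_db) {
        last if $url =~ s/$key/$record/;
    }
    return $url;
}

print rewrite_url($_), "\n" for (
    'http://ads.example.com/banner.gif',
    'http://www.riddler.com/Commonwealth/bin/statdeploy/x',
    'http://www.debian.org/',
);
```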

craig

--
craig sanders
networking consultant                  Available for casual or contract
temporary autonomous zone              system administration tasks.

