Thanks for the ideas. From my research, though, it doesn't look like
Googlebot or any other bot will accept gzip compression; it would be
nice if they did. I guess the only real way to tell whether this is
working will be to watch the access logs over time and see what
response codes are being sent back to Googlebot.
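
Something like this quick one-liner ought to show the breakdown of status
codes Googlebot has been getting, assuming the standard combined log
format (status is the 9th whitespace-separated field) and that access_log
is wherever your log actually lives:

   perl -lane '$c{$F[8]}++ if /Googlebot/; END { print "$_: $c{$_}" for sort keys %c }' access_log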

Thanks for your input

Shawn


-----Original Message-----
From: Josh Chamas [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 27, 2004 6:41 PM
To: Shawn
Cc: [EMAIL PROTECTED]
Subject: Re: Search Bot


Shawn wrote:
> Hi, I have been trying to figure out a way to limit the massive amount
> of bandwidth that search bots (Googlebot/2.1) consume daily from my
> website. My problem is that I am running Apache::ASP and about 90% of
> the site is dynamic content, with links such as product.htm?id=100. The
> dynamic content changes quite a bit, so I don't want to use any caching
> for regular users, but it would be fine for the bots to use a cached
> copy for a month or so. The solution I came up with is manually
> modifying the headers to keep sending back 304 HTTP_NOT_MODIFIED for a
> month before allowing new content to be served up, and to do this only
> for search bots and not for regular web browsers. Can anyone tell me if
> there are some problems you foresee with doing something like this? I
> have only tested this on a dev server and was wondering if anyone else
> has had this problem or has any suggestions.
> 
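
I have not tried the 304 trick myself, but for what it's worth, here is a
rough, untested sketch of the kind of check I imagine you mean, as a
mod_perl fixup handler in front of Apache::ASP (the package name, the bot
regex, and the one-month figure are all just placeholders to adjust):

   package My::BotFreshness;
   use strict;
   use Apache::Constants qw(DECLINED HTTP_NOT_MODIFIED);
   use HTTP::Date ();

   sub handler {
       my $r = shift;

       # only special-case requests that identify themselves as bots
       my $ua = $r->header_in('User-Agent') || '';
       return DECLINED unless $ua =~ /Googlebot/i;

       # if the bot's copy is less than a month old, say nothing changed
       if (my $ims = $r->header_in('If-Modified-Since')) {
           my $since = HTTP::Date::str2time($ims) || 0;
           return HTTP_NOT_MODIFIED if time() - $since < 30 * 24 * 60 * 60;
       }

       return DECLINED;   # otherwise let Apache::ASP build a fresh page
   }
   1;

You would hook that up with something like "PerlFixupHandler
My::BotFreshness" in the same <Location> as your ASP scripts. Note that
the bot will only send If-Modified-Since if you sent it a Last-Modified
header in the first place.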

You could also try compressing your content with the CompressGzip setting.
You could set the Expires header to one month in the future.
You could use a /robots.txt file to disallow Google from crawling the
parts of your site that might be excludable and high-bandwidth.
You could also sleep(N) seconds when Google makes a request; I wonder
whether that would slow their spiders down across their cluster(s).
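
For example, the Expires and sleep() ideas together might look roughly
like this in your global.asa Script_OnStart (untested; the bot regex, the
one-month figure, and the sleep length are just guesses to tune):

   sub Script_OnStart {
       my $ua = $Request->ServerVariables('HTTP_USER_AGENT') || '';
       if ($ua =~ /Googlebot/i) {
           # let bots treat the page as fresh for about a month
           $Response->{Expires} = 30 * 24 * 60 * 60;
           # and pace their requests a little
           sleep 2;
       }
   }

Keep in mind that the sleep() ties up an Apache child for the whole
pause, so you are trading memory for bandwidth there.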

Just ideas, I have not tried to throttle search bots before.

Oh, you might also write a custom mod_perl module that keeps track of
the bandwidth used by search bots and sends a 503 "Service Unavailable"
error code once a bandwidth limit is exceeded.  That might tell Google
to back off for a while (?).
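
Here is an untested sketch of that last idea with a simple per-child byte
counter (the package name and the 50MB budget are invented, and you would
want something shared, e.g. a dbm file, to count across all children):

   package My::BotThrottle;
   use strict;
   use Apache::Constants qw(OK DECLINED HTTP_SERVICE_UNAVAILABLE);

   use vars qw(%bytes);                    # per-child, per-day byte counts
   my $BUDGET = 50 * 1024 * 1024;          # made-up 50MB/day limit

   # PerlAccessHandler My::BotThrottle::access
   sub access {
       my $r = shift;
       my $ua = $r->header_in('User-Agent') || '';
       return DECLINED unless $ua =~ /Googlebot/i;

       my $today = int(time() / 86400);
       if (($bytes{$today} || 0) > $BUDGET) {
           $r->err_header_out('Retry-After' => 3600);  # hint: try again later
           return HTTP_SERVICE_UNAVAILABLE;            # the 503
       }
       return OK;
   }

   # PerlLogHandler My::BotThrottle::tally
   sub tally {
       my $r = shift;
       my $ua = $r->header_in('User-Agent') || '';
       return DECLINED unless $ua =~ /Googlebot/i;
       $bytes{ int(time() / 86400) } += $r->bytes_sent || 0;
       return OK;
   }
   1;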

Regards,

Josh
________________________________________________________________
Josh Chamas, Founder                   phone:925-552-0128
Chamas Enterprises Inc.                http://www.chamas.com
NodeWorks Link Checker                 http://www.nodeworks.com


-- 
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
List etiquette: http://perl.apache.org/maillist/email-etiquette.html
