http://www.SearchEngine-News.com

has lots of information on search engine optimization, how various bots
scrape, etc.

On Wed, 2004-01-28 at 14:51, Ryan Farrington wrote:
> Stupid question: Have you tried contacting Google to see if there is
> something they can do? Last I heard it was still a company run by people
> who actually liked their jobs =)
> 
> -----Original Message-----
> From: Josh Chamas [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, January 27, 2004 7:41 PM
> To: Shawn
> Cc: [EMAIL PROTECTED]
> Subject: Re: Search Bot
> 
> 
> Shawn wrote:
> > Hi, I have been trying to figure out a way to limit the massive amount
> > of bandwidth that search bots (Googlebot/2.1) consume daily from my 
> > website. My problem is that I am running Apache::ASP and about 90% of 
> > the site is dynamic content, links such as product.htm?id=100. The 
> > dynamic content gets changed quite a bit so I don't want to use any 
> > caching for regular users, but it would be fine for the bots to use a 
> > cached copy for a month or so. The solution I came up with is manually 
> > modifying the headers to keep sending back 304 HTTP_NOT_MODIFIED for
> > a month before allowing new content to be served, but only for search
> > bots and not for regular web browsers. Can anyone tell me if there are
> > problems you foresee with doing something like this? I have only tested
> > this on a dev server and was just wondering if anyone else had this 
> > problem or any suggestions they might have.
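> > 
> > In rough form, the idea looks something like this in global.asa
> > (simplified, untested sketch; the real thing would also have to set a
> > Last-Modified header on normal responses so the bot has a date to
> > echo back in If-Modified-Since):
> > 
> >   use HTTP::Date qw(str2time);
> > 
> >   sub Script_OnStart {
> >       my $agent = $Request->ServerVariables('HTTP_USER_AGENT') || '';
> >       return unless $agent =~ /Googlebot/i;   # only affect search bots
> > 
> >       my $ims = $Request->ServerVariables('HTTP_IF_MODIFIED_SINCE');
> >       my $t   = $ims ? str2time($ims) : 0;
> >       if ($t && time() - $t < 30 * 24 * 60 * 60) {   # under a month old
> >           $Response->{Status} = 304;    # HTTP_NOT_MODIFIED, no body
> >           $Response->End();             # skip running the script
> >       }
> >   }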
> > 
> 
> You could also try compressing your content with the CompressGzip
> setting (example below).
> You can try setting the Expires header to one month in the future.
> You could set up a /robots.txt file to disallow Google from crawling
> any portion of your site that might be excludable & high bandwidth
> (example below).
> You could sleep(N) seconds when Google makes a request; I wonder whether
> that would slow their spiders down across their cluster(s).
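> 
> For example, in httpd.conf (CompressGzip needs Compress::Zlib installed):
> 
>     PerlSetVar CompressGzip 1
> 
> and a minimal /robots.txt along those lines (the path is purely
> illustrative):
> 
>     User-agent: Googlebot
>     Disallow: /big-downloads/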
> 
> Just ideas; I have not tried to throttle search bots before.
> 
> Oh, you might write your own custom mod_perl module that keeps track
> of bandwidth used by search bots and sends a 503 "Service Unavailable"
> error code once a bandwidth budget is exceeded.  That might tell Google
> to back off for a while (?).
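> 
> A very rough, untested sketch of that idea (mod_perl 1.x; the 50MB
> budget and the Cache::FileCache counter are just placeholders):
> 
>   package My::BotThrottle;
>   use strict;
>   use Apache::Constants qw(:common :http);
>   use Cache::FileCache;
> 
>   # on-disk counter shared across httpd children
>   my $cache  = Cache::FileCache->new({ namespace => 'bot_bandwidth' });
>   my $budget = 50 * 1024 * 1024;   # example: 50MB per day
> 
>   # PerlAccessHandler: refuse bots once the budget is spent
>   sub access {
>       my $r = shift;
>       return DECLINED
>           unless ($r->header_in('User-Agent') || '') =~ /Googlebot/i;
>       if (($cache->get('bytes') || 0) > $budget) {
>           $r->header_out('Retry-After' => 3600);
>           return HTTP_SERVICE_UNAVAILABLE;  # 503, hope the bot backs off
>       }
>       return OK;
>   }
> 
>   # PerlLogHandler: charge what was actually sent against the budget
>   sub record {
>       my $r = shift;
>       return DECLINED
>           unless ($r->header_in('User-Agent') || '') =~ /Googlebot/i;
>       $cache->set('bytes', ($cache->get('bytes') || 0) + $r->bytes_sent,
>                   '1 day');
>       return OK;
>   }
>   1;
> 
> hooked up in httpd.conf with something like:
> 
>   PerlModule        My::BotThrottle
>   PerlAccessHandler My::BotThrottle::access
>   PerlLogHandler    My::BotThrottle::record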
> 
> Regards,
> 
> Josh
> ________________________________________________________________
> Josh Chamas, Founder                   phone:925-552-0128
> Chamas Enterprises Inc.                http://www.chamas.com
> NodeWorks Link Checker                 http://www.nodeworks.com
> 
-- 
Clayton Cottingham - WinterMarket Networks
Virtual Reality Programming, Design & Evangelist
Phone:(604) 875-1213
Cell: (604) 506-7230
Vancouver, B.C. Canada
[EMAIL PROTECTED]
http://www.wintermarket.net
IM's icq:154964789 hotmail:[EMAIL PROTECTED]
yahoo:[EMAIL PROTECTED]


-- 
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
List etiquette: http://perl.apache.org/maillist/email-etiquette.html
