http://www.SearchEngine-News.com has lots of information on search
engine optimization, how various bots scrape, etc.

On Wed, 2004-01-28 at 14:51, Ryan Farrington wrote:
> Stupid question: Have you tried contacting Google to see if there is
> something they can do? Last I heard it was still a company run by people
> who actually liked their jobs =)
>
> -----Original Message-----
> From: Josh Chamas [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 27, 2004 7:41 PM
> To: Shawn
> Cc: [EMAIL PROTECTED]
> Subject: Re: Search Bot
>
> Shawn wrote:
> > Hi, I have been trying to figure out a way to limit the massive amount
> > of bandwidth that search bots (Googlebot/2.1) consume daily from my
> > website. My problem is that I am running Apache::ASP, and about 90% of
> > the site is dynamic content, with links such as product.htm?id=100. The
> > dynamic content changes quite a bit, so I don't want to use any caching
> > for regular users, but it would be fine for the bots to use a cached
> > copy for a month or so. The solution I came up with is manually
> > modifying the headers to keep sending back 304 HTTP_NOT_MODIFIED for a
> > month before allowing new content to be served up, and to do this only
> > for search bots and not for regular web browsers. Can anyone tell me if
> > there are problems you foresee with doing something like this? I have
> > only tested this on a dev server and was wondering if anyone else has
> > had this problem or any suggestions they might have.
>
> You could also try compressing your content with the CompressGzip setting.
> You can try setting the Expires header to one month in the future.
> You could set a /robots.txt file to disallow Google from searching
> a portion of your site that might be excludable & high bandwidth.
> You could sleep(N) seconds when Google does a request; I wonder if
> that would slow their spiders down across their cluster(s).
>
> Just ideas, I have not tried to throttle search bots before.
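Shawn's 304 idea could be sketched as a small mod_perl 1 handler that
answers 304 Not Modified to recognized crawlers whose cached copy is
less than a month old. This is a hypothetical, untested sketch: the
package name My::BotFreeze, the bot pattern, and the 30-day window are
all invented for illustration, and it assumes HTTP::Date (from
libwww-perl) is available.

```perl
# Hypothetical sketch only -- not a tested module.
package My::BotFreeze;

use strict;
use Apache::Constants qw(OK HTTP_NOT_MODIFIED);
use HTTP::Date qw(str2time time2str);

sub handler {
    my $r = shift;
    my $ua = $r->header_in('User-Agent') || '';

    # Only special-case recognized crawlers; browsers get fresh content.
    return OK unless $ua =~ /Googlebot|Slurp|msnbot/i;

    # If the bot revalidates and its copy is under a month old, say 304.
    my $since = str2time($r->header_in('If-Modified-Since') || '');
    if ($since && time() - $since < 30 * 24 * 60 * 60) {
        return HTTP_NOT_MODIFIED;
    }

    # Otherwise serve fresh content, stamped so the bot can revalidate.
    $r->header_out('Last-Modified' => time2str(time()));
    return OK;
}

1;
```

Configured, say, as a PerlFixupHandler, so the 304 short-circuits the
request before Apache::ASP generates the (expensive) dynamic page.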
>
> Oh, you might write your own custom mod_perl module that keeps track
> of bandwidth for search bots and sends a 503 "server busy" error code
> if the bandwidth is exceeded. This might tell Google to back off for
> a while (?).
>
> Regards,
>
> Josh
>
> ________________________________________________________________
> Josh Chamas, Founder         phone: 925-552-0128
> Chamas Enterprises Inc.      http://www.chamas.com
> NodeWorks Link Checker       http://www.nodeworks.com

--
Clayton Cottingham - WinterMarket Networks
Virtual Reality Programming, Design & Evangelist
Phone: (604) 875-1213  Cell: (604) 506-7230
Vancouver, B.C. Canada
[EMAIL PROTECTED]
http://www.wintermarket.net
IMs: icq:154964789  hotmail:[EMAIL PROTECTED]  yahoo:[EMAIL PROTECTED]

--
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
List etiquette: http://perl.apache.org/maillist/email-etiquette.html
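Josh's last idea -- a bandwidth-tracking module that answers 503 --
might look roughly like the sketch below. Everything here is assumed
for illustration: the package name My::BotBudget, the bot list, the
50 MB budget, and especially the crude in-process counter. A real
version would need storage shared across Apache children (e.g. a dbm
file or shared memory), and this has not been run against a live server.

```perl
# Hypothetical sketch only -- not a tested module.
package My::BotBudget;

use strict;
use Apache::Constants qw(OK HTTP_SERVICE_UNAVAILABLE);

my %bytes_sent_to;               # per-child counter; resets on restart
my $limit = 50 * 1024 * 1024;    # invented budget: 50 MB per bot

# PerlLogHandler: after each response, charge the bytes to the bot.
sub log_handler {
    my $r = shift;
    my $ua = $r->header_in('User-Agent') || '';
    $bytes_sent_to{lc $1} += $r->bytes_sent
        if $ua =~ /(Googlebot|Slurp|msnbot)/i;
    return OK;
}

# PerlAccessHandler: before each response, refuse bots over budget.
sub access_handler {
    my $r = shift;
    my $ua = $r->header_in('User-Agent') || '';
    return OK unless $ua =~ /(Googlebot|Slurp|msnbot)/i;
    if (($bytes_sent_to{lc $1} || 0) > $limit) {
        $r->err_header_out('Retry-After' => 3600);  # hint: back off an hour
        return HTTP_SERVICE_UNAVAILABLE;            # 503 "server busy"
    }
    return OK;
}

1;
```

Whether Googlebot actually backs off on repeated 503s is, as Josh says,
an open question; the Retry-After header is only a hint.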