Hi, I have been trying to find a way to limit the massive amount of bandwidth that search bots (Googlebot/2.1) consume daily on my website. I am running Apache::ASP, and about 90% of the site is dynamic content behind links such as product.htm?id=100. The dynamic content changes often, so I don’t want any caching for regular users, but it would be fine for the bots to use a cached copy for a month or so.

The solution I came up with is to modify the headers manually so that search bots, and only search bots, keep getting 304 HTTP_NOT_MODIFIED for a month before new content is served to them. Regular web browsers are not affected. Can anyone tell me whether you foresee any problems with doing something like this? I have only tested this on a dev server, and I was wondering whether anyone else has had this problem or has any suggestions.
Thanks
<FilesMatch "\.htm$">
    SetHandler perl-script
    PerlModule Apache::ASP
    PerlHeaderParserHandler Apache::SearchBot
    ExpiresActive On
    ExpiresDefault "access plus 1 second"
    PerlHandler Apache::ASP
    PerlSetVar Global /tmp
</FilesMatch>
package Apache::SearchBot;
use strict;
use Apache::Constants qw(:common :http);   # :common supplies OK
use Apache::File;                          # provides set_last_modified
use Time::Piece;
use Time::Seconds;                         # provides ONE_DAY

sub handler {
    my ($r) = @_;

    # First check whether this is a bot; that is the only client we limit.
    my $agent = $r->header_in('User-Agent') || '';
    return OK unless $agent =~ /Googlebot/;

    # No If-Modified-Since means this is the bot's first visit, so send it along.
    my $ims = $r->header_in('If-Modified-Since') or return OK;

    # strptime dies on a date it cannot parse; serve the page in that case.
    my $botsdate = eval {
        Time::Piece->strptime($ims, '%a, %d %b %Y %H:%M:%S GMT');
    };
    return OK unless defined $botsdate;

    # The bot's copy is less than 31 days old, so give it a 304 and set
    # Last-Modified equal to the If-Modified-Since date it sent.
    if ($botsdate->epoch > time - 31 * ONE_DAY) {
        $r->set_last_modified($botsdate->epoch);
        return HTTP_NOT_MODIFIED;
    }

    # The bot has a very old copy, so allow it to grab the updated one.
    return OK;
}

1;
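
For anyone who wants to check the decision logic without a running Apache, here is a minimal sketch of the same rule as a plain Perl function. The name bot_gets_304 and the optional $now argument are my own additions for illustration and are not part of Apache::SearchBot or mod_perl. It assumes the bot sends If-Modified-Since in the usual "Sun, 06 Nov 1994 08:49:37 GMT" form.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper for testing only: returns 1 when the handler would
# answer 304, and 0 when the request should fall through to Apache::ASP.
my $MAX_AGE = 31 * 24 * 60 * 60;    # 31 days in seconds (2678400)

sub bot_gets_304 {
    my ($user_agent, $if_modified_since, $now) = @_;
    $now = time unless defined $now;    # $now is injectable so tests are repeatable

    # Only bots are limited; regular browsers always get fresh content.
    return 0 unless defined $user_agent && $user_agent =~ /Googlebot/;

    # First visit: no conditional header, so serve the page.
    return 0 unless defined $if_modified_since && length $if_modified_since;

    # strptime dies on input it cannot parse, so trap that and serve the page.
    my $bots_date = eval {
        Time::Piece->strptime($if_modified_since, '%a, %d %b %Y %H:%M:%S GMT');
    };
    return 0 unless defined $bots_date;

    # A copy newer than the cutoff gets a 304; an older one gets the page.
    return $bots_date->epoch > $now - $MAX_AGE ? 1 : 0;
}
```

Calling bot_gets_304('Googlebot/2.1', $date, $now) with a $now one day after $date should report a 304, while a $now 32 days after $date, a missing date, or a non-bot User-Agent should all let the request through.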