Hi all,

We are running several instances of Koha on the one box using Linux Vserver. The other night the server was brought to its knees and mysql ran out of free connections. Further investigation found over 80 instances of perl + Apache running OPAC search queries, with many attendant instances of Zebra spawned to service these searches.
Once our daily backup kicked in at the same time, all hades broke loose. We were DoS'd at one stage and had to remote-reboot the thing.

How did the web crawlers find our obscure site? Probably due to a URL containing a search being posted to a web site somewhere. After some thought, two of us came up with a simple solution.

The Problem: the OPAC search and OPAC advanced search are accessible to the public from the Koha OPAC home page. Consequently, an overzealous web crawler indexing the site via the opac-search.pl script can impact the performance of the Koha system. In the extreme, an under-resourced system can experience a DoS when the number of searches exceeds the capacity of the system.

The Solution: modify the opac-search.pl script in the following manner:

(A) Only allow queries using the POST method; if GET is used, return a simple page with "No search result found".

(B) Exception: do allow GET queries, but only if the HTTP_REFERER matches the SERVER_NAME. This allows all searches made via links on the web site itself to keep working.

Here is the small code segment added to opac-search.pl, immediately after the BEGIN block. Two small precautions: HTTP_REFERER may be unset (hence defaulting it to ''), and SERVER_NAME contains dots, which are regexp metacharacters (hence the \Q...\E quoting):

    # refuse searches that are neither POSTs nor GETs referred
    # from a page on this server
    my $referer = $ENV{HTTP_REFERER} || '';
    if ( $referer !~ /\Q$ENV{SERVER_NAME}\E/
         && $ENV{REQUEST_METHOD} ne "POST" ) {
        print "Content-type: text/html\n\n";
        print "<h1>Search Results</h1>Nothing found.\n";
        exit;
    }

CAVEAT: This solution does not allow one to paste an opac-search.pl link into the browser and have it work as previously expected. But that was the cause of the problem in the first place. A better solution is to require a user to log in to the OPAC before allowing a search (see the sketch in the PS below).

Addendum: also install a robots.txt file at the following location in the Koha source tree:

    opac/htdocs/robots.txt

The robots.txt file should contain the following, which denies all access to indexing engines. You can learn more about robots.txt on the web, and configure it to allow some indexing if you wish (an example is in the PS below).

-----------------------------
User-agent: *
Disallow: /
-----------------------------

I plan to submit a bug report regarding this situation, but first open it up for discussion here.

cheers
rickw

--
_________________________________
Rick Welykochy || Praxis Services

If you have any trouble sounding condescending,
find a Unix user to show you how it's done.
     -- Scott Adams
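PS. Two follow-ups on the above.

First, if you want crawlers to index the rest of the OPAC but keep them out of the search script, a robots.txt along these lines should do it. This assumes the usual /cgi-bin/koha/ script path; adjust to suit your install:

-----------------------------
User-agent: *
Disallow: /cgi-bin/koha/opac-search.pl
-----------------------------

Second, for anyone who wants to try the "require a login" approach, here is an untested sketch. It assumes your opac-search.pl obtains its template via C4::Auth::get_template_and_user, as the other OPAC scripts do; the template name below is only illustrative, so keep whatever your version already passes. The idea is simply to switch authnotrequired off, so C4::Auth bounces the user to the OPAC login page before any search runs:

    use CGI;
    use C4::Auth;

    my $query = new CGI;

    # authnotrequired => 0 makes C4::Auth demand a logged-in OPAC
    # session (redirecting to the login form if there is none)
    # before the rest of the script gets to run the search.
    my ( $template, $borrowernumber, $cookie ) = get_template_and_user({
        template_name   => "opac-searchresults.tmpl",   # illustrative
        query           => $query,
        type            => "opac",
        authnotrequired => 0,    # was 1: anonymous searching allowed
    });

Crawlers never carry a Koha session cookie, so all they ever see is the login page, which is cheap to serve.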