Hi Dan,

I might be stating the obvious here, but have you looked at Nutch? Nutch runs on Hadoop and can crawl, index, and search (using Lucene). We've been using it for a while and it works well.

Kind regards,
Steve Watt

From: Dan Segel <danse...@gmail.com>
To: common-dev@hadoop.apache.org
Date: 11/11/2009 07:58 AM
Subject: 5 billion pages indexed and searchable.

I am looking to develop a search engine that can handle 25 searches per second and have 5+ billion pages indexed. I intend to use the hardware below, connected with fiber (of course). Do you think this is overkill, or am I falling way short?

I plan on buying 32 servers (actually contained inside 2 blade servers), each configured as follows:

http://www.eztradelive.com/product.php?productid=157&cat=45&page=1
> Dual 2.0GHz Quad Core CPU
> 2 x 300GB 2.5" SAS HDD (RAID)
> 16GB DDR2 RAM

+ 60 TB of storage at RAID 10 with fiber connection.

Dan
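
For a rough sense of scale, the figures in the message above can be turned into per-server and per-page numbers. This is only a back-of-envelope sketch in Python: it assumes pages are spread evenly across the 32 servers, that TB and GB are decimal units, and that the 60 TB is usable capacity (the message does not say whether that is before or after RAID 10). All variable names are made up for illustration, and it does not say whether the hardware is enough.

```python
# Figures taken from the message above.
PAGES = 5_000_000_000        # "5+ billion pages"
SERVERS = 32
RAM_PER_SERVER_GB = 16
DISKS_PER_SERVER = 2
DISK_GB = 300
SHARED_STORAGE_TB = 60

# Assumption: pages are distributed evenly across servers.
pages_per_server = PAGES // SERVERS

# Aggregate RAM and raw (pre-RAID) local disk across the cluster.
total_ram_gb = SERVERS * RAM_PER_SERVER_GB
local_disk_raw_gb = SERVERS * DISKS_PER_SERVER * DISK_GB

# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
shared_bytes_per_page = SHARED_STORAGE_TB * 10**12 / PAGES
ram_bytes_per_page = total_ram_gb * 10**9 / PAGES

print(f"pages per server:        {pages_per_server:,}")
print(f"total RAM (GB):          {total_ram_gb}")
print(f"raw local disk (GB):     {local_disk_raw_gb:,}")
print(f"shared bytes per page:   {shared_bytes_per_page:,.0f}")
print(f"RAM bytes per page:      {ram_bytes_per_page:.1f}")
```

Under these assumptions each server would be responsible for roughly 156 million pages, and the shared storage works out to about 12 KB per page, which would have to cover both the fetched content and the index.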