Just a quick plug for Katta. We use it extensively (and have been sending back some patches).
See www.deepdyve.com for a test drive.

At my previous job, we had utter fits working with Solr using sharded
retrieval. Katta is designed to address the sharding problem very well, and
we have been very happy with it. Our extensions adapt Katta into a general
sharding and replication engine that supports general queries. For some
things we use a modified Lucene; for others, we use our own code. Katta
handles that really well.

On Tue, Jun 2, 2009 at 9:23 AM, Tarandeep Singh <[email protected]> wrote:

> thanks all for your replies. I am checking Katta...
>
> -Tarandeep
>
> On Tue, Jun 2, 2009 at 8:05 AM, Stefan Groschupf <[email protected]> wrote:
>
> > Hi,
> > you might want to check out:
> > http://katta.sourceforge.net/
> >
> > Stefan
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Hadoop training and consulting
> > http://www.scaleunlimited.com
> > http://www.101tec.com
> >
> > On Jun 1, 2009, at 9:54 AM, Tarandeep Singh wrote:
> >
> >> Hi All,
> >>
> >> I am trying to build a distributed system to build and serve Lucene
> >> indexes. I came across the Distributed Lucene project:
> >> http://wiki.apache.org/hadoop/DistributedLucene
> >> https://issues.apache.org/jira/browse/HADOOP-3394
> >>
> >> and have a couple of questions. It would be really helpful if someone
> >> could provide some insights.
> >>
> >> 1) Is this code production ready?
> >> 2) Does anyone have performance data for this project?
> >> 3) It allows searches and updates/deletes to be performed at the same
> >> time. How well will the system perform if there are frequent updates?
> >> Will it handle the search and update load easily, or would it be
> >> better to rebuild or update the indexes on different machines and then
> >> deploy them back to the machines that are serving the indexes?
> >>
> >> Basically I am trying to choose between two approaches:
> >>
> >> 1) Use Hadoop to build and/or update Lucene indexes and then deploy
> >> them on a separate cluster that will take care of load balancing,
> >> fault tolerance, etc. There is a package in Hadoop contrib that does
> >> this, so I can use that code.
> >>
> >> 2) Use and/or modify the Distributed Lucene code.
> >>
> >> I am expecting daily updates to our index, so I am not sure if the
> >> Distributed Lucene code (which allows searches and updates on the same
> >> indexes) will be able to handle the search and update load
> >> efficiently.
> >>
> >> Any suggestions?
> >>
> >> Thanks,
> >> Tarandeep

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
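[Editor's note: the build-then-serve pattern discussed in this thread can be
sketched in a few lines, independent of Lucene or Katta. This is a toy
illustration only, not the Hadoop contrib or Katta API: a batch phase
partitions documents into per-shard inverted indexes (as a Hadoop job with
one reducer per shard might), and a serving phase answers queries by
scatter-gather across the finished, read-only shards (the step Katta
coordinates). The hash-by-doc-id partitioning scheme is an illustrative
assumption.]

```python
from collections import defaultdict

NUM_SHARDS = 2  # illustrative; real deployments size this to the cluster

def build_shards(docs):
    """Batch phase: partition documents by hashing their ids and build
    one inverted index (term -> set of doc ids) per shard, the way a
    Hadoop job with NUM_SHARDS reducers would emit one index per reducer."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shard = shards[hash(doc_id) % NUM_SHARDS]
        for term in text.lower().split():
            shard[term].add(doc_id)
    return shards

def search(shards, term):
    """Serving phase: fan the query out to every shard and merge the
    hits -- the scatter-gather step a coordinator like Katta performs.
    The shards are read-only here, which is why daily batch rebuilds
    (approach 1 above) sidestep the concurrent-update problem."""
    hits = set()
    for shard in shards:
        hits |= shard.get(term.lower(), set())
    return sorted(hits)

docs = {
    "d1": "distributed lucene indexes",
    "d2": "hadoop builds lucene indexes",
    "d3": "katta serves index shards",
}
shards = build_shards(docs)
print(search(shards, "lucene"))
print(search(shards, "shards"))
```

Because the batch phase produces immutable shards, swapping in a freshly
built index is an atomic redeploy rather than an in-place update, which is
the core trade-off between the two approaches in the thread.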
