Can Katta be used on an EC2 cluster? The reason I ask is that it appears to use ZooKeeper, which ideally needs a dedicated drive, and that may not be possible in a shared environment. Is this a non-issue with respect to Katta?
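For context, the concern is the usual ZooKeeper recommendation of giving the transaction log its own device so its fsyncs don't compete with other I/O. A minimal zoo.cfg sketch of the kind of setup I have in mind (the paths, hostnames, and ensemble size are placeholders, not a tested EC2 layout):

    # tickTime/initLimit/syncLimit/clientPort: standard ZooKeeper settings
    tickTime=2000
    initLimit=10
    syncLimit=5
    clientPort=2181

    # Snapshots can live on the shared root volume...
    dataDir=/var/zookeeper/data
    # ...but the transaction log ideally gets its own device (placeholder path;
    # e.g. a separate EBS volume or the local instance store) to avoid fsync contention.
    dataLogDir=/mnt/zk-txlog

    # Hypothetical 3-node ensemble on EC2 instances
    server.1=zk1.example.internal:2888:3888
    server.2=zk2.example.internal:2888:3888
    server.3=zk3.example.internal:2888:3888

On EC2, the local instance store (often mounted at /mnt) or a separate EBS volume would be the obvious candidates for dataLogDir, but whether that separation is strictly necessary for Katta's usage pattern is exactly what I'm asking.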
I would appreciate any input in this regard.

Dev

On Wed, Jun 3, 2009 at 2:03 AM, Ted Dunning <[email protected]> wrote:

> Just a quick plug for Katta. We use it extensively (and have been sending
> back some patches).
>
> See www.deepdyve.com for a test drive.
>
> At my previous job, we had utter fits working with SOLR using sharded
> retrieval. Katta is designed to address the sharding problem very well, and
> we have been very happy. Our extensions have been to adapt Katta so that it
> is a general sharding and replication engine that supports general queries.
> For some things we use a modified Lucene; for other things, we use our own
> code. Katta handles that really well.
>
> On Tue, Jun 2, 2009 at 9:23 AM, Tarandeep Singh <[email protected]> wrote:
>
> > Thanks all for your replies. I am checking out Katta...
> >
> > -Tarandeep
> >
> > On Tue, Jun 2, 2009 at 8:05 AM, Stefan Groschupf <[email protected]> wrote:
> >
> > > Hi,
> > > you might want to check out:
> > > http://katta.sourceforge.net/
> > >
> > > Stefan
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > Hadoop training and consulting
> > > http://www.scaleunlimited.com
> > > http://www.101tec.com
> > >
> > > On Jun 1, 2009, at 9:54 AM, Tarandeep Singh wrote:
> > >
> > >> Hi All,
> > >>
> > >> I am trying to build a distributed system to build and serve Lucene
> > >> indexes. I came across the Distributed Lucene project:
> > >> http://wiki.apache.org/hadoop/DistributedLucene
> > >> https://issues.apache.org/jira/browse/HADOOP-3394
> > >>
> > >> and have a couple of questions. It would be really helpful if someone
> > >> could provide some insights.
> > >>
> > >> 1) Is this code production ready?
> > >> 2) Does anyone have performance data for this project?
> > >> 3) It allows searches and updates/deletes to be performed at the same
> > >> time. How well will the system perform if there are frequent updates?
> > >> Will it handle the search and update load easily, or would it be better
> > >> to rebuild or update the indexes on different machines and then deploy
> > >> them back to the machines that are serving the indexes?
> > >>
> > >> Basically I am trying to choose between two approaches:
> > >>
> > >> 1) Use Hadoop to build and/or update Lucene indexes and then deploy them
> > >> on a separate cluster that takes care of load balancing, fault tolerance,
> > >> etc. There is a package in Hadoop contrib that does this, so I can use
> > >> that code.
> > >>
> > >> 2) Use and/or modify the Distributed Lucene code.
> > >>
> > >> I am expecting daily updates to our index, so I am not sure whether the
> > >> Distributed Lucene code (which allows searches and updates on the same
> > >> indexes) will be able to handle the search and update load efficiently.
> > >>
> > >> Any suggestions?
> > >>
> > >> Thanks,
> > >> Tarandeep
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> http://www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)
