Can Katta be used on an EC2 cluster? The reason I ask is that it appears to use ZooKeeper, which ideally needs a dedicated drive, and that may not be possible in a shared environment. Is this a non-issue with respect to Katta?
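For context, the concern is the usual ZooKeeper recommendation of giving the transaction log its own device so its fsyncs don't compete with other I/O. A minimal zoo.cfg sketch of the kind of setup I have in mind (the paths, hostnames, and ensemble size are placeholders, not a tested EC2 layout):

    # tickTime/initLimit/syncLimit/clientPort: standard ZooKeeper settings
    tickTime=2000
    initLimit=10
    syncLimit=5
    clientPort=2181

    # Snapshots can live on the shared root volume...
    dataDir=/var/zookeeper/data
    # ...but the transaction log ideally gets its own device (placeholder path;
    # e.g. a separate EBS volume or the local instance store) to avoid fsync contention.
    dataLogDir=/mnt/zk-txlog

    # Hypothetical 3-node ensemble on EC2 instances
    server.1=zk1.example.internal:2888:3888
    server.2=zk2.example.internal:2888:3888
    server.3=zk3.example.internal:2888:3888

On EC2, the local instance store (often mounted at /mnt) or a separate EBS volume would be the obvious candidates for dataLogDir, but whether that separation is strictly necessary for Katta's usage pattern is exactly what I'm asking.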
I would appreciate any input in this regard.

Dev

On Wed, Jun 3, 2009 at 2:03 AM, Ted Dunning <[email protected]> wrote:

> Just a quick plug for Katta. We use it extensively (and have been sending
> back some patches).
>
> See www.deepdyve.com for a test drive.
>
> At my previous job, we had utter fits working with SOLR using sharded
> retrieval. Katta is designed to address the sharding problem very well, and
> we have been very happy. Our extensions have been to adapt Katta so that it
> is a general sharding and replication engine that supports general queries.
> For some things we use a modified Lucene; for other things, we use our own
> code. Katta handles that really well.
>
> On Tue, Jun 2, 2009 at 9:23 AM, Tarandeep Singh <[email protected]> wrote:
>
> > Thanks all for your replies. I am checking out Katta...
> >
> > -Tarandeep
> >
> > On Tue, Jun 2, 2009 at 8:05 AM, Stefan Groschupf <[email protected]> wrote:
> >
> > > Hi,
> > > you might want to check out:
> > > http://katta.sourceforge.net/
> > >
> > > Stefan
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > Hadoop training and consulting
> > > http://www.scaleunlimited.com
> > > http://www.101tec.com
> > >
> > > On Jun 1, 2009, at 9:54 AM, Tarandeep Singh wrote:
> > >
> > >> Hi All,
> > >>
> > >> I am trying to build a distributed system to build and serve Lucene
> > >> indexes. I came across the Distributed Lucene project:
> > >> http://wiki.apache.org/hadoop/DistributedLucene
> > >> https://issues.apache.org/jira/browse/HADOOP-3394
> > >>
> > >> and have a couple of questions. It would be really helpful if someone
> > >> could provide some insights.
> > >>
> > >> 1) Is this code production ready?
> > >> 2) Does anyone have performance data for this project?
> > >> 3) It allows searches and updates/deletes to be performed at the same
> > >> time. How well will the system perform if there are frequent updates?
> > >> Will it handle the search and update load easily, or would it be better
> > >> to rebuild or update the indexes on different machines and then deploy
> > >> them back to the machines that are serving the indexes?
> > >>
> > >> Basically I am trying to choose between two approaches:
> > >>
> > >> 1) Use Hadoop to build and/or update Lucene indexes and then deploy them
> > >> on a separate cluster that takes care of load balancing, fault tolerance,
> > >> etc. There is a package in Hadoop contrib that does this, so I can use
> > >> that code.
> > >>
> > >> 2) Use and/or modify the Distributed Lucene code.
> > >>
> > >> I am expecting daily updates to our index, so I am not sure whether the
> > >> Distributed Lucene code (which allows searches and updates on the same
> > >> indexes) will be able to handle the search and update load efficiently.
> > >>
> > >> Any suggestions?
> > >>
> > >> Thanks,
> > >> Tarandeep
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> http://www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)
