Hi,
you might want to check out:
http://katta.sourceforge.net/
Stefan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hadoop training and consulting
http://www.scaleunlimited.com
http://www.101tec.com
On Jun 1, 2009, at 9:54 AM, Tarandeep Singh wrote:
Hi All,
I am trying to build a distributed system to build and serve Lucene
indexes. I came across the Distributed Lucene project:
http://wiki.apache.org/hadoop/DistributedLucene
https://issues.apache.org/jira/browse/HADOOP-3394
and have a couple of questions. It would be really helpful if someone
could provide some insight.
1) Is this code production ready?
2) Does anyone have performance data for this project?
3) It allows searches and updates/deletes to be performed at the same
time. How well will the system perform if there are frequent updates?
Will it handle the combined search and update load easily, or would it
be better to rebuild or update the indexes on different machines and
then deploy them back to the machines that are serving the indexes?
Basically, I am trying to choose between two approaches:
1) Use Hadoop to build and/or update Lucene indexes and then deploy
them on a separate cluster that will take care of load balancing, fault
tolerance etc. There is a package in Hadoop contrib that does this, so
I can use that code.
2) Use and/or modify the Distributed Lucene code.
I am expecting daily updates to our index, so I am not sure whether the
Distributed Lucene code (which allows searches and updates on the same
indexes) will be able to handle the search and update load efficiently.
Any suggestions?
Thanks,
Tarandeep