We've used Hadoop MapReduce with Solr to parallelize indexing for a customer,
and that brought their multi-hour indexing process down to a couple of
minutes. There is/was also a Lucene-level contrib module in Hadoop that uses
MapReduce to parallelize indexing.
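
The core per-partition step is roughly the sketch below (written against the
Lucene 3.x API of the day; the class, method, and field names here are just
illustrative, not the actual contrib's classes):

    import java.io.File;
    import java.util.List;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class PartitionIndexer {
        // Each map/reduce task indexes only its own partition of
        // (id, text) pairs into a task-local directory.
        static void buildPartitionIndex(File indexDir, List<String[]> docs)
                throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
            IndexWriter writer =
                new IndexWriter(FSDirectory.open(indexDir), cfg);
            try {
                for (String[] idAndText : docs) {
                    Document doc = new Document();
                    doc.add(new Field("id", idAndText[0],
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.add(new Field("body", idAndText[1],
                        Field.Store.NO, Field.Index.ANALYZED));
                    writer.addDocument(doc);
                }
            } finally {
                writer.close(); // commits this partition's sub-index
            }
        }
    }

The sub-indexes the tasks produce can then be merged into one index with
IndexWriter.addIndexes(Directory...) or deployed as separate shards.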
Otis
Sematext :: http://
On Thu, 2011-06-30 at 11:45 +0200, Guru Chandar wrote:
> Thanks for the response. The documents are all distinct. My (limited)
> understanding is that partitioning the indexes will lead to results being
> different from the case where you have all in one partition, due to
> Lucene currently not supporting distributed IDF. Is this correct, or will
> it work seamlessly?
>
> Regards,
> -gc
>
> -----Original Message-----
> From: Danil ŢORIN [mailto:torin...@gmail.com]
> Sent: Thursday, June 30, 2011 3:04 PM
> To: java-user@lucene.apache.org
> Subject: Re: distributing the indexing process
>
> It depends.
>
> If all documents are distinct then, yeah, go for it.
>
> If you have multiple versions of the same document in your data and you
> only want to index the latest version...then you need a clever way to
> split the data to make sure that all versions of a document will be
> indexed on the same host, and you
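
A minimal sketch of that routing idea (the class and method names below are
made up for illustration, not an actual Lucene or Hadoop API):

    // Route every version of a document to the same indexing host by
    // hashing its stable ID; "keep only the latest version" can then
    // be decided locally on that host.
    public class DocRouter {
        private final int numHosts;

        public DocRouter(int numHosts) {
            this.numHosts = numHosts;
        }

        // The same docId always maps to the same host.
        public int hostFor(String docId) {
            // Mask the sign bit: Math.abs(Integer.MIN_VALUE) is negative.
            return (docId.hashCode() & Integer.MAX_VALUE) % numHosts;
        }
    }

On the chosen host, indexing with
IndexWriter.updateDocument(new Term("id", docId), doc) then ensures only one
version of each document survives in that sub-index.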