Hello, you could have each node build a separate index and then merge the results into a single consistent index using
org.apache.lucene.index.IndexWriter.addIndexes(Directory...)

Regards,
Sanne

2011/6/30 Guru Chandar <guru.chan...@consona.com>:
> Thanks for the response. The documents are all distinct. My (limited)
> understanding is that partitioning the indexes will lead to results that
> differ from the case where everything is in one partition, because Lucene
> does not currently support distributed idf. Is this correct? Is there a way
> to make it work seamlessly?
>
> Regards,
> -gc
>
>
> -----Original Message-----
> From: Danil ŢORIN [mailto:torin...@gmail.com]
> Sent: Thursday, June 30, 2011 3:04 PM
> To: java-user@lucene.apache.org
> Subject: Re: distributing the indexing process
>
> It depends....
>
> If all documents are distinct then, yeah, go for it.
>
> If you have multiple versions of the same document in your data and you
> only want to index the latest version, then you need a clever way to
> split the data to make sure that all versions of a document are indexed
> on the same host, so that you won't have duplicates later.
>
> But my biggest concern is: if your index is so big that you need to
> index it on different hosts, are you sure you want it to be combined in
> a single index?
> Maybe it's a good idea to partition it?
>
> On Thu, Jun 30, 2011 at 12:12, Guru Chandar <guru.chan...@consona.com> wrote:
>>
>> If we have to index a lot of documents, is there a way to divide the
>> documents into multiple sets and index them on multiple machines in
>> parallel, and then merge the resulting indexes back on a single
>> machine? If yes, will the result be logically equivalent to indexing all
>> the documents on a single machine?
>>
>> Thanks,
>> -gc
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
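
On the distributed idf question raised in this thread, a small toy sketch may help show why per-partition scoring can differ from scoring against one merged index. It does not use Lucene at all: the class name, the sample documents and the whitespace tokenization are assumptions made up for illustration, and the formula is the classic 1 + ln(numDocs / (docFreq + 1)) idf used by Lucene's default similarity. The point is only that docFreq and numDocs are counted per index, so an index merged with IndexWriter.addIndexes(Directory...) should see collection-wide counts, while separately searched partitions each see their own.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; not Lucene code. All names here are hypothetical.
class IdfPartitionSketch {

    // Classic idf formula: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Number of documents containing the term (naive whitespace tokenization).
    static int docFreq(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (List.of(doc.split("\\s+")).contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two made-up partitions, as if indexed on two different nodes.
        List<String> nodeA = List.of("lucene index", "lucene merge");
        List<String> nodeB = List.of("distributed search", "merge segments");

        // The "merged" collection stands in for a single combined index.
        List<String> merged = new ArrayList<>(nodeA);
        merged.addAll(nodeB);

        String term = "lucene";
        System.out.println("idf on node A: " + idf(docFreq(nodeA, term), nodeA.size()));
        System.out.println("idf on node B: " + idf(docFreq(nodeB, term), nodeB.size()));
        System.out.println("idf merged:    " + idf(docFreq(merged, term), merged.size()));
    }
}
```

The three printed values differ because each partition counts only its own documents. If the partitions are kept separate at search time, the scores would need collection-wide statistics to match the single-index case.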