On 15/07/2005, at 3:57 PM, Otis Gospodnetic wrote:

> The problem that I saw (from your email only) with the "ship the full
> little index to the Queen" approach is that, from what I understand,
> you eventually do addIndexes(Directory[]) in there, and as this
> optimizes things in the end, this means your whole index gets
> re-written to disk after each such call.


Yep, hence I placed each partial index received from a worker on the queen's local disk, left it there until all the partial indexes had come in, and then did a final UberMerge of all of them in one hit.
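In code it's something like this on the queen side (a rough sketch against the Lucene 1.4-era API; the class name, directory layout and error handling here are illustrative, not the real thing):

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class UberMerge {
        // Merge every partial index (one subdirectory per worker) into the
        // master index with a single addIndexes() call, so the full
        // rewrite/optimize cost is paid exactly once, not once per worker.
        public static void merge(File partialsRoot, File masterPath) throws IOException {
            File[] subdirs = partialsRoot.listFiles();
            Directory[] partials = new Directory[subdirs.length];
            for (int i = 0; i < subdirs.length; i++) {
                partials[i] = FSDirectory.getDirectory(subdirs[i], false);
            }
            IndexWriter writer = new IndexWriter(masterPath, new StandardAnalyzer(), true);
            try {
                writer.addIndexes(partials); // one big merge, one rewrite
            } finally {
                writer.close();
            }
        }
    }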

> As for MapReduce, from what I understand, it's quite a bit more
> complicated under the hood, but very simple on the surface - given a
> single big task, chop it up into a number of smaller ones, put them in
> the massive, parallel system, and re-assemble them when they are done.


Is this sort of like the Fork-Join thing that Doug Lea talks about in his concurrency book? Anyway, the concept you mention is exactly the one I'm interested in. I'll have to hunt through the Nutch stuff to see. I guess it all depends on whether a problem can be easily and programmatically decomposed into smaller units.
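To make the chop-up/re-assemble shape concrete, here's a toy in plain java.util.concurrent terms (nothing to do with the actual Nutch code; the chunk size, pool size and data are all made up):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Chop a big sum into chunks, farm the chunks out to a thread pool,
    // then combine the partial results when they are all done.
    public class ChopAndMerge {
        public static void main(String[] args) throws Exception {
            final long[] data = new long[1000000];
            java.util.Arrays.fill(data, 1L);
            final int chunk = 100000;

            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int start = 0; start < data.length; start += chunk) {
                final int lo = start;
                final int hi = Math.min(start + chunk, data.length);
                parts.add(pool.submit(new Callable<Long>() { // the "chop" step
                    public Long call() {
                        long sum = 0;
                        for (int i = lo; i < hi; i++) sum += data[i];
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // re-assemble
            pool.shutdown();
            System.out.println(total); // prints 1000000
        }
    }

As I understand it, MapReduce generalises exactly that shape: the per-chunk work is the "map", the combining loop is the "reduce", just spread over many machines instead of threads.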

> I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce
> code is in Nutch; I haven't been paying close enough attention.


Me neither.. :) I didn't even know Nutch was now fully in the ASF, and I'm a Member... :-$

Paul


