On 15/07/2005, at 3:57 PM, Otis Gospodnetic wrote:

> The problem that I saw (from your email only) with the "ship the full
> little index to the Queen" approach is that, from what I understand,
> you eventually do addIndexes(Directory[]) in there, and as this
> optimizes things in the end, this means your whole index gets
> re-written to disk after each such call.


Yep, hence I placed each partial index received from a worker on the queen's local disk, left it there until all the partial indexes had come in, and then did a final UberMerge of all of them in one hit.
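In code it's something like this on the queen side (a rough sketch against the Lucene 1.4-era API; the class name, directory layout and error handling here are illustrative, not the real thing):

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class UberMerge {
        // Merge every partial index (one subdirectory per worker) into the
        // master index with a single addIndexes() call, so the full
        // rewrite/optimize cost is paid exactly once, not once per worker.
        public static void merge(File partialsRoot, File masterPath) throws IOException {
            File[] subdirs = partialsRoot.listFiles();
            Directory[] partials = new Directory[subdirs.length];
            for (int i = 0; i < subdirs.length; i++) {
                partials[i] = FSDirectory.getDirectory(subdirs[i], false);
            }
            IndexWriter writer = new IndexWriter(masterPath, new StandardAnalyzer(), true);
            try {
                writer.addIndexes(partials); // one big merge, one rewrite
            } finally {
                writer.close();
            }
        }
    }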

> As for MapReduce, from what I understand, it's quite a bit more
> complicated under the hood, but very simple on the surface - given a
> single big task, chop it up into a number of smaller ones, put them in
> the massive, parallel system, and re-assemble them when they are done.


Is this sort of like the Fork-Join thing that Doug Lea talks about in his concurrency book? Anyway, the concept you mention is exactly the one I'm interested in. I'll have to hunt through the Nutch stuff to see. I guess it all depends on whether a problem can be easily and programmatically decomposed into smaller units.
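To make the chop-up/re-assemble shape concrete, here's a toy in plain java.util.concurrent terms (nothing to do with the actual Nutch code; the chunk size, pool size and data are all made up):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Chop a big sum into chunks, farm the chunks out to a thread pool,
    // then combine the partial results when they are all done.
    public class ChopAndMerge {
        public static void main(String[] args) throws Exception {
            final long[] data = new long[1000000];
            java.util.Arrays.fill(data, 1L);
            final int chunk = 100000;

            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int start = 0; start < data.length; start += chunk) {
                final int lo = start;
                final int hi = Math.min(start + chunk, data.length);
                parts.add(pool.submit(new Callable<Long>() { // the "chop" step
                    public Long call() {
                        long sum = 0;
                        for (int i = lo; i < hi; i++) sum += data[i];
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // re-assemble
            pool.shutdown();
            System.out.println(total); // prints 1000000
        }
    }

As I understand it, MapReduce generalises exactly that shape: the per-chunk work is the "map", the combining loop is the "reduce", just spread over many machines instead of threads.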

> I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce
> code is in Nutch; I haven't been paying close enough attention.


Me neither.. :) I didn't even know Nutch was now fully in the ASF, and I'm a Member... :-$

Paul


