Re: providing auxiliary data to Map

Todd Lipcon Wed, 27 May 2009 21:27:36 -0700

Hi Rares,
Check out the Distributed Cache: http://wiki.apache.org/hadoop/FAQ#8
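
A rough sketch of the map-side half, in case it helps. The assumptions: the driver registers the phase-one output with DistributedCache.addCacheFile(uri, conf), the mapper's setup code gets the node-local copy from DistributedCache.getLocalCacheFiles(conf), and the file is plain WordCount text output, one "word<TAB>count" per line. WordFreqTable and load are hypothetical names made up for this example, not part of Hadoop, and the code below only covers reading the local copy into memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): loads a word-frequency table
// from a local file whose lines are assumed to look like "word<TAB>count".
class WordFreqTable {
    static Map<String, Long> load(Path localFile) throws IOException {
        Map<String, Long> table = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.lastIndexOf('\t');
                if (tab <= 0) {
                    continue; // skip blank lines and lines with no word before the tab
                }
                String word = line.substring(0, tab);
                long count = Long.parseLong(line.substring(tab + 1).trim());
                table.put(word, count);
            }
        }
        return table;
    }
}
```

The idea would be to call load once per task from the mapper's setup code and keep the map in a field, so each map() call is a plain in-memory lookup. Since every node reads its own local copy, no single HDFS machine has to serve all the map tasks at once.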


Thanks
-Todd

On Wed, May 27, 2009 at 9:24 PM, Rares Vernica <[email protected]> wrote:

> Dear Hadoop Users,
>
> I am a newcomer to the Map-Reduce world. Please excuse my ignorance.
>
> I have two Map-Reduce phases. The first phase is the WordCount
> example. In the second phase, besides the regular input data, the Map
> function also needs the word-frequency table produced by the first
> phase.
>
> Obviously, the word-frequency table is small enough to fit into
> memory. Moreover, the first phase uses only one reduce, so that all
> the data is in one file in HDFS.
>
> My question is, what options do I have to efficiently get the
> word-frequency table to the map function of the second phase?
>
> One option is to access HDFS from the map function and read the
> file produced by the first Map-Reduce phase. More exactly, I would
> read the file in the "setup" function. With this option, the machine
> that stores this file would become a bottleneck, because when the
> second phase starts, all the map instances will access that machine
> to get the file. Is there any way to overcome this bottleneck?
>
> Are there any other options?
>
> Thank you,
> Rares
>
