A multithreaded mapper gets more chances to combine the mapper output, and
the locality of some global data is also better. However, the implementation
in Hadoop 1.0.2 relies on heavy synchronization, which adds considerable
overhead. Are there any optimizations for the multithreaded mapper?
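
To make the contention point concrete, here is a minimal sketch in plain Java
of the kind of optimization I have in mind. It is not Hadoop's
MultithreadedMapper code and uses no Hadoop API; the class name
LocalCombineSketch and the method countWords are hypothetical, chosen only for
illustration. Each worker thread combines its output in a private map and
takes the shared lock once at the end, instead of synchronizing on every
record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not part of Hadoop.
public class LocalCombineSketch {

    // Counts words with several threads. Each thread combines into a private
    // map and merges into the shared map under a single lock acquisition.
    static Map<String, Long> countWords(List<String> lines, int threads) throws Exception {
        final Map<String, Long> shared = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                pending.add(pool.submit(() -> {
                    // Thread-local combining: no synchronization needed here.
                    Map<String, Long> local = new HashMap<>();
                    for (int i = offset; i < lines.size(); i += threads) {
                        for (String word : lines.get(i).split("\\s+")) {
                            if (!word.isEmpty()) {
                                local.merge(word, 1L, Long::sum);
                            }
                        }
                    }
                    // One short critical section per thread, not per record.
                    synchronized (shared) {
                        for (Map.Entry<String, Long> e : local.entrySet()) {
                            shared.merge(e.getKey(), e.getValue(), Long::sum);
                        }
                    }
                }));
            }
            // Wait for every worker; get() also rethrows worker failures.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return shared;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("a b a", "b c", "a");
        System.out.println(countWords(lines, 2));
    }
}
```

The input side could presumably be batched the same way (each thread pulling
several records per lock acquisition), but that is an assumption on my part
and I have not measured it.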


syscokid wrote:
> 
> Why multithread the mapper? Just create more mappers. That way you spread
> the data load as well as the mapping load potentially across multiple
> nodes.
> 
> 
> kenyh wrote:
>> 
>> I wonder whether there are any optimizations for the multithreaded mapper
>> that reduce contention on input reading and output writing?
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/MultithreadedMapper-tp34213805p34219011.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.
