A multithreaded mapper gets more chances to combine the mapper output, since more records pass through a single task. The locality of shared global data is also better. But the MultithreadedMapper implementation in Hadoop 1.0.2 uses heavy synchronization, which adds a lot of overhead. Are there any optimizations for the multithreaded mapper?
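To make the question concrete, here is a rough sketch of the kind of optimization I mean. It is plain Java with no Hadoop classes, and it assumes the overhead comes from taking a lock once per record on the shared input and output. The names `BatchedSharedReader`, `nextBatch` and `countWords` are hypothetical and only for illustration. Each worker thread pulls a batch of records under a single lock acquisition, combines counts in a thread-local map, and merges into the shared output once at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative only: not Hadoop code. Shows batched reads from a shared
// source plus per-thread combining to reduce lock traffic.
class BatchedSharedReader {

    // Take up to batchSize records while holding the lock once,
    // instead of acquiring the lock for every single record.
    static List<String> nextBatch(Iterator<String> source, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        synchronized (source) {
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
        }
        return batch;
    }

    // Word count over the records using `threads` worker threads.
    static Map<String, Integer> countWords(List<String> records, int threads, int batchSize) {
        final Iterator<String> source = records.iterator();
        final Map<String, Integer> output = new HashMap<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                // Thread-local combiner: no locking while mapping.
                Map<String, Integer> local = new HashMap<>();
                List<String> batch;
                while (!(batch = nextBatch(source, batchSize)).isEmpty()) {
                    for (String line : batch) {
                        for (String word : line.split("\\s+")) {
                            if (!word.isEmpty()) {
                                local.merge(word, 1, Integer::sum);
                            }
                        }
                    }
                }
                // One lock acquisition per thread to publish combined output.
                synchronized (output) {
                    local.forEach((k, v) -> output.merge(k, v, Integer::sum));
                }
            });
            workers.add(t);
            t.start();
        }

        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("a b a");
        }
        System.out.println(countWords(records, 4, 64));
    }
}
```

With a batch size of 64, the input lock is taken roughly once per 64 records rather than once per record, and the output lock once per thread. Whether something like this fits the real record reader and writer interfaces is exactly what I am unsure about.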

syscokid wrote:
> 
> Why multithread the mapper? Just create more mappers. That way you spread
> the data load as well as the mapping load potentially across multiple
> nodes.
> 
> 
> kenyh wrote:
>> 
>> I wonder if there are any optimization about the multithread mapper to
>> decrease the contention of input reading and output?
>> 
> 
> 

-- 
View this message in context: http://old.nabble.com/MultithreadedMapper-tp34213805p34219011.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.