Yes, I finally found the corresponding code. It's in TaskTracker.MapOutputServlet: doGet() -> sendMapFile() -> TaskTracker.mapOutputLost().
It's true that Hadoop uses a redo strategy to solve this problem, but some papers indicate that we can also replicate the intermediate results to make them fault-tolerant.

Thank you very much

Nan

On Wed, Oct 13, 2010 at 4:07 PM, newpant <newpant0...@gmail.com> wrote:

> Hi, according to Hadoop: The Definitive Guide, a map task first stores its
> intermediate output in an in-memory buffer and then spills it to the local
> disk configured by mapred.local.dir, so as far as I know, if the
> intermediate data is lost, only a redo can fix it.
>
> If I'm wrong, please correct me.
>
> 2010/9/27 Nan Zhu <zhunans...@gmail.com>
>
> > Hi, all
> >
> > I'm not sure which mailing list I should send my question to; sorry for
> > any inconvenience.
> >
> > I'm interested in how Hadoop currently handles the loss of intermediate
> > data generated by map tasks. As some papers suggest, when the data needed
> > by reducers is lost, we should compare the cost of redoing the task with
> > the cost of replicating the data. If redoing the task costs more, we can
> > keep more replicas of the intermediate data generated by the map tasks to
> > ensure that reducers can access it; otherwise, we just redo the
> > corresponding map task when we detect the loss.
> >
> > I'm not sure which strategy Hadoop currently adopts, and I haven't found
> > the code for this. Can anyone give me some suggestions?
> >
> > Thank you
> >
> > Nan
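
P.S. To make the redo-versus-replicate comparison from the original question concrete, here is a minimal sketch of the decision. It is not Hadoop code: the class, the method names, and the cost model (expected redo cost = loss probability x map runtime; replication cost = time to copy the output to each extra replica) are all assumptions made for illustration, and the model ignores the chance that every replica is lost too.

```java
// Hypothetical sketch only: none of these names exist in Hadoop.
public class IntermediateDataPolicy {

    enum Strategy { REDO, REPLICATE }

    /**
     * Expected extra time spent if we rely on re-execution: the map task
     * is rerun only when its output is actually lost.
     * (Assumed model, not taken from any paper or from Hadoop.)
     */
    static double expectedRedoCost(double lossProbability, double mapRuntimeSeconds) {
        return lossProbability * mapRuntimeSeconds;
    }

    /**
     * Time spent copying the map output to extra nodes. This cost is paid
     * up front for every map task, whether or not the output is ever lost.
     */
    static double replicationCost(long outputBytes, int extraReplicas, double bytesPerSecond) {
        return extraReplicas * (outputBytes / bytesPerSecond);
    }

    /** Replicate only when redoing is expected to cost more than replicating. */
    static Strategy choose(double lossProbability, double mapRuntimeSeconds,
                           long outputBytes, int extraReplicas, double bytesPerSecond) {
        double redo = expectedRedoCost(lossProbability, mapRuntimeSeconds);
        double replicate = replicationCost(outputBytes, extraReplicas, bytesPerSecond);
        return redo > replicate ? Strategy.REPLICATE : Strategy.REDO;
    }

    public static void main(String[] args) {
        // Long map task, small output: 0.05 * 3600 s = 180 s vs 2 s to copy.
        System.out.println(choose(0.05, 3600.0, 100_000_000L, 1, 50_000_000.0));
        // Short map task, large output: 0.05 * 30 s = 1.5 s vs 200 s to copy.
        System.out.println(choose(0.05, 30.0, 10_000_000_000L, 1, 50_000_000.0));
    }
}
```

With these made-up numbers the first case picks REPLICATE and the second picks REDO, which matches the intuition that replication only pays off for expensive map tasks with small outputs.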