Hi all, I'm not sure which mailing list I should send my question to; sorry for any inconvenience.
I'm interested in how Hadoop currently handles the loss of intermediate data generated by map tasks. As some papers suggest, when the data needed by reducers is lost, we should compare the cost of re-executing the map task with the cost of replicating its output: if re-execution costs more, we create extra replicas of the intermediate map output so that reducers can still access it; otherwise, we simply re-run the corresponding map task once the loss is detected.

I'm not sure what strategy Hadoop currently adopts, and I haven't found the code for this. Can anyone give me some suggestions?

Thank you,
Nan
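P.S. To make the trade-off concrete, here is how I read the decision rule from those papers, as a minimal sketch in Java. This is purely illustrative; the class and method names are my own invention, not Hadoop APIs:

    // Illustrative only: the names below are hypothetical, not Hadoop code.
    public final class IntermediateOutputPolicy {

        // Decide whether to eagerly replicate a map task's intermediate
        // output, or to rely on re-executing the task if the output is lost.
        // reExecutionCost: estimated cost of re-running the map task
        // replicationCost: estimated cost of writing an extra replica
        // (both in the same unit, e.g. seconds)
        static boolean shouldReplicate(double reExecutionCost,
                                       double replicationCost) {
            // If redoing the task is more expensive than replicating its
            // output, pay the replication cost up front; otherwise just
            // re-run the map task when the loss is detected.
            return reExecutionCost > replicationCost;
        }

        public static void main(String[] args) {
            // Example: a 120 s map task vs. a 15 s replica write -> replicate.
            System.out.println(shouldReplicate(120.0, 15.0)); // prints true
        }
    }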