Don't we have an option to store the map outputs in HDFS?
Billy
"Owen O'Malley" <[EMAIL PROTECTED]> wrote in
message news:[EMAIL PROTECTED]
It isn't optimal, but it is the expected behavior. In general when we
lose a TaskTracker, we want the map outputs regenerated so that any
reduces that need to re-run (including speculative executions) can
still fetch them. We could
handle it as a special case if:
1. We didn't lose any running reduces.
2. All of the reduces (including speculative tasks) are done with
shuffling.
3. We don't plan on launching any more speculative reduces.
If all 3 hold, we don't need to re-run the map tasks. Actually
implementing that special case would be a pretty involved patch to the
JobTracker/Schedulers.
-- Owen
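
For readers following the thread, the three conditions Owen lists can be sketched as a single predicate. This is only an illustration in Python, not Hadoop code: the ReduceAttempt record, its fields, and can_skip_map_rerun are hypothetical names made up for this sketch, and the real JobTracker tracks this state differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReduceAttempt:
    """Hypothetical view of one reduce attempt (not a Hadoop class)."""
    task_id: str
    running_on_lost_tracker: bool  # was running on the TaskTracker we lost
    shuffle_done: bool             # has finished fetching all map outputs


def can_skip_map_rerun(reduces: Iterable[ReduceAttempt],
                       more_speculative_planned: bool) -> bool:
    """Return True only if all three conditions from the message hold."""
    reduces = list(reduces)
    # 1. We didn't lose any running reduces.
    no_lost_reduces = not any(r.running_on_lost_tracker for r in reduces)
    # 2. All reduces (including speculative ones) are done with shuffling.
    all_shuffled = all(r.shuffle_done for r in reduces)
    # 3. We don't plan on launching any more speculative reduces.
    return no_lost_reduces and all_shuffled and not more_speculative_planned
```

If any one condition fails, some reduce may still need the lost map outputs, so the maps on the lost TaskTracker have to be re-run.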