Re: a question about fault recovery

2014-03-04 Thread dachuan
Thanks for the clarification, so now I understand that only failed tasks will re-scheduled, and only the input partitions of these tasks will be re-computed. Another confusing point from paper is: "Because of these properties, D-Streams can parallelize recovery over hundreds of cores and recover

Re: a question about fault recovery

2014-03-04 Thread Reynold Xin
BlockManager is only responsible for in-memory/on-disk storage. It has nothing to do with re-computation. All the recomputation / retry code are done in the DAGScheduler. Note that when a node crashes, due to lazy evaluation, there is no task that needs to be re-run. Those tasks are re-run only wh

a question about fault recovery

2014-03-04 Thread dachuan
Hello, developers, I am just curious about the following two things which seems to be contradictory to each other, please help me find out my understanding mistakes: 1) Excerpted from sosp 2013 paper, "Then, when a node fails, the system detects all missing RDD partitions and launches tasks to re