Thanks for the clarification. Now I understand that only failed tasks
will be re-scheduled, and only the input partitions of these tasks will be
re-computed.
Another confusing point from the paper is:
"Because of these properties, D-Streams can parallelize recovery over
hundreds of cores and recover
BlockManager is only responsible for in-memory/on-disk storage. It has
nothing to do with re-computation.
All the recomputation / retry logic is handled in the DAGScheduler. Note that
when a node crashes, due to lazy evaluation, there is no task that needs to
be re-run. Those tasks are re-run only wh
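To make the lazy-evaluation point above concrete, here is a minimal toy model in Python. It is not Spark code: `ToyRDD`, `get_partition`, `lose_partition`, and `collect` are hypothetical names invented for this sketch, and the per-RDD `cache` dict only stands in for block storage. The assumption being illustrated is that losing a cached partition triggers no work by itself, and that the missing partition is recomputed from its lineage the next time something asks for it.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a partition count plus lineage."""

    def __init__(self, num_partitions, compute, parent=None):
        self.num_partitions = num_partitions
        self.compute = compute      # function(index, parent_data) -> list
        self.parent = parent        # lineage: the RDD this one is derived from
        self.cache = {}             # partition index -> data (toy block store)
        self.compute_count = 0      # how many partitions were actually computed

    def get_partition(self, i):
        # Compute a partition only when it is requested and not cached.
        if i not in self.cache:
            parent_data = self.parent.get_partition(i) if self.parent else None
            self.cache[i] = self.compute(i, parent_data)
            self.compute_count += 1
        return self.cache[i]

    def lose_partition(self, i):
        # Simulate a node crash: the cached data disappears, nothing is re-run.
        self.cache.pop(i, None)

    def collect(self):
        # An "action": forces every partition to be available.
        out = []
        for i in range(self.num_partitions):
            out.extend(self.get_partition(i))
        return out


base = ToyRDD(4, lambda i, _: [i * 10, i * 10 + 1])
doubled = ToyRDD(4, lambda i, parent: [x * 2 for x in parent], parent=base)

first = doubled.collect()                  # computes all 4 partitions
doubled.lose_partition(2)                  # "crash": no recomputation yet
count_after_loss = doubled.compute_count   # unchanged by the loss
second = doubled.collect()                 # recomputes only partition 2
```

In this sketch only the lost partition of `doubled` is recomputed on the second `collect()`, and `base` is not recomputed at all because its partitions are still cached.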
Hello, developers,
I am curious about the following two points, which seem to
contradict each other. Please help me find where my understanding
goes wrong:
1) Excerpted from the SOSP 2013 paper: "Then, when a node fails, the system
detects all missing RDD partitions and launches tasks to re