Thanks for the clarification, so now I understand that only failed tasks
will re-scheduled, and only the input partitions of these tasks will be
re-computed.
Another confusing point from paper is:
"Because of these properties, D-Streams can parallelize recovery over
hundreds of cores and recover
BlockManager is only responsible for in-memory/on-disk storage. It has
nothing to do with re-computation.
All the recomputation / retry code are done in the DAGScheduler. Note that
when a node crashes, due to lazy evaluation, there is no task that needs to
be re-run. Those tasks are re-run only wh