Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread Lalwani, Jayesh
“Does this mean that only those tasks that the dead executor was executing at the time need to be rerun to generate the processing stages? I read somewhere that RDD lineage keeps track of what needs to be re-executed.” It uses RDD lineage to figure out what needs to be re-executed. I

Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread ashok34...@yahoo.com.INVALID
Thank you for the detailed explanation. Please see below: If one executor fails, it moves the processing over to another executor. However, if the data is lost, it re-executes the processing that generated the data, and might have to go back to the source. Does this mean that only those tasks th

Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread Lalwani, Jayesh
Spark replicates the partitions among multiple nodes. If one executor fails, it moves the processing over to another executor. However, if the data is lost, it re-executes the processing that generated the data, and might have to go back to the source. In case of failure, there will be a delay in g
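
The recovery behaviour discussed in this thread (only the lost partitions are recomputed, by walking back through the lineage) can be illustrated with a small toy model. This is a sketch in plain Python, not Spark's API: the class ToyRDD, its methods map, partition and lose_partition, and the helper run_demo are all hypothetical names invented for illustration, and the model only covers simple one-to-one (map-style) dependencies with no shuffles.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD: each partition is either read from
    a source function or derived from the same partition of a parent."""

    def __init__(self, num_partitions: int,
                 source: Optional[Callable[[int], List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.num_partitions = num_partitions
        self.source = source      # used only by the root dataset
        self.parent = parent      # lineage: where this dataset came from
        self.fn = fn              # lineage: how each record was derived
        self.cache: Dict[int, List[int]] = {}   # partitions currently held
        self.compute_count = 0    # how many partition computations ran

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Record the transformation; nothing is computed yet.
        return ToyRDD(self.num_partitions, parent=self, fn=fn)

    def partition(self, i: int) -> List[int]:
        # Return the partition if it is still held, else rebuild it
        # from the parent (recursively), i.e. from the lineage.
        if i in self.cache:
            return self.cache[i]
        self.compute_count += 1
        if self.parent is None:
            data = self.source(i)
        else:
            data = [self.fn(x) for x in self.parent.partition(i)]
        self.cache[i] = data
        return data

    def lose_partition(self, i: int) -> None:
        # Simulate the executor holding partition i going away.
        self.cache.pop(i, None)


def run_demo():
    source = ToyRDD(3, source=lambda i: [2 * i, 2 * i + 1])
    doubled = source.map(lambda x: x * 2)
    plus_one = doubled.map(lambda x: x + 1)

    # First pass: every partition of every dataset is computed once.
    before = [plus_one.partition(i) for i in range(3)]

    # Partition 1 of both derived datasets is lost; the source copy survives.
    doubled.lose_partition(1)
    plus_one.lose_partition(1)

    # Only partition 1 is rebuilt; partitions 0 and 2 are untouched.
    recovered = plus_one.partition(1)
    counts = (source.compute_count, doubled.compute_count,
              plus_one.compute_count)
    return before, recovered, counts


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the source is computed three times in total and each derived dataset four times: the three original partitions plus the single lost one. If the source partition had been lost as well, the rebuild would go one step further back and re-read it, which corresponds to the "might have to go back to the source" case above.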