Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread Lalwani, Jayesh
“Does this mean that only the tasks the failed executor was running at the time need to be rerun to regenerate the processing stages? I read somewhere that RDD lineage keeps track of what needs to be re-executed.” It uses RDD lineage to figure out what needs to be re-executed. I

Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread ashok34...@yahoo.com.INVALID
Thank you for the detailed explanation. Please clarify the below: If one executor fails, it moves the processing over to another executor. However, if the data is lost, it re-executes the processing that generated the data, and might have to go back to the source. Does this mean that only those tasks th

Re: Recovery when two spark nodes out of 6 fail

2021-06-25 Thread Lalwani, Jayesh
Spark replicates the partitions among multiple nodes. If one executor fails, it moves the processing over to another executor. However, if the data is lost, it re-executes the processing that generated the data, and might have to go back to the source. In case of failure, there will be a delay in g
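The recovery behaviour described in this reply can be illustrated with a toy model. The sketch below is plain Python and does not use Spark's API; `ToyRDD`, `ToyCluster` and `run_demo` are hypothetical names invented for illustration. It assumes narrow dependencies only (each child partition derives from exactly one parent partition) and a source that can always be re-read. Note also that in real Spark, whether cached partitions are replicated depends on the storage level chosen.

```python
# Toy model of lineage-based recovery. NOT Spark's API: all names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToyRDD:
    """A dataset described by its parent and the function that derives it."""
    name: str
    parent: Optional["ToyRDD"] = None
    # fn maps one parent partition to one child partition (narrow dependency).
    fn: Optional[Callable[[List[int]], List[int]]] = None
    # source(i) re-reads partition i from durable storage (assumed always available).
    source: Optional[Callable[[int], List[int]]] = None


class ToyCluster:
    def __init__(self) -> None:
        # (rdd name, partition index) -> (executor holding it, data)
        self.cache: Dict[Tuple[str, int], Tuple[str, List[int]]] = {}
        # Every partition that actually had to be computed, in order.
        self.computed: List[Tuple[str, int]] = []

    def compute(self, rdd: ToyRDD, index: int, executor: str) -> List[int]:
        key = (rdd.name, index)
        if key in self.cache:
            # Partition survived: reuse it, no recomputation needed.
            return self.cache[key][1]
        if rdd.parent is None:
            data = rdd.source(index)  # go back to the source
        else:
            # Walk the lineage: rebuild the parent partition first if it is missing.
            data = rdd.fn(self.compute(rdd.parent, index, executor))
        self.cache[key] = (executor, data)
        self.computed.append(key)
        return data

    def fail_executor(self, executor: str) -> None:
        # Everything held by the failed executor is lost.
        self.cache = {k: v for k, v in self.cache.items() if v[0] != executor}


def run_demo():
    source = ToyRDD("source", source=lambda i: list(range(i * 3, i * 3 + 3)))
    doubled = ToyRDD("doubled", parent=source, fn=lambda p: [x * 2 for x in p])

    cluster = ToyCluster()
    # Four partitions spread over two executors.
    before = [cluster.compute(doubled, i, f"exec-{i % 2}") for i in range(4)]

    cluster.computed.clear()
    cluster.fail_executor("exec-1")  # partitions 1 and 3 are lost

    # The surviving executor picks up the work; only lost partitions are rebuilt.
    after = [cluster.compute(doubled, i, "exec-0") for i in range(4)]
    return before, after, cluster.computed


if __name__ == "__main__":
    before, after, recomputed = run_demo()
    print(after == before)
    print(recomputed)
```

In this sketch, partitions 0 and 2 are served from the surviving executor's cache, while partitions 1 and 3 are rebuilt by walking the lineage back to the source. Wide (shuffle) dependencies, which can force more upstream work to be redone, are deliberately left out.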

Recovery when two spark nodes out of 6 fail

2021-06-25 Thread ashok34...@yahoo.com.INVALID
Greetings, This is a scenario for which we need to come up with a comprehensive answer, please. Suppose we have 6 Spark VMs, each running two executors via spark-submit. - we have two VM failures at H/W level, rack failure - we lose 4 Spark executors out of 12 - Happening half way

Re: [ANNOUNCE] Apache Spark 3.0.3 released

2021-06-25 Thread Dongjoon Hyun
Thank you, Yi! On Thu, Jun 24, 2021 at 10:52 PM Yi Wu wrote: > We are happy to announce the availability of Spark 3.0.3! > > Spark 3.0.3 is a maintenance release containing stability fixes. This > release is based on the branch-3.0 maintenance branch of Spark. We strongly > recommend all 3.0 u