If it's not resilient at the Spark level, can't you just relaunch your job with your
orchestration tool?
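
For reference, a hedged sketch of the settings this thread touches on (these are my assumptions about a typical setup, not something stated by the posters) — the external shuffle service keeps shuffle files served by the NodeManager so an executor loss alone doesn't lose shuffle output, and the retry limit can be raised above the default of 4 mentioned below:

```properties
# spark-defaults.conf — illustrative sketch only.
# Serve shuffle blocks from the YARN NodeManager instead of the
# executor JVM; survives executor loss (a full node loss still
# loses the blocks and forces stage recomputation via lineage).
spark.shuffle.service.enabled      true

# Per-task retry limit; the default of 4 matches the
# "4 task retries" failure described in the original question.
spark.task.maxFailures             8
```

Note the shuffle service also has to be installed on each NodeManager (the yarn-shuffle jar on the NodeManager classpath) before the setting takes effect.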

On Dec 21, 2017 09:34, "Georg Heiler" <georg.kf.hei...@gmail.com> wrote:

> Did you try using the YARN shuffle service?
> chopinxb <chopi...@gmail.com> wrote on Thu., Dec 21, 2017 at 04:43:
>
>> In my experience with Spark applications (mostly Spark SQL), when there is a
>> complete node failure in my cluster, jobs that have shuffle blocks on that
>> node fail entirely after 4 task retries. It seems that data lineage
>> didn't work. What's more, our applications run multiple SQL statements for
>> data analysis, so having the entire application fail after a lengthy
>> calculation because of one job failure is unacceptable. In some ways we
>> value stability over speed.
>>
>>
>>
