Ok great. I understood the ideology, thanks.
Yes.
If the job fails repeatedly (4 times in this case), Spark assumes that
there is a problem in the Job and notifies the user. In exchange for this,
the engine can go on to serve other jobs with its available resources.
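For reference, the retry limit in play here is (as far as I know) spark.task.maxFailures, which defaults to 4. A minimal sketch of changing it, assuming you build the conf yourself (the app name and value are only examples):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: spark.task.maxFailures is the number of times a single task
// may fail before the stage (and hence the job) is aborted; it defaults to 4.
val conf = new SparkConf()
  .setAppName("retry-demo")              // placeholder app name
  .set("spark.task.maxFailures", "8")    // example value, not a recommendation

val ssc = new StreamingContext(conf, Seconds(1))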
I would try the following until things improve:
1. Figure out what's wrong
Umm, I am not sure if I got this fully.
Is it a design decision not to have context.stop() right after
awaitTermination throws an exception?
So, the ideology is that if a task fails after n tries (default 4), Spark
should fail fast and let the user know? Is this correct?
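To make the question concrete, this is roughly the pattern I mean (only a sketch, not my actual code):

import org.apache.spark.streaming.StreamingContext

// Sketch: should user code have to stop the context like this, or should
// Spark stop it itself once awaitTermination rethrows the job failure?
def runAndStop(ssc: StreamingContext): Unit = {
  ssc.start()
  try {
    ssc.awaitTermination()   // rethrows exceptions from the streaming computation
  } finally {
    ssc.stop()               // the explicit context.stop() I am asking about
  }
}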
As you mentioned there are
Correction: The Driver manages the Tasks, the resource manager serves up
resources to the Driver or Task.
On Tue, May 21, 2019 at 9:11 AM Jason Nerothin wrote:
> The behavior is a deliberate design decision by the Spark team.
The behavior is a deliberate design decision by the Spark team.
If Spark were to "fail fast", it would prevent the system from recovering
from many classes of errors that are in principle recoverable (for example
if two otherwise unrelated jobs cause a garbage collection spike on the
same node).
Ok, I found the reason.
In my QueueStream example, I have a while(true) loop which keeps adding
RDDs, and my awaitTermination call is after the while loop. Since the while
loop never exits, awaitTermination never gets called and the exceptions
never get reported.
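Roughly, the difference between what I had and what works is this (a sketch, not my exact code; makeRdd and the timings are placeholders):

import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext

// What I had (broken): the infinite loop runs before awaitTermination,
// so awaitTermination is never reached and failures are never reported.
//
//   ssc.start()
//   while (true) { rddQueue += makeRdd() }   // never exits
//   ssc.awaitTermination()                   // never reached
//
// Fixed shape (sketch): feed the queue from a separate thread and let the
// main thread block on awaitTermination so exceptions surface there.
def run(ssc: StreamingContext,
        rddQueue: mutable.Queue[RDD[Int]],
        makeRdd: () => RDD[Int]): Unit = {
  ssc.queueStream(rddQueue).map(x => x / 0).print()   // deliberate ArithmeticException on executors
  ssc.start()
  new Thread(new Runnable {
    override def run(): Unit = while (true) {
      rddQueue.synchronized { rddQueue += makeRdd() }
      Thread.sleep(1000)
    }
  }).start()
  ssc.awaitTermination()   // now actually runs and rethrows job failures
}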
The above was just the problem wit
Just to add to my previous message.
I am using Spark 2.2.2 standalone cluster manager and deploying the jobs in
cluster mode.
I was able to reproduce the problem.
In the below repository, I have 2 sample jobs. Both execute 1/0
(ArithmeticException) on the executor side, but in the case of the
NetworkWordCount job, awaitTermination throws the same exceptions (Job aborted
due to stage failure ...) that I can see in the
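For context, the failing operation in both jobs is essentially of this shape (illustrative only, not the actual code in the repository):

import org.apache.spark.streaming.StreamingContext

// Illustrative only: a NetworkWordCount-style job where the map function
// divides by zero, so every task fails on the executors.
def failingWordCount(ssc: StreamingContext, host: String, port: Int): Unit = {
  ssc.socketTextStream(host, port)
    .flatMap(_.split(" "))
    .map(word => (word, 1 / 0))   // ArithmeticException thrown on the executor side
    .reduceByKey(_ + _)
    .print()
}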
Any help would be much appreciated.
The error and question are quite generic; I believe most experienced
users will be able to answer.
>> Code would be very helpful,
I will try to put together something to post here.
>> 1. Writing in Java
I am using Scala
>> Wrapping the entire app in a try/catch
Once the SparkContext object is created, a Future is started where actions
and transformations are defined and the streaming context is started.
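Concretely, the structure is roughly this (sketched from memory, the names are not the real ones):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Rough shape of the app: the SparkContext exists first; a Future then builds
// the StreamingContext, defines the transformations, starts the context and
// blocks on it. The try/catch sits around this whole thing at the app level.
def startJob(sc: SparkContext, defineStreams: StreamingContext => Unit): Future[Unit] = Future {
  val ssc = new StreamingContext(sc, Seconds(1))
  defineStreams(ssc)       // actions and transformations are defined here
  ssc.start()
  ssc.awaitTermination()
}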
Code would be very helpful, but it *seems like* you are:
1. Writing in Java
2. Wrapping the *entire app* in a try/catch
3. Executing in local mode
The code that is throwing the exceptions is not executed locally in the
driver process. Spark is executing the failing code on the cluster.
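For example (a made-up batch sketch, not your code): a try/catch around the lazy transformation itself would catch nothing; the division only runs inside tasks on the executors once an action fires, and the driver then sees a SparkException from the action.

import org.apache.spark.{SparkContext, SparkException}

// Illustrative only. The map below is lazy, so the division by zero has not
// executed anywhere yet, let alone inside the driver process.
def demo(sc: SparkContext): Unit = {
  val failing = sc.parallelize(1 to 10).map(_ / 0)

  // The failure happens in tasks on the executors when the action runs; after
  // the retries are exhausted the driver sees it as a SparkException
  // ("Job aborted due to stage failure ...") thrown by the action itself.
  try failing.collect()
  catch {
    case e: SparkException => println(s"Caught on the driver: ${e.getMessage}")
  }
}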
Hi,
Anyone? This should be a straightforward one :)