Hi! We found the reason why this error is happening. It seems to be related to the fix <https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd> for ZEPPELIN-2075 <https://issues.apache.org/jira/browse/ZEPPELIN-2075>.
This fix causes the PySpark jobs of all users to be cancelled when one particular user cancels his PySpark job. When a PySpark job is cancelled, the method PySparkInterpreter.interrupt() is invoked, which sends SIGINT to the Python process and triggers the signal handler registered in context.py:

    # create a signal handler which would be invoked on receiving SIGINT
    def signal_handler(signal, frame):
        self.cancelAllJobs()
        raise KeyboardInterrupt()

2018-06-12 9:26 GMT-05:00 Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>:

> Hi!
> I have the 0.8.0 version, from September 2017.
>
> 2018-06-12 4:48 GMT-05:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:
>
>> Which version do you use?
>>
>> Best Regard,
>> Jeff Zhang
>>
>> From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
>> Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
>> Date: Friday, June 8, 2018 at 11:08 PM
>> To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>, "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>
>> Subject: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)
>>
>> Dear community,
>>
>> Currently we are having problems with multiple users running paragraphs associated with PySpark jobs.
>>
>> The problem is that if a user aborts/cancels his PySpark paragraph (job), the active PySpark jobs of the other users are cancelled too.
>> Going into detail, I've seen that when you cancel a user's job, this method is invoked (which is fine):
>>
>>     sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")
>>
>> But, for a reason unknown to me, this method is also invoked:
>>
>>     sc.cancelAllJobs()
>>
>> This is evident from the stack trace that appears in the jobs of the other users:
>>
>> Py4JJavaError: An error occurred while calling o885.count.
>> : org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>> at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
>> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>> at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
>> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>> at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
>> at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
>> at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>> at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
>> at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
>> at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
>> at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
>> at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
>> at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>> at py4j.Gateway.invoke(Gateway.java:280)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:214)
>> at java.lang.Thread.run(Thread.java:748)
>>
>> (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o885.count.\n', JavaObject id=o886), <traceback object at 0x7f9e669ae588>)
>>
>> Any idea of why this could be happening?
>>
>> (I have the 0.8.0 version from September 2017.)
>>
>> Thank you!
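The behaviour described in the thread can be sketched in plain Python, with no real Spark involved. MockSparkContext, its submit() helper, and the job ids/group names below are hypothetical stand-ins for pyspark's SparkContext, used only to contrast the scoped cancelJobGroup() call Zeppelin intends with the group-agnostic cancelAllJobs() call made by the SIGINT handler that the ZEPPELIN-2075 fix registers in context.py:

```python
import os
import signal

class MockSparkContext:
    """Hypothetical stand-in for pyspark's SparkContext (illustration only)."""

    def __init__(self):
        self.jobs = {}        # job id -> job group, e.g. "zeppelin-<note>-<para>"
        self.cancelled = set()

    def submit(self, job_id, group):
        self.jobs[job_id] = group

    def cancelJobGroup(self, group):
        # Scoped cancellation: only the jobs of one notebook paragraph.
        self.cancelled |= {j for j, g in self.jobs.items() if g == group}

    def cancelAllJobs(self):
        # What the SIGINT handler does: cancel every user's jobs.
        self.cancelled |= set(self.jobs)

sc = MockSparkContext()
sc.submit(460, "zeppelin-noteA-para1")  # user A's job
sc.submit(461, "zeppelin-noteB-para1")  # user B's job

# Cancelling by group touches only user A's job, as intended.
sc.cancelJobGroup("zeppelin-noteA-para1")
assert sc.cancelled == {460}

# The handler quoted above is group-agnostic:
def signal_handler(signum, frame):
    sc.cancelAllJobs()
    raise KeyboardInterrupt()

signal.signal(signal.SIGINT, signal_handler)
try:
    os.kill(os.getpid(), signal.SIGINT)  # simulate interrupting the paragraph
except KeyboardInterrupt:
    pass

# User B's job 461 is now cancelled too, matching the reported
# "Job 461 cancelled as part of cancellation of all jobs".
print(sorted(sc.cancelled))
```

Under these assumptions, a group-scoped handler (one that calls cancelJobGroup with the current paragraph's group instead of cancelAllJobs) would leave the other users' jobs untouched.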