Hi! We found the reason why this error is happening. It seems to be related to the change <https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd> made for ZEPPELIN-2075 <https://issues.apache.org/jira/browse/ZEPPELIN-2075>.
Because of that change, when one particular user cancels his PySpark job, the PySpark jobs of *all the users* are cancelled too! When a PySpark job is cancelled, PySparkInterpreter.interrupt() is invoked and a SIGINT is sent to the Python process; PySpark's signal handler then cancels every job in the shared SparkContext:

context.py:

    # create a signal handler which would be invoked on receiving SIGINT
    def signal_handler(signal, frame):
        self.cancelAllJobs()
        raise KeyboardInterrupt()

Is this a Zeppelin bug? Thank you.

(A standalone sketch contrasting sc.cancelJobGroup() with sc.cancelAllJobs() follows the quoted thread below.)

2018-06-12 17:12 GMT-05:00 Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>:

> Hi!
>
> We found the reason why this error is happening. It seems to be related
> to the change
> <https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd>
> made for ZEPPELIN-2075
> <https://issues.apache.org/jira/browse/ZEPPELIN-2075>.
>
> Because of that change, when one particular user cancels his PySpark job,
> the PySpark jobs of all the users are cancelled too.
>
> When a PySpark job is cancelled, PySparkInterpreter.interrupt() is
> invoked and the SIGINT handler is triggered:
>
> context.py:
>
>     # create a signal handler which would be invoked on receiving SIGINT
>     def signal_handler(signal, frame):
>         self.cancelAllJobs()
>         raise KeyboardInterrupt()
>
>
> 2018-06-12 9:26 GMT-05:00 Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>:
>
>> Hi!
>> I have the 0.8.0 version, from September 2017.
>>
>> 2018-06-12 4:48 GMT-05:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:
>>
>>> Which version do you use?
>>>
>>> Best Regard,
>>> Jeff Zhang
>>>
>>> From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
>>> Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
>>> Date: Friday, June 8, 2018 at 11:08 PM
>>> To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>,
>>> "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>
>>> Subject: All PySpark jobs are canceled when one user cancels his PySpark
>>> paragraph (job)
>>>
>>> Dear community,
>>>
>>> Currently we are having problems with multiple users running paragraphs
>>> associated with PySpark jobs.
>>>
>>> The problem is that if a user aborts/cancels his PySpark paragraph
>>> (job), the active PySpark jobs of the other users are cancelled too.
>>>
>>> Going into detail, I've seen that when a user's job is cancelled this
>>> method is invoked (which is fine):
>>>
>>>     sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")
>>>
>>> But, somehow unknown to me, this method is also invoked:
>>>
>>>     sc.cancelAllJobs()
>>>
>>> This can be seen in the stack trace that appears in the other users'
>>> jobs:
>>>
>>> Py4JJavaError: An error occurred while calling o885.count.
>>> : org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>>> at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
>>> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>>> at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
>>> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>>> at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
>>> at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
>>> at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>> at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
>>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
>>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
>>> at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
>>> at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
>>> at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
>>> at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
>>> at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>> at py4j.Gateway.invoke(Gateway.java:280)
>>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>> (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred
>>> while calling o885.count.\n', JavaObject id=o886), <traceback object at
>>> 0x7f9e669ae588>)
>>>
>>> Any idea of why this could be happening?
>>>
>>> (I have the 0.8.0 version from September 2017.)
>>>
>>> Thank you!
>>
>>
>
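For reference, here is a minimal, standalone PySpark sketch (my own illustration, not Zeppelin or Spark source) of the difference between the two calls involved: sc.cancelJobGroup() only cancels the jobs tagged with that group, while sc.cancelAllJobs() (what the SIGINT handler in pyspark/context.py calls) cancels every job in the shared SparkContext, including other users' jobs. The group name and the local-mode setup below are just assumptions for the demo.

    import threading
    import time

    from pyspark.sql import SparkSession

    # Assumed local setup just for the demo; in Zeppelin the SparkContext is
    # shared by all users of the interpreter.
    spark = SparkSession.builder.master("local[2]").appName("cancel-demo").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical group id, mirroring Zeppelin's
    # "zeppelin-[notebook-id]-[paragraph-id]" naming.
    GROUP = "zeppelin-note1-paragraph1"

    def cancel_later():
        time.sleep(5)
        # Scoped cancellation: only jobs tagged with GROUP are cancelled;
        # other jobs running in the same SparkContext are left alone.
        sc.cancelJobGroup(GROUP)
        # The SIGINT handler in pyspark/context.py calls this instead, which
        # kills every job in the context regardless of who submitted it:
        # sc.cancelAllJobs()

    threading.Thread(target=cancel_later, daemon=True).start()

    # Tag this "paragraph's" jobs, then run something slow enough to be cancelled.
    sc.setJobGroup(GROUP, "user paragraph", interruptOnCancel=True)
    try:
        sc.parallelize(range(4), 4).map(lambda x: time.sleep(60) or x).collect()
    except Exception as e:
        print("cancelled:", e)  # collect() raises once the job group is cancelled

    spark.stop()

As the quoted thread notes, Zeppelin's own cancel path already uses the scoped sc.cancelJobGroup() call; it is the SIGINT path added for ZEPPELIN-2075 that ends up in cancelAllJobs().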