Hi! We found the reason why this error is happening. It seems to be related to the fix <https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd> for ZEPPELIN-2075 <https://issues.apache.org/jira/browse/ZEPPELIN-2075>.
This fix causes the PySpark jobs of all users to be cancelled when one particular user cancels his PySpark job. When a PySpark job is cancelled, the method PySparkInterpreter.interrupt() is invoked, which sends SIGINT to the Python process and triggers the signal handler registered in context.py:

    # create a signal handler which would be invoked on receiving SIGINT
    def signal_handler(signal, frame):
        self.cancelAllJobs()
        raise KeyboardInterrupt()

2018-06-12 9:26 GMT-05:00 Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>:

> Hi!
> I have the 0.8.0 version, from September 2017.
>
> 2018-06-12 4:48 GMT-05:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:
>
>> Which version do you use?
>>
>> Best Regard,
>> Jeff Zhang
>>
>> From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
>> Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
>> Date: Friday, June 8, 2018 at 11:08 PM
>> To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>, "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>
>> Subject: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)
>>
>> Dear community,
>>
>> Currently we are having problems with multiple users running paragraphs associated with PySpark jobs.
>>
>> The problem is that if a user aborts/cancels his PySpark paragraph (job), the active PySpark jobs of the other users are cancelled too.
>> Going into detail, I've seen that when you cancel a user's job, this method is invoked (which is fine):
>>
>>     sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")
>>
>> But, for a reason unknown to me, this method is also invoked:
>>
>>     sc.cancelAllJobs()
>>
>> This is evident from the stack trace that appears in the jobs of the other users:
>>
>> Py4JJavaError: An error occurred while calling o885.count.
>> : org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>> at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
>> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>> at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
>> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>> at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
>> at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
>> at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>> at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
>> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
>> at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
>> at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
>> at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
>> at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
>> at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>> at py4j.Gateway.invoke(Gateway.java:280)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:214)
>> at java.lang.Thread.run(Thread.java:748)
>>
>> (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o885.count.\n', JavaObject id=o886), <traceback object at 0x7f9e669ae588>)
>>
>> Any idea of why this could be happening?
>>
>> (I have the 0.8.0 version from September 2017.)
>>
>> Thank you!
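The behaviour described in the thread can be sketched in plain Python, with no real Spark involved. MockSparkContext, its submit() helper, and the job ids/group names below are hypothetical stand-ins for pyspark's SparkContext, used only to contrast the scoped cancelJobGroup() call Zeppelin intends with the group-agnostic cancelAllJobs() call made by the SIGINT handler that the ZEPPELIN-2075 fix registers in context.py:

```python
import os
import signal

class MockSparkContext:
    """Hypothetical stand-in for pyspark's SparkContext (illustration only)."""

    def __init__(self):
        self.jobs = {}        # job id -> job group, e.g. "zeppelin-<note>-<para>"
        self.cancelled = set()

    def submit(self, job_id, group):
        self.jobs[job_id] = group

    def cancelJobGroup(self, group):
        # Scoped cancellation: only the jobs of one notebook paragraph.
        self.cancelled |= {j for j, g in self.jobs.items() if g == group}

    def cancelAllJobs(self):
        # What the SIGINT handler does: cancel every user's jobs.
        self.cancelled |= set(self.jobs)

sc = MockSparkContext()
sc.submit(460, "zeppelin-noteA-para1")  # user A's job
sc.submit(461, "zeppelin-noteB-para1")  # user B's job

# Cancelling by group touches only user A's job, as intended.
sc.cancelJobGroup("zeppelin-noteA-para1")
assert sc.cancelled == {460}

# The handler quoted above is group-agnostic:
def signal_handler(signum, frame):
    sc.cancelAllJobs()
    raise KeyboardInterrupt()

signal.signal(signal.SIGINT, signal_handler)
try:
    os.kill(os.getpid(), signal.SIGINT)  # simulate interrupting the paragraph
except KeyboardInterrupt:
    pass

# User B's job 461 is now cancelled too, matching the reported
# "Job 461 cancelled as part of cancellation of all jobs".
print(sorted(sc.cancelled))
```

Under these assumptions, a group-scoped handler (one that calls cancelJobGroup with the current paragraph's group instead of cancelAllJobs) would leave the other users' jobs untouched.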