Hi, we already have Spark (and Python) configured in per-user scoped mode, and even in that case it does not work. But I will try your second option! Thank you.
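
For context: the per-paragraph cancellation discussed below is built on Spark job groups, and the intended behaviour can be exercised in plain PySpark outside Zeppelin. A minimal sketch, assuming a local Spark 2.x install; the group names user-a/user-b and the local[2] master are placeholders rather than anything Zeppelin sets itself:

    import threading
    import time

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "cancel-demo")

    def run_job(group):
        # Job groups are tracked per calling thread, so each simulated
        # user submits its job from its own thread.
        sc.setJobGroup(group, "long job for " + group)
        try:
            sc.parallelize(range(4), 4).map(lambda x: time.sleep(60) or x).collect()
            print(group, "finished")
        except Exception as e:
            print(group, "was cancelled:", e)

    threads = [threading.Thread(target=run_job, args=(g,))
               for g in ("user-a", "user-b")]
    for t in threads:
        t.start()
    time.sleep(10)

    # Cancels only user-a's job; user-b's job keeps running. Calling
    # sc.cancelAllJobs() here instead would kill both, which is the
    # behaviour this thread complains about.
    sc.cancelJobGroup("user-a")
    for t in threads:
        t.join()
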
2018-06-12 21:24 GMT-05:00 Jeff Zhang <zjf...@gmail.com>:

> This is a limitation of the native PySparkInterpreter.
>
> Two solutions for you:
> 1. Use per-user scoped mode, so that each user owns his own python process.
> 2. Use the IPySparkInterpreter of Zeppelin 0.8, which has better
>    integration of python with Zeppelin.
>
> Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com> wrote on Wed,
> Jun 13, 2018 at 6:15 AM:
>
> > Hi!
> >
> > We found the reason why this error is happening. It seems to be related
> > to the fix
> > <https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd>
> > for the task ZEPPELIN-2075
> > <https://issues.apache.org/jira/browse/ZEPPELIN-2075>.
> >
> > That fix causes the py-spark jobs of *all the users* to be cancelled
> > when one particular user cancels his own py-spark job!
> >
> > When a py-spark job is cancelled, the method
> > PySparkInterpreter.interrupt() is invoked, which sends SIGINT to the
> > python process; pyspark's SIGINT handler then cancels all the jobs in
> > the shared spark context:
> >
> > context.py:
> >
> >     # create a signal handler which would be invoked on receiving SIGINT
> >     def signal_handler(signal, frame):
> >         self.cancelAllJobs()
> >         raise KeyboardInterrupt()
> >
> > Is this a Zeppelin bug?
> >
> > Thank you.
> >
> > 2018-06-12 9:26 GMT-05:00 Jhon Anderson Cardenas Diaz
> > <jhonderson2...@gmail.com>:
> >
> > > Hi!
> > > I have the 0.8.0 version, from September 2017.
> > >
> > > 2018-06-12 4:48 GMT-05:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:
> > >
> > > > Which version do you use?
> > > >
> > > > Best Regard,
> > > > Jeff Zhang
> > > >
> > > > From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
> > > > Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
> > > > Date: Friday, June 8, 2018 at 11:08 PM
> > > > To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>,
> > > >     "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>
> > > > Subject: All PySpark jobs are canceled when one user cancels his
> > > >     PySpark paragraph (job)
> > > >
> > > > Dear community,
> > > >
> > > > Currently we are having problems with multiple users running
> > > > paragraphs associated with pyspark jobs.
> > > >
> > > > The problem is that if a user aborts/cancels his pyspark paragraph
> > > > (job), the active pyspark jobs of the other users are cancelled too.
> > > >
> > > > Going into detail, I have seen that when you cancel a user's job,
> > > > this method is invoked (which is fine):
> > > >
> > > >     sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")
> > > >
> > > > But somehow, for a reason unknown to me, this method is also invoked:
> > > >
> > > >     sc.cancelAllJobs()
> > > >
> > > > I can tell from the log trace that appears in the jobs of the
> > > > other users:
> > > >
> > > > Py4JJavaError: An error occurred while calling o885.count.
> > > > : org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
> > > >   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> > > >   at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
> > > >   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> > > >   at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> > > >   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> > > >   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
> > > >   at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> > > >   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> > > >   at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
> > > >   at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
> > > >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
> > > >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> > > >   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
> > > >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
> > > >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
> > > >   at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
> > > >   at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
> > > >   at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
> > > >   at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
> > > >   at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
> > > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >   at java.lang.reflect.Method.invoke(Method.java:498)
> > > >   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> > > >   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> > > >   at py4j.Gateway.invoke(Gateway.java:280)
> > > >   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> > > >   at py4j.commands.CallCommand.execute(CallCommand.java:79)
> > > >   at py4j.GatewayConnection.run(GatewayConnection.java:214)
> > > >   at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error
> > > > occurred while calling o885.count.\n', JavaObject id=o886),
> > > > <traceback object at 0x7f9e669ae588>)
> > > >
> > > > Any idea of why this could be happening?
> > > >
> > > > (I have the 0.8.0 version, from September 2017.)
> > > >
> > > > Thank you!
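
The chain described above (PySparkInterpreter.interrupt() -> SIGINT -> cancelAllJobs()) can be reproduced outside Zeppelin as well. A minimal sketch, assuming a Unix-like system and a Spark 2.x pyspark whose context.py installs the signal handler quoted earlier; os.kill here stands in for what Zeppelin's interrupt() does to the python process:

    import os
    import signal
    import threading
    import time

    from pyspark import SparkContext

    # Creating the context in the main thread installs pyspark's SIGINT
    # handler (the signal_handler quoted earlier in this thread).
    sc = SparkContext("local[2]", "sigint-demo")

    def other_users_job():
        sc.setJobGroup("user-b", "someone else's job")
        try:
            sc.parallelize(range(4), 4).map(lambda x: time.sleep(60) or x).collect()
        except Exception as e:
            # Fails with "Job ... cancelled as part of cancellation of
            # all jobs", the same Py4JJavaError as in the trace above.
            print("user-b's job died:", e)

    t = threading.Thread(target=other_users_job)
    t.start()
    time.sleep(10)

    try:
        # Stand-in for PySparkInterpreter.interrupt(): SIGINT reaches
        # the main thread, where pyspark's handler calls cancelAllJobs().
        os.kill(os.getpid(), signal.SIGINT)
        t.join()
    except KeyboardInterrupt:
        # Raised by pyspark's handler after it has cancelled every job.
        t.join()
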
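As for whether it is a Zeppelin bug: the quoted handler lives in Spark's context.py and has no notion of which paragraph was interrupted. Purely as an illustration of the scoping the thread argues for (not the actual Zeppelin or Spark fix, and install_scoped_sigint_handler is an invented name), a handler could cancel only its own job group:

    import signal

    def install_scoped_sigint_handler(sc, group_id):
        # Hypothetical: on SIGINT, cancel only the given job group (e.g.
        # "zeppelin-[notebook-id]-[paragraph-id]") instead of calling
        # sc.cancelAllJobs(), so other users sharing the SparkContext
        # keep their jobs.
        def signal_handler(sig, frame):
            sc.cancelJobGroup(group_id)
            raise KeyboardInterrupt()
        signal.signal(signal.SIGINT, signal_handler)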