Hi, we already have Spark (and Python) configured in per-user scoped mode, and even in that case it does not work. But I will try your second option! Thank you.
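
For context: the per-paragraph cancellation discussed below is built on Spark job groups, and the intended behaviour can be exercised in plain PySpark outside Zeppelin. A minimal sketch, assuming a local Spark 2.x install; the group names user-a/user-b and the local[2] master are placeholders rather than anything Zeppelin sets itself:

    import threading
    import time

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "cancel-demo")

    def run_job(group):
        # Job groups are tracked per calling thread, so each simulated
        # user submits its job from its own thread.
        sc.setJobGroup(group, "long job for " + group)
        try:
            sc.parallelize(range(4), 4).map(lambda x: time.sleep(60) or x).collect()
            print(group, "finished")
        except Exception as e:
            print(group, "was cancelled:", e)

    threads = [threading.Thread(target=run_job, args=(g,))
               for g in ("user-a", "user-b")]
    for t in threads:
        t.start()
    time.sleep(10)

    # Cancels only user-a's job; user-b's job keeps running. Calling
    # sc.cancelAllJobs() here instead would kill both, which is the
    # behaviour this thread complains about.
    sc.cancelJobGroup("user-a")
    for t in threads:
        t.join()
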
2018-06-12 21:24 GMT-05:00 Jeff Zhang <zjf...@gmail.com>:

> This is a limitation of the native PySparkInterpreter.
>
> Two solutions for you:
> 1. Use per-user scoped mode, so that each user owns his own python process.
> 2. Use the IPySparkInterpreter of Zeppelin 0.8, which has better
>    integration of python with Zeppelin.
>
> Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com> wrote on Wed,
> Jun 13, 2018 at 6:15 AM:
>
> > Hi!
> >
> > We found the reason why this error is happening. It seems to be related
> > to the fix
> > <https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd>
> > for the task ZEPPELIN-2075
> > <https://issues.apache.org/jira/browse/ZEPPELIN-2075>.
> >
> > That fix causes the py-spark jobs of *all the users* to be cancelled
> > when one particular user cancels his own py-spark job!
> >
> > When a py-spark job is cancelled, the method
> > PySparkInterpreter.interrupt() is invoked, which sends SIGINT to the
> > python process; pyspark's SIGINT handler then cancels all the jobs in
> > the shared spark context:
> >
> > context.py:
> >
> >     # create a signal handler which would be invoked on receiving SIGINT
> >     def signal_handler(signal, frame):
> >         self.cancelAllJobs()
> >         raise KeyboardInterrupt()
> >
> > Is this a Zeppelin bug?
> >
> > Thank you.
> >
> > 2018-06-12 9:26 GMT-05:00 Jhon Anderson Cardenas Diaz
> > <jhonderson2...@gmail.com>:
> >
> > > Hi!
> > > I have the 0.8.0 version, from September 2017.
> > >
> > > 2018-06-12 4:48 GMT-05:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:
> > >
> > > > Which version do you use?
> > > >
> > > > Best Regard,
> > > > Jeff Zhang
> > > >
> > > > From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
> > > > Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
> > > > Date: Friday, June 8, 2018 at 11:08 PM
> > > > To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>,
> > > >     "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>
> > > > Subject: All PySpark jobs are canceled when one user cancels his
> > > >     PySpark paragraph (job)
> > > >
> > > > Dear community,
> > > >
> > > > Currently we are having problems with multiple users running
> > > > paragraphs associated with pyspark jobs.
> > > >
> > > > The problem is that if a user aborts/cancels his pyspark paragraph
> > > > (job), the active pyspark jobs of the other users are cancelled too.
> > > >
> > > > Going into detail, I have seen that when you cancel a user's job,
> > > > this method is invoked (which is fine):
> > > >
> > > >     sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")
> > > >
> > > > But somehow, for a reason unknown to me, this method is also invoked:
> > > >
> > > >     sc.cancelAllJobs()
> > > >
> > > > I can tell from the log trace that appears in the jobs of the
> > > > other users:
> > > >
> > > > Py4JJavaError: An error occurred while calling o885.count.
> > > > : org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
> > > >   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> > > >   at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
> > > >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
> > > >   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> > > >   at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> > > >   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> > > >   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> > > >   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
> > > >   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
> > > >   at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> > > >   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> > > >   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> > > >   at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
> > > >   at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
> > > >   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
> > > >   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> > > >   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
> > > >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
> > > >   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
> > > >   at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
> > > >   at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
> > > >   at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
> > > >   at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
> > > >   at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
> > > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >   at java.lang.reflect.Method.invoke(Method.java:498)
> > > >   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> > > >   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> > > >   at py4j.Gateway.invoke(Gateway.java:280)
> > > >   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> > > >   at py4j.commands.CallCommand.execute(CallCommand.java:79)
> > > >   at py4j.GatewayConnection.run(GatewayConnection.java:214)
> > > >   at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error
> > > > occurred while calling o885.count.\n', JavaObject id=o886),
> > > > <traceback object at 0x7f9e669ae588>)
> > > >
> > > > Any idea of why this could be happening?
> > > >
> > > > (I have the 0.8.0 version, from September 2017.)
> > > >
> > > > Thank you!
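
The chain described above (PySparkInterpreter.interrupt() -> SIGINT -> cancelAllJobs()) can be reproduced outside Zeppelin as well. A minimal sketch, assuming a Unix-like system and a Spark 2.x pyspark whose context.py installs the signal handler quoted earlier; os.kill here stands in for what Zeppelin's interrupt() does to the python process:

    import os
    import signal
    import threading
    import time

    from pyspark import SparkContext

    # Creating the context in the main thread installs pyspark's SIGINT
    # handler (the signal_handler quoted earlier in this thread).
    sc = SparkContext("local[2]", "sigint-demo")

    def other_users_job():
        sc.setJobGroup("user-b", "someone else's job")
        try:
            sc.parallelize(range(4), 4).map(lambda x: time.sleep(60) or x).collect()
        except Exception as e:
            # Fails with "Job ... cancelled as part of cancellation of
            # all jobs", the same Py4JJavaError as in the trace above.
            print("user-b's job died:", e)

    t = threading.Thread(target=other_users_job)
    t.start()
    time.sleep(10)

    try:
        # Stand-in for PySparkInterpreter.interrupt(): SIGINT reaches
        # the main thread, where pyspark's handler calls cancelAllJobs().
        os.kill(os.getpid(), signal.SIGINT)
        t.join()
    except KeyboardInterrupt:
        # Raised by pyspark's handler after it has cancelled every job.
        t.join()
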
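As for whether it is a Zeppelin bug: the quoted handler lives in Spark's context.py and has no notion of which paragraph was interrupted. Purely as an illustration of the scoping the thread argues for (not the actual Zeppelin or Spark fix, and install_scoped_sigint_handler is an invented name), a handler could cancel only its own job group:

    import signal

    def install_scoped_sigint_handler(sc, group_id):
        # Hypothetical: on SIGINT, cancel only the given job group (e.g.
        # "zeppelin-[notebook-id]-[paragraph-id]") instead of calling
        # sc.cancelAllJobs(), so other users sharing the SparkContext
        # keep their jobs.
        def signal_handler(sig, frame):
            sc.cancelJobGroup(group_id)
            raise KeyboardInterrupt()
        signal.signal(signal.SIGINT, signal_handler)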