Thanks Mark! That helped a lot, and my takeaway from it is to...back
away now! :) I'm following the advice, as there's simply too much in
Spark to learn at the moment.

Pozdrawiam,
Jacek

Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark
==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski


On Thu, Dec 17, 2015 at 1:13 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> Ah, sorry for leading you astray a bit.  I was working from memory instead
> of looking at the code, and was probably thinking back all the way to
> Reynold's initial implementation of SparkContext#killJob(), which was
> public.  I'd have to do some digging to determine exactly when and why
> SparkContext#cancelJob() became private[spark].  Of course, the other
> problem is that more often than not I am working with custom builds of
> Spark, and I'm not beyond changing selected things from private to public.
> :)
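
For reference, the custom-build route above can also be approximated without
patching Spark: private[spark] members are visible to code compiled into the
org.apache.spark package. A minimal sketch, assuming cancelJob(jobId: Int)
keeps its current private[spark] signature; the object name is made up, and
since this leans on package-private scoping rather than a public API it can
break on any Spark upgrade:

    // Compiled in your own project, but declared inside Spark's package, so
    // private[spark] members of SparkContext are visible here.
    package org.apache.spark

    object JobCancellation {
      // Thin forwarder to the package-private SparkContext#cancelJob.
      def cancelJob(sc: SparkContext, jobId: Int): Unit = sc.cancelJob(jobId)
    }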
>
> When you start talking about doing some checks before killing a job, I
> imagine that you are thinking about something like checking that parts of a
> job are not needed by other jobs, etc.  That's a reasonable idea, but the
> realization of that goal is not simple -- especially not when you start
> including asynchronous execution with various timeouts or other events
> requesting cancellation, or more extensive reuse functionality as in
> SPARK-11838 (https://issues.apache.org/jira/browse/SPARK-11838). If you don't want to
> spend a lot of time looking at Job cancellation issues, best to back away
> now! :)
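
For reference, the asynchronous-execution-with-timeout case mentioned above
can be sketched with the built-in async actions: countAsync() returns a
FutureAction that is also a Scala Future, so it can be awaited with a timeout
and cancelled, which cancels the job(s) backing it. The app name, data, and
timeout below are made up for illustration:

    import java.util.concurrent.TimeoutException
    import scala.concurrent.Await
    import scala.concurrent.duration._

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("cancel-on-timeout-sketch").setMaster("local[*]"))
    val rdd = sc.parallelize(1L to 10000000L)

    // countAsync comes from AsyncRDDActions (available via the RDD implicits).
    val future = rdd.countAsync()
    try {
      println(s"count = ${Await.result(future, 10.seconds)}")
    } catch {
      case _: TimeoutException =>
        future.cancel()   // cancels the Spark job(s) behind this action
    }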
>
> On Wed, Dec 16, 2015 at 4:26 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Thanks Mark for the answer! It helps, but still leaves me with a few
>> more questions. If you don't mind, I'd like to ask them here.
>>
>> When you said "It can be used, and is used in user code, but it isn't
>> always as straightforward as you might think", did you mean the Spark
>> code itself or some other user code? Could I have a look at that code
>> and the use case? The method is `private[spark]` and it's not even
>> @DeveloperApi, which makes using it even riskier. I believe it's a very
>> low-level ingredient of Spark that very few people, if any, use. If I
>> could see the code that uses the method, that would help.
>>
>> Following up, isn't killing a stage similar to killing a job? Both can
>> be shared, and I could imagine a case for killing a job much like the
>> one for killing a stage, where an implementation does some checks before
>> eventually killing the job. If that's possible for stages, which are in
>> a sense similar to jobs, I'm still unsure why the method is not used by
>> Spark itself. And if it's not used by Spark, why would it be useful to
>> anyone outside Spark?
>>
>> Doh, why did I come across the method? It will take some time before I
>> forget about it :-)
>>
>> Pozdrawiam,
>> Jacek
>>
>> --
>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
>> http://blog.jaceklaskowski.pl
>> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>>
>>
>> On Wed, Dec 16, 2015 at 10:55 AM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>> > It can be used, and is used in user code, but it isn't always as
>> > straightforward as you might think.  This is mostly because a Job often
>> > isn't a Job -- or rather it is more than one Job.  There are several RDD
>> > transformations that aren't lazy, so they end up launching "hidden" Jobs
>> > that you may not anticipate and may expect to be canceled (but won't be)
>> > by a cancelJob() called on a later action on that transformed RDD.  It
>> > is also possible for a single DataFrame or Spark SQL query to result in
>> > more than one running Job.  The upshot of all of this is that getting
>> > cancelJob() to work as most users would expect all the time is
>> > non-trivial, and most of the time using a jobGroup is a better way to
>> > capture what may be more than one Job that the user is thinking of as a
>> > single Job.
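
For reference, a minimal sketch of the jobGroup approach described above; the
group id, description, data, and thread setup are made up for illustration.
setJobGroup tags every job submitted from the calling thread, and
cancelJobGroup later cancels all of them, "hidden" jobs included:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("jobgroup-sketch").setMaster("local[*]"))

    // Worker thread: tag everything it submits with one group id.
    val worker = new Thread {
      override def run(): Unit = {
        sc.setJobGroup("my-query", "long-running query", interruptOnCancel = true)
        sc.parallelize(1 to 10000000).map { i => Thread.sleep(1); i }.count()
      }
    }
    worker.start()

    // Elsewhere (a timeout handler, a "kill" button in a UI): cancel the group.
    Thread.sleep(2000)
    sc.cancelJobGroup("my-query")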
>> >
>> > On Wed, Dec 16, 2015 at 5:34 AM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> It does look like it's not actually used. It may simply be there for
>> >> completeness, to match cancelStage and cancelJobGroup, which are used.
>> >> I also don't know of a good reason there's no way to kill a whole job.
>> >>
>> >> On Wed, Dec 16, 2015 at 1:15 PM, Jacek Laskowski <ja...@japila.pl>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > While reviewing Spark code I came across SparkContext.cancelJob. I
>> >> > found no part of Spark using it. Is this a leftover from some
>> >> > refactoring? Why is it part of sc?
>> >> >
>> >> > The reason I'm asking is another question I had after learning about
>> >> > killing a stage in the webUI. I noticed there is a way to kill/cancel
>> >> > stages, but no corresponding feature to kill/cancel jobs. Why? Is
>> >> > there a JIRA ticket to add it some day, perhaps?
>> >> >
>> >> > Pozdrawiam,
>> >> > Jacek
>> >> >
>> >> > --
>> >> > Jacek Laskowski | https://medium.com/@jaceklaskowski/
>> >> > Mastering Apache Spark
>> >> > ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> >> > Follow me at https://twitter.com/jaceklaskowski
>> >> >
>> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
