Ah, sorry for leading you astray a bit. I was working from memory instead of looking at the code, and was probably thinking back all the way to Reynold's initial implementation of SparkContext#killJob(), which was public. I'd have to do some digging to determine exactly when and why SparkContext#cancelJob() became private[spark]. Of course, the other problem is that more often than not I am working with custom builds of Spark, and I'm not above changing selected things from private to public. :)
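To illustrate the sort of thing I mean (a sketch, not an endorsement!): short of a custom build, a helper compiled into the org.apache.spark package can reach private[spark] members. The JobKiller name here is made up, and the job id would typically come from a SparkListener's onJobStart callback:

    // Hypothetical helper; compiling it into the org.apache.spark
    // package grants access to private[spark] members like cancelJob.
    package org.apache.spark

    object JobKiller {
      // jobId would normally be captured via SparkListener.onJobStart
      def kill(sc: SparkContext, jobId: Int): Unit =
        sc.cancelJob(jobId)  // private[spark] in current Spark
    }

Of course, this leans on an internal API with no compatibility guarantees, which is exactly why it isn't as straightforward as it looks.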
When you start talking about doing some checks before killing a job, I imagine that you are thinking about something like checking that parts of a job are not needed by other jobs, etc. That's a reasonable idea, but realizing that goal is not simple -- especially not once you include asynchronous execution with various timeouts or other events requesting cancellation, or more extensive reuse functionality as in https://issues.apache.org/jira/browse/SPARK-11838. If you don't want to spend a lot of time looking at Job cancellation issues, best to back away now! :)

On Wed, Dec 16, 2015 at 4:26 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Thanks Mark for the answer! It helps, but still leaves me with a few
> more questions. If you don't mind, I'd like to ask them.
>
> When you said "It can be used, and is used in user code, but it isn't
> always as straightforward as you might think.", were you thinking of
> Spark's own code or some other user code? Can I have a look at the
> code and the use case? The method is `private[spark]` and not even
> @DeveloperApi, which makes using it all the riskier. I believe it's a
> very low-level ingredient of Spark that very few people use, if any.
> If I could see the code that uses the method, that could help.
>
> Following up, isn't killing a stage similar to killing a job? They
> can both be shared, and I could imagine much the same case for
> killing a job as for a stage, where an implementation does some
> checks before eventually killing the job. If it's possible for
> stages, which are in a sense similar to jobs, then... I'm still
> unsure why the method is not used by Spark itself. And if it's not
> used by Spark, why would it be useful to others outside Spark?
>
> Doh, why did I ever come across the method? It will take some time
> before I forget about it :-)
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Wed, Dec 16, 2015 at 10:55 AM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
> > It can be used, and is used in user code, but it isn't always as
> > straightforward as you might think. This is mostly because a Job
> > often isn't a Job -- or rather, it is more than one Job. There are
> > several RDD transformations that aren't lazy, so they end up
> > launching "hidden" Jobs that you may not anticipate and may expect
> > to be canceled (but won't be) by a cancelJob() called on a later
> > action on that transformed RDD. It is also possible for a single
> > DataFrame or Spark SQL query to result in more than one running
> > Job. The upshot of all of this is that getting cancelJob() to work
> > as most users would expect all the time is non-trivial, and most of
> > the time using a jobGroup is a better way to capture what may be
> > more than one Job that the user is thinking of as a single Job.
> >
> > On Wed, Dec 16, 2015 at 5:34 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> It does look like it's not actually used. It may simply be there for
> >> completeness, to match cancelStage and cancelJobGroup, which are used.
> >> I also don't know of a good reason there's no way to kill a whole job.
> >>
> >> On Wed, Dec 16, 2015 at 1:15 PM, Jacek Laskowski <ja...@japila.pl>
> >> wrote:
> >> > Hi,
> >> >
> >> > While reviewing Spark code I came across SparkContext.cancelJob.
> >> > I found no part of Spark using it. Is this a leftover after some
> >> > refactoring? Why is it part of sc?
> >> >
> >> > The reason I'm asking is another question I had after learning
> >> > about killing a stage in the webUI. I noticed there is a way to
> >> > kill/cancel stages, but no corresponding feature to kill/cancel
> >> > jobs. Why? Is there a JIRA ticket to add it some day, perhaps?
> >> >
> >> > Regards,
> >> > Jacek
> >> >
> >> > --
> >> > Jacek Laskowski | https://medium.com/@jaceklaskowski/
> >> > Mastering Apache Spark
> >> > ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> >> > Follow me at https://twitter.com/jaceklaskowski
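P.S. To make the jobGroup suggestion above concrete, here is a minimal sketch using only public SparkContext API (the group id, app name, and object name are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    object JobGroupDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("job-group-demo"))

        // Tag every job submitted from this thread with a group id.
        // interruptOnCancel = true asks executors to interrupt running
        // task threads on cancellation.
        sc.setJobGroup("my-group", "cancellable work", interruptOnCancel = true)

        // countAsync returns a FutureAction, leaving the driver thread
        // free to decide later that the work should be cancelled.
        // (A single action can also be cancelled via pending.cancel().)
        val pending = sc.parallelize(1 to 1000000).countAsync()

        // Cancels every job tagged "my-group" -- including any "hidden"
        // jobs launched behind your back by eager transformations.
        sc.cancelJobGroup("my-group")

        sc.stop()
      }
    }

Because setJobGroup is thread-local, the group captures every job submitted from that thread, which is what makes it a better fit than cancelJob when one logical "job" fans out into several Spark jobs.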