Thanks Mark! That helped a lot, and my takeaway from it is to...back away now! :) I'm following the advice, as there's simply too much to learn in Spark at the moment.
Pozdrawiam,
Jacek

Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski


On Thu, Dec 17, 2015 at 1:13 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> Ah, sorry for leading you astray a bit. I was working from memory instead of looking at the code, and was probably thinking back all the way to Reynold's initial implementation of SparkContext#killJob(), which was public. I'd have to do some digging to determine exactly when and why SparkContext#cancelJob() became private[spark]. Of course, the other problem is that more often than not I am working with custom builds of Spark, and I'm not beyond changing selected things from private to public. :)
>
> When you start talking about doing some checks before killing a job, I imagine that you are thinking about something like checking that parts of a job are not needed by other jobs, etc. That's a reasonable idea, but realizing that goal is not simple -- especially not once you include asynchronous execution with various timeouts or other events requesting cancellation, or more extensive reuse functionality as in https://issues.apache.org/jira/browse/SPARK-11838. If you don't want to spend a lot of time looking at Job cancellation issues, best to back away now! :)
>
> On Wed, Dec 16, 2015 at 4:26 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Thanks Mark for the answer! It helps, but it still leaves me with a few more questions, which I'd like to ask if you don't mind.
>>
>> When you said "It can be used, and is used in user code, but it isn't always as straightforward as you might think", did you mean the Spark code itself or some other user code? Could I have a look at that code and the use case? The method is private[spark] and not even @DeveloperApi, which makes using it even riskier. I believe it's a very low-level ingredient of Spark that very few people, if any, use. Seeing code that actually uses the method would help.
>>
>> Following up, isn't killing a stage similar to killing a job? Both can be shared, and I can imagine a case for killing a job much like the one for killing a stage, where an implementation does some checks before eventually killing the job. Stages are in that sense similar to jobs, so I'm still unsure why the method is not used by Spark itself. And if it's not used by Spark, why would it be useful to anyone outside Spark?
>>
>> Doh, why did I come across the method? It will take some time before I forget about it :-)
>>
>> Pozdrawiam,
>> Jacek
>>
>> --
>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
>> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
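For reference: short of the private[spark] cancelJob, one public route to cancelling a single action from user code is the FutureAction handle returned by the async actions. A minimal sketch, with illustrative local-mode setup and names (adapt to your environment):

    import org.apache.spark.{FutureAction, SparkConf, SparkContext}

    // Illustrative local-mode setup; any existing SparkContext works the same way.
    val sc = new SparkContext(new SparkConf().setAppName("cancel-demo").setMaster("local[*]"))

    // countAsync submits the job and returns immediately with a handle to it.
    val handle: FutureAction[Long] = sc.parallelize(1 to 10000000).countAsync()

    // ...later, e.g. on a timeout or an explicit user request...
    handle.cancel()   // cancels the job(s) submitted for this action

Note that, as Mark explains just below, a single action does not always map to exactly one job, which is why a job group (see the sketch at the end of the thread) is often the safer unit of cancellation.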
>> On Wed, Dec 16, 2015 at 10:55 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>> > It can be used, and is used in user code, but it isn't always as straightforward as you might think. This is mostly because a Job often isn't a Job -- or rather it is more than one Job. There are several RDD transformations that aren't lazy, so they end up launching "hidden" Jobs that you may not anticipate and may expect to be canceled (but won't be) by a cancelJob() called on a later action on that transformed RDD. It is also possible for a single DataFrame or Spark SQL query to result in more than one running Job. The upshot of all of this is that getting cancelJob() to work as most users would expect all the time is non-trivial, and most of the time using a jobGroup is a better way to capture what may be more than one Job that the user is thinking of as a single Job.
>> >
>> > On Wed, Dec 16, 2015 at 5:34 AM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> It does look like it's not actually used. It may simply be there for completeness, to match cancelStage and cancelJobGroup, which are used. I also don't know of a good reason there's no way to kill a whole job.
>> >>
>> >> On Wed, Dec 16, 2015 at 1:15 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> >> > Hi,
>> >> >
>> >> > While reviewing the Spark code I came across SparkContext.cancelJob. I found no part of Spark using it. Is this a leftover from some refactoring? Why is it part of sc?
>> >> >
>> >> > The reason I'm asking is another question I had after learning about killing a stage in the web UI. I noticed there is a way to kill/cancel stages, but no corresponding feature to kill/cancel jobs. Why? Is there a JIRA ticket to have it some day, perhaps?
>> >> >
>> >> > Pozdrawiam,
>> >> > Jacek
>> >> >
>> >> > --
>> >> > Jacek Laskowski | https://medium.com/@jaceklaskowski/
>> >> > Mastering Apache Spark ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> >> > Follow me at https://twitter.com/jaceklaskowski
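To make Mark's jobGroup suggestion concrete, here is a minimal sketch of the public setJobGroup/cancelJobGroup route that Sean also mentions. The group id, dataset, and thread usage are illustrative only; interruptOnCancel asks Spark to interrupt the threads of running tasks when the group is cancelled:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative local-mode setup; any existing SparkContext works the same way.
    val sc = new SparkContext(new SparkConf().setAppName("group-cancel-demo").setMaster("local[*]"))

    // Run the action in a separate thread so the main thread stays free to cancel it.
    val worker = new Thread {
      override def run(): Unit = {
        // Every job submitted from this thread after setJobGroup belongs to the group,
        // even if one "logical" operation fans out into several jobs.
        sc.setJobGroup("nightly-report", "jobs for the nightly report", interruptOnCancel = true)
        sc.parallelize(1 to 10000000).map(_ * 2).count()
      }
    }
    worker.start()

    // ...later, from any thread: cancel every job in the group at once.
    sc.cancelJobGroup("nightly-report")

setJobGroup is a per-thread setting, which is why the worker thread sets it before submitting work; cancelJobGroup can then be called from anywhere (a timeout handler, a "kill" button, etc.), so the caller does not need to know how many jobs the group actually spawned.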