I think I mean the job that Mark is talking about, but that's also the thing being stopped by the dcos command and (hopefully) by the dispatcher, isn't it?
It would be really good if the issue (SPARK-17064) were resolved, but for now I'll make do with cancelling the planned tasks in the current job (that's already a lot better than completing the whole job).

Thanks anyway for the answers, you helped me a lot.

Kind regards,
Richard

On Wed, Oct 5, 2016 at 11:38 PM, Michael Gummelt <mgumm...@mesosphere.io> wrote:

> You're using the proper Spark definition of "job", but I believe Richard
> means "driver".
>
> On Wed, Oct 5, 2016 at 2:17 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> Yes and no. Something that you need to be aware of is that a Job as such
>> exists in the DAGScheduler as part of the Application running on the
>> Driver. When talking about stopping or killing a Job, however, what people
>> often mean is not just stopping the DAGScheduler from telling the Executors
>> to run more Tasks associated with the Job, but also stopping any associated
>> Tasks that are already running on Executors. That is something that Spark
>> doesn't try to do by default, and changing that behavior has been an open
>> issue for a long time -- cf. SPARK-17064
>>
>> On Wed, Oct 5, 2016 at 2:07 PM, Michael Gummelt <mgumm...@mesosphere.io>
>> wrote:
>>
>>> If running in client mode, just kill the job. If running in cluster
>>> mode, the Spark Dispatcher exposes an HTTP API for killing jobs. I don't
>>> think this is externally documented, so you might have to check the code to
>>> find this endpoint. If you run in dcos, you can just run "dcos spark kill
>>> <id>".
>>>
>>> You can also find which node is running the driver, ssh in, and kill the
>>> process.
>>>
>>> On Wed, Oct 5, 2016 at 1:55 PM, Richard Siebeling <rsiebel...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> How can I stop a long-running job?
>>>>
>>>> We're running Spark in Mesos coarse-grained mode. Suppose the
>>>> user starts a long-running job, makes a mistake, changes a transformation
>>>> and runs the job again. In this case I'd like to cancel the first job and
>>>> after that start the second job. It would be a waste of resources to finish
>>>> the first job (which could possibly take several hours...)
>>>>
>>>> How can this be accomplished?
>>>>
>>>> Thanks in advance,
>>>> Richard
>>>
>>> --
>>> Michael Gummelt
>>> Software Engineer
>>> Mesosphere
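Michael's cluster-mode suggestion (asking the Spark Dispatcher over HTTP to kill a driver) can be sketched in Python. This is a minimal sketch, not a documented API: it assumes the dispatcher exposes Spark's REST submission server with a `POST /v1/submissions/kill/<submissionId>` endpoint on the dispatcher's default port 7077. As Michael says, this isn't externally documented, so check the REST submission server code for your Spark version before relying on it. The function names, host name and submission ID below are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_kill_request(dispatcher_host: str, submission_id: str,
                       port: int = 7077) -> urllib.request.Request:
    """Build a POST request asking the dispatcher to kill one driver.

    ASSUMPTION: the /v1/submissions/kill/<submissionId> path and the
    default port 7077 come from Spark's REST submission server; verify
    them against the Spark version you run.
    """
    # Percent-encode the ID so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(submission_id, safe="")
    url = f"http://{dispatcher_host}:{port}/v1/submissions/kill/{safe_id}"
    return urllib.request.Request(url, method="POST")


def kill_driver(dispatcher_host: str, submission_id: str,
                port: int = 7077, timeout: float = 10.0) -> dict:
    """Send the kill request and return the server's reply parsed as JSON."""
    req = build_kill_request(dispatcher_host, submission_id, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `kill_driver("dispatcher.example", "<submissionId>")`, with the submission ID taken from the dispatcher's UI or from the output of the original `spark-submit`. Note that this kills the whole driver, which matches Michael's reading of "job" as "driver"; it does not address the per-Job task cancellation Mark describes. Spark's own `spark-submit --kill <submissionId>` (standalone or Mesos cluster mode only) should cover the same case without hand-written HTTP calls.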