Re: Mesos checkpointing

Charles Allen Mon, 03 Apr 2017 14:24:11 -0700

We recently investigated internally why restarting the Mesos agents caused
our Spark jobs to fail (no real reason they should, right?) and came across
this information. The other conversation started by Yu prompted me to poke
at getting some of the tickets updated, to spread around any tribal
knowledge that is floating in the community.


It sounds like the only thing keeping it from being enabled is a timeout
config and someone volunteering to do some testing?


On Mon, Apr 3, 2017 at 2:19 PM Timothy Chen <tnac...@gmail.com> wrote:

> The only reason is that MesosClusterScheduler is long-running by
> design, so we really needed it to have failover configured correctly.
>
> I wanted to create a JIRA ticket to allow users to configure it for
> each Spark framework, but just didn't remember to do so.
>
> Per another question that came up on the mailing list, I believe we
> should add it, as it's a fairly straightforward effort.
>
> Tim
>
> On Mon, Apr 3, 2017 at 2:16 PM, Charles Allen
> <charles.al...@metamarkets.com> wrote:
> > As per https://issues.apache.org/jira/browse/SPARK-4899
> >
> > org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils#createSchedulerDriver
> > allows checkpointing, but only
> > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler uses it.
> > Is there a reason for that?
>
