Re: Request for comments: Java 7 removal

2017-02-13 Thread Charles Allen
I think the biggest concern is enterprise users/operators who do not have
the authority or access to upgrade Hadoop/YARN clusters to Java 8. As a
reference point, CDH 5.3 apparently shipped with Java 8 support back in
December 2014. I would be surprised if such users were active consumers of
the dev mailing list, though; unfortunately there's a bit of a selection
bias on this list.

The other concern is whether there is guaranteed compatibility between Scala
and Java 8 for all the versions you want to support (which is touched on
somewhat in the PR). Are you thinking about supporting Scala 2.10 against
Java 8 bytecode?
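
For context on what that would mean in a build, here is a minimal build.sbt
sketch (illustrative only, not the actual Spark build; the version strings and
target flags are my assumptions): the Java sources can be compiled for the
Java 8 class-file format, while, as far as I know, scalac on 2.10 tops out at
an older -target, so the two end up side by side in the same artifact.

    // Illustrative build.sbt sketch (not the actual Spark build).
    scalaVersion := "2.10.6"

    // Compile the Java sources against the Java 8 baseline.
    javacOptions ++= Seq("-source", "1.8", "-target", "1.8")

    // As far as I know, scalac 2.10 cannot emit Java 8 (major version 52)
    // class files; its highest target is jvm-1.7, so a mixed module would
    // ship class files of different versions. A jvm-1.8 target only arrives
    // with later Scala releases.
    scalacOptions += "-target:jvm-1.7"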

See https://groups.google.com/d/msg/druid-user/aTGQlnF1KLk/NvBPfmigAAAJ for
a similar discussion that went forward in the Druid community.


On Fri, Feb 10, 2017 at 8:47 AM Sean Owen  wrote:

> As you have seen, there's a WIP PR to implement removal of Java 7 support:
> https://github.com/apache/spark/pull/16871
>
> I have heard several +1s at
> https://issues.apache.org/jira/browse/SPARK-19493 but am asking for
> concerns too, now that there's a concrete change to review.
>
> If this goes in for 2.2 it can be followed by a more extensive update of the
> Java code to take advantage of Java 8; this is more or less the baseline
> change.
>
> We also just removed Hadoop 2.5 support. I know there was talk about
> removing Python 2.6. I have no opinion on that myself, but it might be time
> to revive that conversation too.
>


Mesos checkpointing

2017-04-03 Thread Charles Allen
As per https://issues.apache.org/jira/browse/SPARK-4899,
org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils#createSchedulerDriver
allows checkpointing, but only
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler uses it.
Is there a reason for that?
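
For reference, the hook looks roughly like the sketch below (from memory of
the 2.x Mesos scheduler code, so parameter names, order, and defaults may not
match the real source exactly): MesosClusterScheduler is the one caller that
opts in, while the coarse-grained backend leaves the checkpoint and failover
settings unset.

    import org.apache.mesos.{Scheduler, SchedulerDriver}
    import org.apache.spark.SparkConf

    // Sketch of the existing MesosSchedulerUtils entry point (from memory,
    // not copied from the source).
    trait SchedulerDriverFactory {
      def createSchedulerDriver(
          masterUrl: String,
          scheduler: Scheduler,
          sparkUser: String,
          appName: String,
          conf: SparkConf,
          webuiUrl: Option[String] = None,
          checkpoint: Option[Boolean] = None,      // agent-side framework checkpointing
          failoverTimeout: Option[Double] = None,  // seconds the master keeps the framework after a disconnect
          frameworkId: Option[String] = None): SchedulerDriver
    }

    // MesosClusterScheduler (the dispatcher) opts in, roughly:
    //   createSchedulerDriver(master, this, user, appName, conf,
    //     Some(frameworkUrl), checkpoint = Some(true),
    //     failoverTimeout = Some(Double.MaxValue), frameworkId)
    // MesosCoarseGrainedSchedulerBackend leaves checkpoint/failoverTimeout as
    // None, which is the gap SPARK-4899 points at.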


Re: Mesos checkpointing

2017-04-03 Thread Charles Allen
We recently investigated internally why restarting the Mesos agents was
failing Spark jobs (no real reason it should, right?) and came across the
data. The other conversation started by Yu prompted me to poke at getting
some of the tickets updated, to spread around any tribal knowledge floating
in the community.

It sounds like the only thing keeping it from being enabled is a timeout
config and someone volunteering to do some testing?
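
To make that concrete, here is a rough sketch of the kind of wiring I have in
mind. The config keys (spark.mesos.checkpoint and
spark.mesos.driver.failoverTimeout) are hypothetical names for illustration,
not existing settings; the real change would go wherever the coarse-grained
backend builds the FrameworkInfo it registers with.

    import org.apache.mesos.Protos.FrameworkInfo
    import org.apache.spark.SparkConf

    // Sketch only: thread two (hypothetical) settings through to FrameworkInfo.
    def applyCheckpointing(conf: SparkConf, builder: FrameworkInfo.Builder): FrameworkInfo.Builder = {
      // Whether the agent should checkpoint framework state so executors can
      // survive an agent restart (the case we care about when rolling agents).
      conf.getOption("spark.mesos.checkpoint")
        .map(_.toBoolean)
        .foreach(v => builder.setCheckpoint(v))

      // How long, in seconds, the master keeps the framework registered after
      // the driver disconnects (the failover/timeout piece).
      conf.getOption("spark.mesos.driver.failoverTimeout")
        .map(_.toDouble)
        .foreach(v => builder.setFailoverTimeout(v))

      builder
    }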


On Mon, Apr 3, 2017 at 2:19 PM Timothy Chen  wrote:

> The only reason is that MesosClusterScheduler is by design long-running, so
> we really needed it to have failover configured correctly.
>
> I wanted to create a JIRA ticket to allow users to configure it for
> each Spark framework, but just didn't remember to do so.
>
> Per another question that came up on the mailing list, I believe we
> should add it, as it's a fairly straightforward effort.
>
> Tim
>
> On Mon, Apr 3, 2017 at 2:16 PM, Charles Allen
>  wrote:
> > As per https://issues.apache.org/jira/browse/SPARK-4899,
> > org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils#createSchedulerDriver
> > allows checkpointing, but only
> > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler uses it.
> > Is there a reason for that?
>


Re: Mesos checkpointing

2017-05-24 Thread Charles Allen
The issue on our side is that we tend to roll out a bunch of agent updates at
about the same time, so rolling one agent, waiting for Spark jobs to recover,
and then rolling the next agent is not at all practical. It would be a huge
benefit if we could just update the agents in bulk (or even sequentially,
waiting only for each Mesos agent to recover).

On Wed, May 24, 2017 at 11:17 AM Michael Gummelt 
wrote:

> > We had investigated internally recently why restarting the mesos agents
> > failed the spark jobs (no real reason they should, right?) and came across
> > the data.
>
> Restarting the agent without checkpointing enabled will kill the executor,
> but that still shouldn't cause the Spark job to fail, since Spark jobs
> should tolerate executor failures.
>
> On Mon, Apr 3, 2017 at 2:26 PM, Timothy Chen  wrote:
>
>> Yes, adding the timeout config should be the only code change required.
>>
>> And just to clarify, this is for reconnecting with Mesos master (not
>> agents) after failover.
>>
>> Tim
>>
>> On Mon, Apr 3, 2017 at 2:23 PM, Charles Allen
>>  wrote:
>> > We had investigated internally recently why restarting the mesos agents
>> > failed the spark jobs (no real reason they should, right?) and came
>> > across the data. The other conversation by Yu sparked trying to poke to
>> > get some of the tickets updated to spread around any tribal knowledge
>> > that is floating in the community.
>> >
>> > It sounds like the only thing keeping it from being enabled is a timeout
>> > config and someone volunteering to do some testing?
>> >
>> >
>> > On Mon, Apr 3, 2017 at 2:19 PM Timothy Chen  wrote:
>> >>
>> >> The only reason is that MesosClusterScheduler is by design long-running,
>> >> so we really needed it to have failover configured correctly.
>> >>
>> >> I wanted to create a JIRA ticket to allow users to configure it for
>> >> each Spark framework, but just didn't remember to do so.
>> >>
>> >> Per another question that came up on the mailing list, I believe we
>> >> should add it, as it's a fairly straightforward effort.
>> >>
>> >> Tim
>> >>
>> >> On Mon, Apr 3, 2017 at 2:16 PM, Charles Allen
>> >>  wrote:
>> >> > As per https://issues.apache.org/jira/browse/SPARK-4899,
>> >> > org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils#createSchedulerDriver
>> >> > allows checkpointing, but only
>> >> > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler uses it.
>> >> > Is there a reason for that?
>>
>>
>>
>
>
> --
> Michael Gummelt
> Software Engineer
> Mesosphere
>