That's correct in my experience: we have found a Scala update to be
straightforward and largely invisible to ops, but a Java upgrade is a pain
because it is managed and "certified" by ops.

On Fri, Oct 28, 2016 at 9:44 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

> Twitter just led the release of Hadoop 2.6.5 precisely because they wanted
> to keep a Java 6 cluster up: the bigger your cluster, the less of a rush to
> upgrade.
>
> HDP? I believe we install & prefer (OpenJDK) Java 8, but the Hadoop
> branch-2 line is intended to build/run on Java 7 too. There's always a
> conflict between us developers ("shiny new features") and ops ("keep the
> cluster alive"). That's actually where Scala has an edge: there's no need to
> upgrade the cluster-wide JVM just for a Scala update, or to play games
> configuring your deployed application to use a different JVM from the Hadoop
> services (which you can do, after all: it's just path setup). Thinking about
> it, working out what can be done there, and documenting it in the Spark
> docs, could be a good migration strategy.
>
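To make Steve's "path setup" point concrete, here is a rough sketch for a YARN
cluster, assuming a newer JDK is already unpacked at the same path on every
node. The spark.yarn.appMasterEnv.* and spark.executorEnv.* properties are
Spark's documented per-application environment overrides; the JDK path, class
name and jar below are made up for illustration:

    # Hypothetical JDK location; assumes it exists on every node.
    NEW_JDK=/opt/jdk1.8.0_102

    # The submitting shell's JAVA_HOME covers the driver (client mode);
    # the two --conf overrides point the YARN application master and the
    # executors at the same JDK, while the Hadoop daemons keep running
    # on the cluster's own JVM.
    JAVA_HOME=$NEW_JDK spark-submit \
      --master yarn \
      --conf spark.yarn.appMasterEnv.JAVA_HOME=$NEW_JDK \
      --conf spark.executorEnv.JAVA_HOME=$NEW_JDK \
      --class com.example.MyApp \
      my-app.jar
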
> Me? I look forward to when we can use Java 9 to isolate transitive
> dependencies, the bane of everyone's life. Someone needs to start preparing
> everything for that to work, though.
>
>
> On 28 Oct 2016, at 11:47, Chris Fregly <ch...@fregly.com> wrote:
>
> I seem to remember a large Spark user (Tencent, I believe) chiming in late
> during these discussions 6-12 months ago and squashing any sort of
> deprecation, given the massive effort that would be required to upgrade
> their environment.
>
> I just want to make sure these conversations take large Spark users into
> consideration, and reflect the real world versus the ideal world.
>
> Otherwise, this is all for naught, like last time.
>
> On Oct 28, 2016, at 10:43 AM, Sean Owen <so...@cloudera.com> wrote:
>
> If the subtext is vendors, then I'd have a look at what recent distros
> look like. I'll write about CDH as a representative example, but I think
> other distros are naturally similar.
>
> CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH
> 5.3 / Dec 2014). Granted, this depends on installing on an OS with that
> Java / Python version. But Java 8 / Python 2.7 is available for all of the
> supported OSes. The population that isn't on CDH 4 (support for which was
> dropped in Spark a long time ago), but is on a version released 2-2.5 years
> ago and won't update, is a couple of percent of the installed base. They do
> not, in general, want anything to change at all.
>
> I assure everyone that vendors too are aligned in wanting to cater to the
> crowd that wants the most recent version of everything. For example, CDH
> offers both Spark 2.0.1 and 1.6 at the same time.
>
> I wouldn't dismiss the support status of these underlying components as a
> relevant proxy for whether they are worth supporting in Spark. Java 7 is
> long since EOL (no, I don't count paying Oracle for support). No vendor is
> supporting Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a
> criterion here that reaches a different conclusion about these things just
> for Spark? This was roughly the same conversation that happened 6 months ago.
>
> I imagine we're going to find that in about 6 months it'll make more sense
> all around to remove these. If we can just give a heads up with deprecation
> and then kick the can down the road a bit more, that sounds like enough for
> now.
>
> On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
>
>> Deprecating them is fine (and I know they're already deprecated); the
>> question is just whether to remove them. For example, what exactly is the
>> downside of keeping Python 2.6 or Java 7 support right now? If it's high,
>> then we can remove them, but I just haven't seen a ton of details. It also
>> sounded like fairly recent versions of CDH, HDP, RHEL, etc. still have old
>> versions of these.
>>
>> Just talking with users, I've seen many people who say "we have a
>> Hadoop cluster from $VENDOR, but we just download Spark from Apache and run
>> newer versions of that". That's great for Spark IMO, and we need to stay
>> compatible even with somewhat older Hadoop installs because they are
>> time-consuming to update. Having the whole community on a small set of
>> versions leads to a better experience for everyone and also to more of a
>> "network effect": more people can battle-test new versions, answer
>> questions about them online, write libraries that easily reach the majority
>> of Spark users, etc.
>>
>
>
