That's correct in my experience: we have found a Scala update to be straightforward and basically invisible to ops, but a Java upgrade is a pain because the JVM is managed and "certified" by ops.
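Steve's point below about pointing a deployed application at a different JVM than the one the Hadoop services use is worth spelling out. A minimal sketch of how that might look on YARN (the JDK path is hypothetical, and the JDK would need to be present at that path on every node):

    # client/driver side, in conf/spark-env.sh on the gateway host
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0

    # per-application, without touching the cluster-wide default JVM
    spark-submit \
      --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0 \
      --conf spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0 \
      ...

With something like this spelled out in the docs, an ops team could keep the cluster-wide JVM where it is while individual Spark applications move to a newer Java.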
On Fri, Oct 28, 2016 at 9:44 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> Twitter just led the release of Hadoop 2.6.5 precisely because they wanted to keep a Java 6 cluster up: the bigger your cluster, the less of a rush to upgrade.
>
> HDP? I believe we install & prefer (OpenJDK) Java 8, but the Hadoop branch-2 line is intended to build/run on Java 7 too. There's always a conflict between us developers' "shiny new features" and ops' "keep the cluster alive". That's actually where Scala has an edge: no need to upgrade the cluster-wide JVM just for an update, or play games configuring your deployed application to use a different JVM from the Hadoop services (which you can do, after all: it's just path setup). Thinking about it, knowing what can be done there (including documenting it in the Spark docs) could be a good migration strategy.
>
> Me? I look forward to when we can use Java 9 to isolate transitive dependencies, the bane of everyone's life. Someone needs to start on preparing everything for that to work, though.
>
> On 28 Oct 2016, at 11:47, Chris Fregly <ch...@fregly.com> wrote:
>
> I seem to remember a large Spark user (Tencent, I believe) chiming in late during these discussions 6-12 months ago and squashing any sort of deprecation, given the massive effort that would be required to upgrade their environment.
>
> I just want to make sure these conversations take into consideration large Spark users - and reflect the real world versus the ideal world.
>
> Otherwise, this is all for naught, like last time.
>
> On Oct 28, 2016, at 10:43 AM, Sean Owen <so...@cloudera.com> wrote:
>
> If the subtext is vendors, then I'd have a look at what recent distros look like. I'll write about CDH as a representative example, but I think other distros are naturally similar.
>
> CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH 5.3 / Dec 2014). Granted, this depends on installing on an OS with that Java / Python version. But Java 8 / Python 2.7 is available for all of the supported OSes. The population that isn't on CDH 4 (support for which was dropped in Spark a long time ago), but is on a version released 2-2.5 years ago and won't update, is a couple percent of the installed base. They do not in general want anything to change at all.
>
> I assure everyone that vendors too are aligned in wanting to cater to the crowd that wants the most recent version of everything. For example, CDH offers both Spark 2.0.1 and 1.6 at the same time.
>
> I wouldn't dismiss support for these supporting components as a relevant proxy for whether they are worth supporting in Spark. Java 7 is long since EOL (no, I don't count paying Oracle for support). No vendor is supporting Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a criterion here that reaches a different conclusion about these things just for Spark? This was roughly the same conversation that happened 6 months ago.
>
> I imagine we're going to find that in about 6 months it'll make more sense all around to remove these. If we can just give a heads up with deprecation and then kick the can down the road a bit more, that sounds like enough for now.
>
> On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> Deprecating them is fine (and I know they're already deprecated); the question is just whether to remove them. For example, what exactly is the downside of having Python 2.6 or Java 7 right now? If it's high, then we can remove them, but I just haven't seen a ton of details. It also sounded like fairly recent versions of CDH, HDP, RHEL, etc. still have old versions of these.
>>
>> Just talking with users, I've seen many people who say "we have a Hadoop cluster from $VENDOR, but we just download Spark from Apache and run newer versions of that". That's great for Spark IMO, and we need to stay compatible even with somewhat older Hadoop installs because they are time-consuming to update. Having the whole community on a small set of versions leads to a better experience for everyone and also to more of a "network effect": more people can battle-test new versions, answer questions about them online, write libraries that easily reach the majority of Spark users, etc.
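For what it's worth, the "download Spark from Apache and run newer versions of that" pattern Matei mentions is one thing the "Hadoop free" builds are meant for. A minimal sketch, assuming the cluster's hadoop command is on the PATH of the submitting host and the usual config directory (adjust paths as needed):

    # conf/spark-env.sh of the newer, separately downloaded Spark:
    # point it at the existing cluster's Hadoop jars and configuration
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
    export HADOOP_CONF_DIR=/etc/hadoop/conf

That is also why staying compatible with somewhat older Hadoop versions matters: it's what lets a newer Spark drop onto an older cluster without touching the cluster itself.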