I seem to remember a large Spark user (Tencent, I believe) chiming in late during these discussions 6-12 months ago and squashing any sort of deprecation, given the massive effort that would be required to upgrade their environment.
I just want to make sure these convos take into consideration large Spark users - and reflect the real world versus the ideal world. Otherwise, this is all for naught like last time.

> On Oct 28, 2016, at 10:43 AM, Sean Owen <so...@cloudera.com> wrote:
>
> If the subtext is vendors, then I'd have a look at what recent distros look
> like. I'll write about CDH as a representative example, but I think other
> distros are naturally similar.
>
> CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH 5.3
> / Dec 2014). Granted, this depends on installing on an OS with that Java /
> Python version. But Java 8 / Python 2.7 is available for all of the supported
> OSes. The population that isn't on CDH 4, because that support was dropped
> a long time ago in Spark, and who is on a version released 2-2.5 years ago,
> and won't update, is a couple percent of the installed base. They do not in
> general want anything to change at all.
>
> I assure everyone that vendors too are aligned in wanting to cater to the
> crowd that wants the most recent version of everything. For example, CDH
> offers both Spark 2.0.1 and 1.6 at the same time.
>
> I wouldn't dismiss support for these supporting components as a relevant
> proxy for whether they are worth supporting in Spark. Java 7 is long since
> EOL (no, I don't count paying Oracle for support). No vendor is supporting
> Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a criterion
> here that reaches a different conclusion about these things just for Spark?
> This was roughly the same conversation that happened 6 months ago.
>
> I imagine we're going to find that in about 6 months it'll make more sense
> all around to remove these. If we can just give a heads up with deprecation
> and then kick the can down the road a bit more, that sounds like enough for
> now.
>
>> On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>> Deprecating them is fine (and I know they're already deprecated); the
>> question is just whether to remove them. For example, what exactly is the
>> downside of having Python 2.6 or Java 7 right now? If it's high, then we can
>> remove them, but I just haven't seen a ton of details. It also sounded like
>> fairly recent versions of CDH, HDP, RHEL, etc. still have old versions of
>> these.
>>
>> Just talking with users, I've seen many people who say "we have a Hadoop
>> cluster from $VENDOR, but we just download Spark from Apache and run newer
>> versions of that". That's great for Spark IMO, and we need to stay
>> compatible even with somewhat older Hadoop installs because they are
>> time-consuming to update. Having the whole community on a small set of
>> versions leads to a better experience for everyone and also to more of a
>> "network effect": more people can battle-test new versions, answer questions
>> about them online, write libraries that easily reach the majority of Spark
>> users, etc.
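For anyone wondering what their own cluster actually runs, here is a minimal, purely illustrative sketch (not from this thread) that prints the Java, Scala, Spark, and Hadoop versions visible to an application. It assumes a SparkSession named `spark` is already in scope, as in spark-shell:

    // Illustrative only: report the versions the application actually sees.
    import org.apache.hadoop.util.VersionInfo

    val javaVersion  = System.getProperty("java.version")         // e.g. "1.8.0_111"
    val scalaVersion = scala.util.Properties.versionNumberString  // e.g. "2.11.8"

    println(s"Spark:  ${spark.version}")
    println(s"Scala:  $scalaVersion")
    println(s"Java:   $javaVersion")
    println(s"Hadoop: ${VersionInfo.getVersion}")  // Hadoop client libs on the classpath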