I seem to remember a large Spark user (Tencent, I believe) chiming in late 
during these discussions 6-12 months ago and squashing any sort of deprecation, 
given the massive effort that would be required to upgrade their environment.

I just want to make sure these conversations take large Spark users into 
consideration - and reflect the real world rather than the ideal world.

Otherwise, this is all for naught, like last time.

> On Oct 28, 2016, at 10:43 AM, Sean Owen <so...@cloudera.com> wrote:
> 
> If the subtext is vendors, then I'd have a look at what recent distros look 
> like. I'll write about CDH as a representative example, but I think other 
> distros are naturally similar.
> 
> CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH 5.3 
> / Dec 2014). Granted, this depends on installing on an OS with that Java / 
> Python version. But Java 8 / Python 2.7 is available for all of the supported 
> OSes. The population that isn't on CDH 4 (support for CDH 4 was dropped in 
> Spark a long time ago), but is still on a version released 2-2.5 years ago 
> and won't update, is a couple percent of the installed base. They do not in 
> general want anything to change at all.
> 
> I assure everyone that vendors too are aligned in wanting to cater to the 
> crowd that wants the most recent version of everything. For example, CDH 
> offers both Spark 2.0.1 and 1.6 at the same time.
> 
> I wouldn't dismiss the support status of these underlying components as a 
> relevant proxy for whether they are worth supporting in Spark. Java 7 is long 
> since EOL (no, I don't count paying Oracle for support). No vendor is 
> supporting Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a 
> criterion here that reaches a different conclusion about these things just 
> for Spark? This was roughly the same conversation that happened 6 months ago.
> 
> I imagine we're going to find that in about 6 months it'll make more sense 
> all around to remove these. If we can just give a heads up with deprecation 
> and then kick the can down the road a bit more, that sounds like enough for 
> now.
> 
>> On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <matei.zaha...@gmail.com> 
>> wrote:
>> Deprecating them is fine (and I know they're already deprecated); the 
>> question is just whether to remove them. For example, what exactly is the 
>> downside of keeping Python 2.6 or Java 7 support right now? If it's high, 
>> then we can remove them, but I just haven't seen a ton of details. It also 
>> sounded like fairly recent versions of CDH, HDP, RHEL, etc. still have old 
>> versions of these.
>> 
>> Just talking with users, I've seen many people who say "we have a Hadoop 
>> cluster from $VENDOR, but we just download Spark from Apache and run newer 
>> versions of that". That's great for Spark IMO, and we need to stay 
>> compatible even with somewhat older Hadoop installs because they are 
>> time-consuming to update. Having the whole community on a small set of 
>> versions leads to a better experience for everyone and also to more of a 
>> "network effect": more people can battle-test new versions, answer questions 
>> about them online, write libraries that easily reach the majority of Spark 
>> users, etc.
