Re: Dropping support for earlier Hadoop versions in Spark 2.0?

Steve Loughran Sat, 21 Nov 2015 11:48:40 -0800

> On 20 Nov 2015, at 21:39, Reynold Xin <r...@databricks.com> wrote:
> 
> OK I'm not exactly asking for a vote here :)
> 
> I don't think we should look at it from only maintenance point of view -- 
> because in that case the answer is clearly supporting as few versions as 
> possible (or just rm -rf spark source code and call it a day). It is a 
> tradeoff between the number of users impacted and the maintenance burden.
> 
> So a few questions for those more familiar with Hadoop:
> 
> 1. Can Hadoop 2.6 client read Hadoop 2.4 / 2.3? 
>


Yes, at the HDFS level.

There are some special cases where HDFS stops a 2.2-2.5 client talking to 
Hadoop 2.6:


- HDFS at-rest encryption needs a client that can decode it (2.6.x+)
- HDFS erasure coding will need a later version (2.8?)

If you turn SASL on in your datanodes, your DNs don't need to come up on a port 
< 1024, but Hadoop < 2.6 clients can no longer work with HDFS at that point.
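
As a minimal sketch of the two firm rules above (at-rest encryption and SASL 
on datanodes both need a 2.6+ client), the check could look like the following. 
The class and method names are hypothetical, not part of Hadoop or Spark, and 
the erasure coding case is left out because the required version is still a 
guess.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: encodes only the two client-version rules stated above. */
final class HdfsClientCompat {

    /** Parses "major.minor[.patch]" into {major, minor}. Suffixes such as "-SNAPSHOT" on the minor part are not handled. */
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** True if the version is at least major.minor. */
    static boolean atLeast(String version, int major, int minor) {
        int[] v = majorMinor(version);
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }

    /** Returns the reasons a client of this version cannot use the cluster; an empty list means no known blocker. */
    static List<String> blockers(String clientVersion, boolean encryptionAtRest, boolean saslOnDataNodes) {
        List<String> reasons = new ArrayList<>();
        boolean is26OrLater = atLeast(clientVersion, 2, 6);
        if (encryptionAtRest && !is26OrLater) {
            reasons.add("at-rest encryption needs a 2.6+ client");
        }
        if (saslOnDataNodes && !is26OrLater) {
            reasons.add("SASL on datanodes needs a 2.6+ client");
        }
        return reasons;
    }

    public static void main(String[] args) {
        // A 2.4 client against a cluster with both features enabled: two blockers.
        System.out.println(blockers("2.4.1", true, true));
        // A 2.6 client against the same cluster: no known blockers.
        System.out.println(blockers("2.6.0", true, true));
    }
}
```

An empty list only means none of the cases listed here apply; it is not a 
guarantee of compatibility.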



> 2. If the answer to 1 is yes, are there known, major issues with backward 
> compatibility?
> 

Hadoop native libs, every time. Guava, Jackson and protobuf can be managed with 
shading, but hadoop.{so,dll} is a real problem. A hadoop-2.6 JAR will use 
native methods in that native lib which, if not loaded, will break the app. 
This is a pain as nobody includes that native lib with their Java binaries; who 
can even predict which one they'd need to ship? As a consequence, I'd really 
advise against trying to run an app built with the 2.6 JARs inside a YARN 
cluster < 2.6. You can certainly talk to HDFS and the YARN services, but 
there's a risk a codepath will hit a native method that isn't there.
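
For illustration, here is a hedged sketch of a startup probe that asks Hadoop 
whether its native library loaded at all. It calls 
org.apache.hadoop.util.NativeCodeLoader.isNativeCodeLoaded() through reflection 
so it compiles without Hadoop on the classpath; the wrapper class and enum are 
hypothetical. Note the limit: this only reports that some hadoop.{so,dll} was 
loaded, not that it contains every native method the JARs expect, which is 
exactly the gap described above.

```java
import java.lang.reflect.Method;

/** Hypothetical probe: reports whether Hadoop's native library was loaded in this JVM. */
final class NativeHadoopCheck {

    enum Status { HADOOP_NOT_ON_CLASSPATH, NATIVE_LOADED, NATIVE_MISSING }

    static Status check() {
        try {
            // Looked up reflectively so this sketch has no compile-time Hadoop dependency.
            Class<?> loader = Class.forName("org.apache.hadoop.util.NativeCodeLoader");
            Method isLoaded = loader.getMethod("isNativeCodeLoaded");
            boolean loaded = (Boolean) isLoaded.invoke(null);
            return loaded ? Status.NATIVE_LOADED : Status.NATIVE_MISSING;
        } catch (ClassNotFoundException e) {
            return Status.HADOOP_NOT_ON_CLASSPATH;
        } catch (ReflectiveOperationException e) {
            // The class exists but does not look as assumed; surface that rather than guess.
            throw new IllegalStateException("unexpected NativeCodeLoader shape", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop native library status: " + check());
    }
}
```

A NATIVE_LOADED result from a mismatched library can still fail later with an 
UnsatisfiedLinkError on the first native call that isn't there.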


It's trouble the other way too: even though we try not to break existing code 
by moving/renaming native methods, it can happen.

The last time someone did this in a big way, I was the first to find it in 
HADOOP-11064; the changes were reverted/altered, but there was no official 
declaration that compatibility at the JNI layer will be maintained. Apparently 
you can't guarantee it over JVM versions either.

We really need a lib versioning story, which is what HADOOP-11127 covers.

> 3. Can Hadoop 2.6+ YARN work on older versions of YARN clusters?
> 

I'd say no, with classpath and hadoop native being the failure points.

There's also feature completeness: Hadoop 2.6 was the first version with all 
the YARN-896 work for long-lived services.


> 4. (for Hadoop vendors) When did/will support for Hadoop 2.4 and below stop? 
> To what extent do you care about running Spark on older Hadoop clusters.
> 
> 

I don't know. And I probably don't want to make any forward-looking statements 
anyway. But I don't even know how well supported 2.4 is today; 2.6 is the one 
that still gets bug fixes out from the ASF. I can see it lasting a while.


What essentially happens is that we provide bug fixes to the existing releases, 
but for anything new: upgrade.

Assuming that policy continues (disclaimer: personal opinions, etc), then any 
Spark 2.0 release would be rebuilt against all the JARs which the rest of that 
version of HDP would use, and that's the only version we'd recommend using.



