On 24 Mar 2016, at 15:27, Koert Kuipers 
<ko...@tresata.com> wrote:

i think the arguments are convincing, but it also makes me wonder if i live in 
some kind of alternate universe... we deploy on customers' clusters, where the 
OS, python version, java version and hadoop distro are not chosen by us. so 
think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply have access 
to a single proxy machine and launch through yarn. asking them to upgrade java 
is pretty much out of the question, or a 6+ month ordeal. of the 10 client 
clusters i can think of off the top of my head, all of them are on java 7 and 
none are on java 8. so by doing this you would make spark 2 basically unusable 
for us (unless most of them have plans to upgrade to java 8 in the near term; 
i will ask around and report back...).


It's not actually mandatory for the process executing in the YARN cluster to 
run with the same JVM as the rest of the Hadoop stack; all that is needed is 
for the JAVA_HOME and PATH environment variables to be set appropriately. 
Switching JVMs is not something YARN makes easy, but it may be possible, 
especially if Spark itself provides some hooks, so you don't have to manually 
play with setting things up. That may be something which could significantly 
ease adoption of Spark 2 in YARN clusters. The same goes for Python.
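As a rough sketch of what that can look like already, Spark's 
spark.yarn.appMasterEnv.* and spark.executorEnv.* settings let you push 
environment variables into the YARN containers. The JDK path and the 
application jar name below are hypothetical, and the Java 8 install has to 
already exist at that path on every node; I haven't verified this across 
distros:

  # Point both the YARN application master and the executors at a
  # Java 8 install that is already present on every node.
  # /usr/java/jdk1.8.0 and my-app.jar are example names only.
  spark-submit \
    --master yarn \
    --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0 \
    --conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0 \
    my-app.jar

For PySpark the analogous knob is the PYSPARK_PYTHON environment variable, 
set the same way, pointing at an interpreter installed on the nodes.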

This is something I could probably help others to address.
