One reason I wouldn't change the default is that the Hadoop 2 launched by spark-ec2 is not a full Hadoop 2 distribution -- it's more of a hybrid Hadoop version built using CDH4 (it uses HDFS 2, but not YARN, AFAIK).
Also, our default Hadoop version in the Spark build is still 1.0.4 [1], so it makes sense to stick to that in spark-ec2 as well?

[1] https://github.com/apache/spark/blob/master/pom.xml#L122

Thanks
Shivaram

On Sun, Mar 1, 2015 at 2:59 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
>
> Is there any reason we shouldn't update the default Hadoop major version in
> spark-ec2 to 2?
>
> Nick
>
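
For context, the default under discussion is an optparse option in spark_ec2.py. The snippet below is only an illustrative sketch of that kind of definition, not a verbatim copy of the file at the linked lines: the usage string, help text, and the parse_args example are assumptions; the default of "1" and the meaning of "2" (a CDH4-based build with HDFS 2 but no YARN) reflect what this thread describes.

```python
# Illustrative sketch (not verbatim spark_ec2.py): how the spark-ec2 driver
# script defines the Hadoop major version it passes to the cluster setup scripts.
from optparse import OptionParser

parser = OptionParser(usage="spark-ec2 [options] <action> <cluster_name>")  # assumed usage string
parser.add_option(
    "--hadoop-major-version", default="1",
    help="Major version of Hadoop (default: %default). "
         "Note: '2' currently means a CDH4-based build with HDFS 2 but no YARN.")

# Example invocation: with no --hadoop-major-version flag, the default applies.
(opts, args) = parser.parse_args(["launch", "my-cluster"])
print(opts.hadoop_major_version)  # -> "1" unless overridden on the command line
```

Changing the proposed default would amount to changing that default="1" value (and keeping it in sync with the hadoop.version property in the Spark build's pom.xml).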