Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Sean Owen
Right, the scenario is, for example, that a class is added in release 2.5.0 but has been back-ported to a 2.4.1-based release. That release isn't missing anything from 2.4.1, but a version of "2.4.1" doesn't reliably tell you whether the class is there. By the way, I just found there is already s
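
One way to sidestep the ambiguous version string in the back-port scenario above is to probe for the class itself rather than compare versions. The following is only a minimal sketch under that assumption: `FeatureProbe` and `hasClass` are hypothetical names, and `org.example.hypothetical.BackportedClass` is a placeholder, not a real Hadoop class.

```java
// Hypothetical helper: detect a feature by class presence, not by version string.
final class FeatureProbe {
    private FeatureProbe() {}

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean hasClass(String className) {
        try {
            // initialize=false: only check that the class is present,
            // without running its static initializers.
            Class.forName(className, false, FeatureProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Absent (or unloadable) class: treat the feature as unavailable.
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class stands in for "feature present".
        System.out.println(hasClass("java.util.ArrayList"));
        // Placeholder name standing in for a class that may or may not be back-ported.
        System.out.println(hasClass("org.example.hypothetical.BackportedClass"));
    }
}
```

A probe like this answers "is the class there?" directly, so it gives the same result on an upstream 2.5.0 release and on a 2.4.1-based distribution that carries the back-port.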

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Matei Zaharia
We could also do this, though it would be great if the Hadoop project provided this version number as at least a baseline. It's up to distributors to decide which version they report but I imagine they won't remove stuff that's in the reported version number. Matei On Jul 27, 2014, at 1:57 PM,

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Sean Owen
Good idea, although it gets difficult in the context of multiple distributions. Say change X is not present in version A, but present in version B. If you depend on X, what version can you look for to detect it? The distribution will return "A" or "A+X" or somesuch, but testing for "A" will give an

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Matei Zaharia
For this particular issue, it would be good to know if Hadoop provides an API to determine the Hadoop version. If not, maybe that can be added to Hadoop in its next release, and we can check for it with reflection. We recently added a SparkContext.version() method in Spark to let you tell the ve
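
The "check for it with reflection" idea could look roughly like the sketch below. It looks up a static, no-argument, String-returning method by name and falls back to an empty result when the class or method is absent. `ReflectiveVersion` and `callStaticString` are hypothetical names. The example call targets `org.apache.hadoop.util.VersionInfo.getVersion()`, which to my knowledge is the existing Hadoop accessor, but treat that as an assumption to verify against the Hadoop release in use.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

// Hypothetical helper: call a static no-arg method reflectively, if it exists.
final class ReflectiveVersion {
    private ReflectiveVersion() {}

    static Optional<String> callStaticString(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName); // public, no parameters
            if (!Modifier.isStatic(m.getModifiers())) {
                return Optional.empty(); // instance methods are out of scope here
            }
            Object result = m.invoke(null);
            return (result instanceof String)
                    ? Optional.of((String) result)
                    : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class or method missing, not accessible, or the call itself threw.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed class and method names; prints "unknown" when Hadoop is not on the classpath.
        Optional<String> version =
                callStaticString("org.apache.hadoop.util.VersionInfo", "getVersion");
        System.out.println(version.orElse("unknown"));
    }
}
```

Because the lookup is by name, the calling code compiles and runs even against a release that lacks the method, which is the point of using reflection here.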

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Patrick Wendell
Hey Ted, We always intend Spark to work with the newer Hadoop versions and encourage Spark users to use the newest Hadoop versions for best performance. We do try to be liberal in terms of supporting older versions as well. This is because many people run older HDFS versions and we want Spark to

Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Ted Yu
Thanks for replying, Patrick. The intention of my first email was to utilize newer Hadoop releases for their bug fixes. I am still looking for a clean way of passing the Hadoop release version number to individual classes. Using newer Hadoop releases would encourage pushing bug fixes / new features u