Right, the scenario is, for example, that a class is added in release 2.5.0 but has been back-ported to a 2.4.1-based release. The 2.4.1-based release isn't missing anything from 2.4.1, but a reported version of "2.4.1" doesn't reliably tell you whether or not the class is there.
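A reflection-based presence check sidesteps the version-string ambiguity: test for the class itself rather than the reported version. A minimal sketch (the class names checked here are illustrative, not Spark's actual detection code):

```java
// Minimal sketch: detect a back-ported feature by checking whether its class
// is present on the classpath, instead of comparing reported version strings.
public class FeatureDetect {
    public static boolean hasClass(String className) {
        try {
            // Load without initializing; presence is all we care about.
            Class.forName(className, false, FeatureDetect.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class present on any JVM vs. one that is not (illustrative names).
        System.out.println(hasClass("java.lang.String"));           // true
        System.out.println(hasClass("com.example.NoSuchBackport")); // false
    }
}
```

This answers "is change X here?" directly, regardless of what version string a distribution chooses to report.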
By the way, I just found there is already such a class, org.apache.hadoop.util.VersionInfo:
https://github.com/apache/hadoop-common/blob/release-2.4.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/VersionInfo.java

It appears to have been around for a long time. Theoretical problems aside, there may be cases where querying the version is a fine and reliable solution.

On Jul 28, 2014 12:54 AM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
>
> We could also do this, though it would be great if the Hadoop project
> provided this version number as at least a baseline. It's up to distributors
> to decide which version they report, but I imagine they won't remove stuff
> that's in the reported version number.
>
> Matei
>
> On Jul 27, 2014, at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>
> > Good idea, although it gets difficult in the context of multiple
> > distributions. Say change X is not present in version A, but present
> > in version B. If you depend on X, what version can you look for to
> > detect it? The distribution will return "A" or "A+X" or some such, but
> > testing for "A" will give an incorrect answer, and the code can't be
> > expected to look for everyone's "A+X" versions. Actually inspecting
> > the code is more robust, if a bit messier.
> >
> > On Sun, Jul 27, 2014 at 9:50 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >> For this particular issue, it would be good to know if Hadoop provides an
> >> API to determine the Hadoop version. If not, maybe that can be added to
> >> Hadoop in its next release, and we can check for it with reflection. We
> >> recently added a SparkContext.version() method in Spark to let you tell
> >> the version.
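VersionInfo.getVersion() is a static method, so the reflection approach Matei mentions can be combined with it: query the version when hadoop-common is on the classpath and fall back gracefully when it isn't. A hedged sketch (the fallback value "unknown" is an assumption of this example, not anything Spark or Hadoop does):

```java
// Minimal sketch: query the Hadoop version via org.apache.hadoop.util.VersionInfo
// when hadoop-common is on the classpath, using reflection so this compiles and
// runs even without it. The "unknown" fallback is this sketch's own convention.
public class HadoopVersionProbe {
    public static String hadoopVersion() {
        try {
            Class<?> vi = Class.forName("org.apache.hadoop.util.VersionInfo");
            // VersionInfo.getVersion() is static, so invoke with a null receiver.
            return (String) vi.getMethod("getVersion").invoke(null);
        } catch (ReflectiveOperationException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop version: " + hadoopVersion());
    }
}
```

As the thread notes, the string this returns still depends on what a distribution chooses to report, so it is a baseline rather than a feature test.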
> >>
> >> Matei
> >>
> >> On Jul 27, 2014, at 12:19 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>
> >>> Hey Ted,
> >>>
> >>> We always intend Spark to work with the newer Hadoop versions and
> >>> encourage Spark users to use the newest Hadoop versions for best
> >>> performance.
> >>>
> >>> We do try to be liberal in terms of supporting older versions as well.
> >>> This is because many people run older HDFS versions and we want Spark
> >>> to read and write data from them. So far we've been willing to do this
> >>> despite some maintenance cost.
> >>>
> >>> The reason is that for many users it's very expensive to do a
> >>> wholesale upgrade of HDFS, but trying out new versions of Spark is
> >>> much easier. For instance, some of the largest-scale Spark users run
> >>> fairly old or forked HDFS versions.
> >>>
> >>> - Patrick
> >>>
> >>> On Sun, Jul 27, 2014 at 12:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>> Thanks for replying, Patrick.
> >>>>
> >>>> The intention of my first email was utilizing newer Hadoop releases
> >>>> for their bug fixes. I am still looking for a clean way of passing the
> >>>> Hadoop release version number to individual classes.
> >>>> Using newer Hadoop releases would encourage pushing bug fixes / new
> >>>> features upstream. Ultimately Spark code would become cleaner.
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Sun, Jul 27, 2014 at 8:52 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>>>
> >>>>> Ted - technically I think you are correct, although I wouldn't
> >>>>> recommend disabling this lock. This lock is not expensive (acquired
> >>>>> once per task, as are many other locks already). Also, we've seen some
> >>>>> cases where Hadoop concurrency bugs ended up requiring multiple fixes
> >>>>> - concurrency of client access is not well tested in the Hadoop
> >>>>> codebase, since most of the Hadoop tools do not use concurrent access.
> >>>>> So in general it's good to be conservative in what we expect of the
> >>>>> Hadoop client libraries.
> >>>>>
> >>>>> If you'd like to discuss this further, please fork a new thread, since
> >>>>> this is a vote thread. Thanks!
> >>>>>
> >>>>> On Fri, Jul 25, 2014 at 10:14 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>> HADOOP-10456 is fixed in Hadoop 2.4.1.
> >>>>>>
> >>>>>> Does this mean that synchronization
> >>>>>> on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for
> >>>>>> Hadoop 2.4.1?
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>>>>>
> >>>>>>> The most important issue in this release is actually an amendment to
> >>>>>>> an earlier fix. The original fix caused a deadlock, which was a
> >>>>>>> regression from 1.0.0 to 1.0.1:
> >>>>>>>
> >>>>>>> Issue:
> >>>>>>> https://issues.apache.org/jira/browse/SPARK-1097
> >>>>>>>
> >>>>>>> 1.0.1 fix:
> >>>>>>> https://github.com/apache/spark/pull/1273/files (had a deadlock)
> >>>>>>>
> >>>>>>> 1.0.2 fix:
> >>>>>>> https://github.com/apache/spark/pull/1409/files
> >>>>>>>
> >>>>>>> I failed to correctly label this on JIRA, but I've updated it!
> >>>>>>>
> >>>>>>> On Fri, Jul 25, 2014 at 5:35 PM, Michael Armbrust <mich...@databricks.com> wrote:
> >>>>>>>> That query is looking at "Fix Version", not "Target Version". The fact that
> >>>>>>>> the first one is still open is only because the bug is not resolved in
> >>>>>>>> master. It is fixed in 1.0.2. The second one is partially fixed in 1.0.2,
> >>>>>>>> but is not worth blocking the release for.
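The per-task lock discussed earlier in the thread amounts to serializing construction of an object whose constructor is not thread-safe (the HADOOP-10456 symptom). A generic sketch of that pattern, not HadoopRDD's actual code, with java.util.Properties standing in for Hadoop's Configuration:

```java
import java.util.Properties;

// Generic sketch of the pattern behind HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK:
// serialize construction of an object whose constructor is not thread-safe.
// Properties stands in for org.apache.hadoop.conf.Configuration here.
public class ConfFactory {
    private static final Object CONFIGURATION_INSTANTIATION_LOCK = new Object();

    public static Properties newConf() {
        // The lock is held only for construction, so the per-task cost is small
        // - which is Patrick's argument for keeping it even on fixed versions.
        synchronized (CONFIGURATION_INSTANTIATION_LOCK) {
            return new Properties();
        }
    }

    public static void main(String[] args) {
        System.out.println(ConfFactory.newConf().isEmpty()); // true
    }
}
```

Bypassing the lock only for Hadoop versions known to carry the fix is exactly the version-detection problem from the top of this thread, which is another reason to just keep the cheap lock.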
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jul 25, 2014 at 4:23 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> TD, there are a couple of unresolved issues slated for 1.0.2
> >>>>>>>>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.0.2%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC>.
> >>>>>>>>> Should they be edited somehow?
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 25, 2014 at 7:08 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Please vote on releasing the following candidate as Apache Spark version
> >>>>>>>>>> 1.0.2.
> >>>>>>>>>>
> >>>>>>>>>> This release fixes a number of bugs in Spark 1.0.1.
> >>>>>>>>>> Some of the notable ones are:
> >>>>>>>>>> - SPARK-2452: Known issue in Spark 1.0.1 caused by the attempted fix for
> >>>>>>>>>> SPARK-1199. The fix was reverted for 1.0.2.
> >>>>>>>>>> - SPARK-2576: NoClassDefFoundError when executing a Spark SQL query on an
> >>>>>>>>>> HDFS CSV file.
> >>>>>>>>>> The full list is at http://s.apache.org/9NJ
> >>>>>>>>>>
> >>>>>>>>>> The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
> >>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f
> >>>>>>>>>>
> >>>>>>>>>> The release files, including signatures, digests, etc. can be found at:
> >>>>>>>>>> http://people.apache.org/~tdas/spark-1.0.2-rc1/
> >>>>>>>>>>
> >>>>>>>>>> Release artifacts are signed with the following key:
> >>>>>>>>>> https://people.apache.org/keys/committer/tdas.asc
> >>>>>>>>>>
> >>>>>>>>>> The staging repository for this release can be found at:
> >>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1024/
> >>>>>>>>>>
> >>>>>>>>>> The documentation corresponding to this release can be found at:
> >>>>>>>>>> http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/
> >>>>>>>>>>
> >>>>>>>>>> Please vote on releasing this package as Apache Spark 1.0.2!
> >>>>>>>>>>
> >>>>>>>>>> The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
> >>>>>>>>>> a majority of at least 3 +1 PMC votes are cast.
> >>>>>>>>>> [ ] +1 Release this package as Apache Spark 1.0.2
> >>>>>>>>>> [ ] -1 Do not release this package because ...
> >>>>>>>>>>
> >>>>>>>>>> To learn more about Apache Spark, please see
> >>>>>>>>>> http://spark.apache.org/