There's a lot of stuff in 2.8. I'd note that I'd like to see the s3a performance improvements and OpenStack fixes in there, for which I need reviewers; I don't have the spare time to do this myself.
I've already been building and testing both Apache Slider (incubating) and Apache Spark against both 2.8.0-SNAPSHOT and 3.0.0-SNAPSHOT. What's been troublesome for builds which use maven as the way of managing dependencies (I'm ignoring the fact that Spark *also* has an SBT build with ivy doing dep management) is the HDFS client split:

- hadoop-hdfs-client pulled HdfsConfiguration. I'd been explicitly creating this to force in hdfs-default.xml and hdfs-site.xml loading, so that I could do sanity checks on things like security settings prior to attempting AM launch.

- likewise, DFSConfigKeys stayed in the hadoop-hdfs server JAR. I know it's tagged as @Private, but it's long been where all the string constants for HDFS options live. Forcing users to retype them in their own source is not only dangerous (it only encourages typos), it actually stops you using your IDE to find out where those constants get used.

We do now have a set of keys in the client, HdfsClientConfigKeys, but these are still declared as @Private. Which is a mistake for the reasons above, and because it encourages Hadoop developers to assume that they are free to make whatever changes they want to this code, and, if it breaks something, to say "it was tagged as private".

1. We have to recognise that a lot of things marked @Private are in fact essential for clients to use. Not just constants, but actual classes.

2. We have to look hard at @LimitedPrivate and question the legitimacy of tagging things as such, especially anything @InterfaceAudience.LimitedPrivate({"MapReduce"}), because any YARN app you write ends up needing those classes. For evidence, look at DistributedShell's imports and pick a few at random: NMClientAsyncImpl and ConverterUtils being easy targets.

3. Or, for real fun, UGI: @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce", "HBase", "Hive", "Oozie"}).

I'd advocate marking all "MapReduce" tags as "YarnApp" and having the people working on those classes accept that they will be used downstream and treat changes with caution.
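The constants problem shows up in miniature in every downstream project: if DFSConfigKeys is off-limits, each app ends up with its own hand-typed copy of the key strings. A minimal sketch of what such a copied constants file looks like; the class name is made up, and the key strings are standard HDFS option names:

```java
// Hypothetical downstream copy of a few HDFS option names, of the kind a
// YARN app is forced to maintain when DFSConfigKeys is treated as @Private.
// Retyping these strings by hand is exactly where typos creep in, and the
// IDE can no longer trace usages back to the originals.
public final class MyHdfsKeys {
    public static final String DFS_REPLICATION_KEY = "dfs.replication";
    public static final String DFS_BLOCK_SIZE_KEY = "dfs.blocksize";
    public static final String DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY =
        "dfs.namenode.kerberos.principal";

    private MyHdfsKeys() {
        // constants holder; never instantiated
    }
}
```

Note that a mistyped string here fails silently at runtime (the option is simply ignored), which is the danger being argued about.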
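On the maven side, the blunt workaround is to stop trusting the transitive graph and declare the server-side artifact explicitly. A hedged pom.xml sketch; the `${hadoop.version}` property name is an assumption, not a prescription:

```xml
<!-- Pull in the server-side hadoop-hdfs artifact alongside hadoop-client,
     so classes such as HdfsConfiguration and DFSConfigKeys stay on the
     compile classpath after the hdfs-client split. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```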
Yes, they may be messy, but that's how things are used. At least with a modern IDE you can add in the downstream projects and identify those uses with ease.

In the end, SLIDER-948 addressed the problems for me. I switched to pulling in hadoop-hdfs *and* copied and pasted all the DFSConfigKeys constants I used into my own file of constants. HDFS-9301 should make these changes things I could revert, with the other projects never noticing they existed, but I've left them in to isolate me from any more situations like this. To be completely ruthless: I don't trust that code not to break my builds any more.

Behaviour-wise, I've not seen much in the way of changes; all tests work the same. Oh, and Spark wouldn't compile against 3.0 as an exception tagged as @Deprecated since Hadoop 0.18 got pulled. Trivially fixed.

Returning to the pending 2.8.0 release, there's a way to find out what's going to break: build and test things against the snapshots, without waiting for the beta releases and expecting the downstream projects to do it for you. If they don't build, that's a success: you've found a compatibility problem to fix. If they don't test, well, that's trouble; you are in finger-pointing time.

-Steve

> On 11 Nov 2015, at 23:26, Haohui Mai <ricet...@gmail.com> wrote:
>
> bq. If and only if they take the Hadoop class path at face value.
> Many applications don't because of conflicting dependencies and
> instead import specific jars.
>
> We do make the assumption that applications need to pick up all the
> dependencies (either automatically or manually). The situation is
> similar with adding a new dependency into hdfs in a minor release.
>
> Maven / gradle obviously help, but I'd love to hear more about how
> you get it to work. In trunk hadoop-env.sh adds 118 jars into the
> class path. Are you manually importing 118 jars for every single
> application?
>
>
> On Wed, Nov 11, 2015 at 3:09 PM, Haohui Mai <ricet...@gmail.com> wrote:
>> bq. currently pulling in hadoop-client gives downstream apps
>> hadoop-hdfs-client, but not hadoop-hdfs server side, right?
>>
>> Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
>> smooth transition. Maybe we can revisit the decision in the 2.9 / 3.x?