On Fri, Aug 29, 2014 at 7:42 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> In terms of vendor support for this approach - In the early days
> Cloudera asked us to add CDH4 repository and more recently Pivotal and
> MapR also asked us to allow linking against their hadoop-client
> libraries. So we've added these based on direct requests from vendors.
> Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
> imagine ruffling feathers by supporting this. But if we get feedback
> in that direction over time we can of course consider a different
> approach.
By this, you mean that it's easy to control the Hadoop version in the build and set it to some other vendor-specific release? Yes, that seems ideal. Making the build flexible, and adding the repository references to pom.xml, is part of enabling that -- to me, there's no question that's good. You can always roll your own build for your cluster if you need to.

I understand the role of the cdh4 / mapr3 / mapr4 binaries as just a convenience. But it's a convenience for people who:

- are installing Spark on a cluster (i.e. not end users)
- whose distro doesn't already include Spark
- whose distro isn't compatible with a plain vanilla Hadoop distro

That can't be many. CDH 4.6+ accounts for most of the installed CDH base, and it already ships Spark. I thought MapR already had Spark built in as well. The audience seems small enough, and the convenience minor enough (is it hard to run the distribution script?), that it caused me to ask whether providing these binaries was worth the bother, especially given the possible ASF sensitivity. But I say crack on; you get my point.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
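
P.S. For anyone following along, "rolling your own build" here just means passing a vendor Hadoop version to the distribution script. A rough sketch (the exact profile names and version strings vary by Spark release and vendor -- check the "Building Spark" docs for your version before copying these):

```shell
# Build a runnable Spark distribution against a specific Hadoop client.
# Run from the root of a Spark source checkout.

# Plain Apache Hadoop 2.4 with YARN support:
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0

# A vendor build: override hadoop.version with the vendor's artifact
# version (illustrative string); the vendor Maven repositories are
# already referenced in Spark's pom.xml, so the hadoop-client jars
# resolve from there.
./make-distribution.sh --tgz -Dhadoop.version=2.0.0-mr1-cdh4.2.0
```

The point being debated above is exactly that this is a one-line invocation, which is why pre-built vendor binaries buy relatively little convenience.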