Ah, my misunderstanding: Matei was referring to the Scala 2.11 tarball at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the Maven artifacts.
Patrick, I see you just commented on SPARK-5134, so I'll follow up there. Sounds like this may turn out not to be a problem after all.

On binary tarball releases: I wonder whether anyone else shares my opinion that these shouldn't be distributed for specific Hadoop *distributions* to begin with. (I won't repeat the argument here yet.) That would resolve this N x M explosion too. Vendors already provide their own distributions; that's their job.

On Sun, Mar 8, 2015 at 9:42 PM, Krishna Sankar <ksanka...@gmail.com> wrote:
> Yep, otherwise this will become an N^2 problem - Scala versions X
> Hadoop distributions X ...
>
> Maybe one option is to have a minimum basic set (which I know is what
> we are discussing) and move the rest to spark-packages.org. There the
> vendors can add the latest downloads - for example, when 1.4 is
> released, HDP can build a release of an HDP Spark 1.4 bundle.
>
> Cheers
> <k/>
>
> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>> We probably want to revisit the way we do binaries in general for
>> 1.4+. IMO, something worth forking a separate thread for.
>>
>> I've been hesitant to add new binaries because people
>> (understandably) complain if you ever stop packaging older ones, but
>> on the other hand the ASF has complained that we have too many
>> binaries already and that we need to pare the number down because of
>> the large volume of files. Doubling the number of binaries we produce
>> for Scala 2.11 seemed like it would be too much.
>>
>> One potential solution is to actually package "Hadoop provided"
>> binaries and encourage users to use these by simply setting
>> HADOOP_HOME, or to have instructions for specific distros. I've heard
>> that our existing packages don't work well on HDP, for instance,
>> since there are some configuration quirks that differ from upstream
>> Hadoop.
>>
>> If we cut down on the cross-building for Hadoop versions, then it is
>> more tenable to cross-build for Scala versions without exploding the
>> number of binaries.
>>
>> - Patrick
>>
>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen <so...@cloudera.com> wrote:
>> > Yeah, interesting question of what the better default is for the
>> > single set of artifacts published to Maven. I think there's an
>> > argument for Hadoop 2, and perhaps Hive, for the 2.10 build too.
>> > Pros and cons are discussed more at
>> >
>> > https://issues.apache.org/jira/browse/SPARK-5134
>> > https://github.com/apache/spark/pull/3917
>> >
>> > On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia <matei.zaha...@gmail.com>
>> > wrote:
>> >> +1
>> >>
>> >> Tested it on Mac OS X.
>> >>
>> >> One small issue I noticed is that the Scala 2.11 build is using
>> >> Hadoop 1 without Hive, which is kind of weird because people will
>> >> more likely want Hadoop 2 with Hive. So it would be good to publish
>> >> a build for that configuration instead. We can do it if we do a new
>> >> RC, or it might be that binary builds don't need to be voted on (I
>> >> forget the details there).
>> >>
>> >> Matei
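
PS: For anyone who wants to see what the "Hadoop provided" route Patrick
mentions could look like in practice, here is a rough sketch only. The
profile and variable names below are my best understanding of the build and
of how a Hadoop-free package gets pointed at a cluster's own Hadoop jars, so
double-check them against the docs for your version; the paths are
placeholders, not real recommendations.

    # Build a distribution that leaves the Hadoop jars out of the assembly;
    # the hadoop-provided profile should do this, if I have the name right.
    ./make-distribution.sh --name hadoop-provided --tgz -Phadoop-provided -Phive

    # Then, on the cluster, point Spark at the locally installed Hadoop,
    # e.g. in conf/spark-env.sh. /usr/lib/hadoop is just a placeholder path.
    export HADOOP_HOME=/usr/lib/hadoop
    export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)

The point being: one such tarball could serve any reasonably compatible
Hadoop install, vendor-built or otherwise, instead of us shipping one binary
per distribution.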