On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> I would like to understand though Sean - what is the proposal exactly?
> Hadoop 2 itself supports all of the Hadoop 1 API's, so things like
> removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think

Not entirely; you can see some binary incompatibilities that have bitten recently. A Hadoop 1 program does not in general work on Hadoop 2 because of this.

Part of my thinking is that it's not clear to me that Hadoop 1.x and 2.0.x fully work anymore anyway. See for example SPARK-8057 recently. I recall similar problems with Hadoop 2.0.x-era releases and the Spark build for them, which is basically the 'cdh4' build.

So one benefit is skipping whatever work would be needed to continue to fix this up, and the argument is that there may be less loss of functionality than it seems. The other is being able to use later APIs. This much is a little minor.

> The main reason I'd push back is that I do think there are still
> people running the older versions. For instance at Databricks we use
> the FileSystem library for talking to S3... every time we've tried to
> upgrade to Hadoop 2.X there have been significant regressions in
> performance and we've had to downgrade. That's purely anecdotal, but I
> think you have people out there using the Hadoop 1 bindings for whom
> upgrade would be a pain.

Yeah, that's the question. Is anyone out there using 1.x? More anecdotes wanted. That might be the most interesting question. No CDH customers would have been for a long while now, for example. (Still a small number of CDH 4 customers out there though, and that's 2.0.x or so, but that's a gray area.)

Is the S3 library thing really related to Hadoop 1.x? That comes from jets3t and that's independent.

> In terms of our maintenance cost, to me the much bigger cost for us
> IMO is dealing with differences between e.g. 2.2, 2.4, and 2.6 where
> major new API's were added. In comparison the Hadoop 1 vs 2 seems

Really? I'd say the opposite. No APIs that are only in 2.2, let alone only in a later version, can be in use now, right? 1.x wouldn't work at all then. I don't know of any binary incompatibilities of the type seen between 1.x and 2.x, which we have had to shim to make work.
In both cases dependencies have to be harmonized here and there, yes. That won't change.