Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-12 Thread Wei-Chiu Chuang
That is unfortunately true. Now that I recognize the impact of the Guava update in Hadoop 3.1/3.2, how can we make this easier for downstreamers to consume? As I proposed, I think a middle ground is to shade Guava in hadoop-thirdparty and include the hadoop-thirdparty jar in the next Hadoop 3.1/3.2

Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-12 Thread Igor Dvorzhak
How do you manage and version such dependency upgrades in subminor Hadoop/Spark/Hive versions at Cloudera, then? I would imagine that some upgrades will be breaking for customers and cannot be shipped in a subminor CDH release. Or is this in preparation for the next major/minor release of CDH?

Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-11 Thread Wei-Chiu Chuang
FWIW, we are updating Guava in Spark and Hive at Cloudera. We don't know which Apache versions the changes will land in, but we'll upstream them for sure. The Guava change is debatable; it's not as critical as the others. There are critical vulnerabilities in other dependencies that we have no choice but to upd

Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-11 Thread Igor Dvorzhak
Generally I'm for updating dependencies, but I think that Hadoop should stick with semantic versioning and not make major or minor dependency updates in subminor releases. For example, Hadoop 3.2.1 updated Guava to 27.0-jre, and because of this Spark 3.0 is stuck with Hadoop 3.2.0 - they use Hiv
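The semantic-versioning rule proposed above can be expressed as a small check. This is a minimal sketch, not anything that exists in Hadoop's build: the class `DependencyPolicy` and its methods are hypothetical, and the previous Guava version (11.0.2) used in `main` is an assumption for illustration; only 27.0-jre comes from the thread. The parser keeps just the leading numeric components, so qualifiers such as `-jre`, `.Final` or `.v2019...` are ignored.

```java
/**
 * Hypothetical release-policy check (not part of Hadoop): a dependency change
 * that bumps the major or minor version should not ship in a subminor
 * (patch) release, per the semantic-versioning rule proposed in the thread.
 */
public class DependencyPolicy {

    enum Change { NONE, PATCH, MINOR, MAJOR }

    /** Parses up to three leading numeric components, e.g. "27.0-jre" -> [27, 0, 0]. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];   // drop qualifiers such as "-jre"
        String[] parts = core.split("\\.");
        int[] out = new int[3];                   // missing components default to 0
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** Classifies an upgrade by the most significant component that changed. */
    static Change classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return Change.MAJOR;
        if (a[1] != b[1]) return Change.MINOR;
        if (a[2] != b[2]) return Change.PATCH;
        return Change.NONE;
    }

    /** Only patch-level (or no) dependency changes are allowed in a subminor release. */
    static boolean allowedInSubminorRelease(String from, String to) {
        Change c = classify(from, to);
        return c == Change.NONE || c == Change.PATCH;
    }

    public static void main(String[] args) {
        // 11.0.2 is an assumed "before" version; 27.0-jre is from the thread.
        System.out.println(classify("11.0.2", "27.0-jre"));                 // MAJOR
        System.out.println(allowedInSubminorRelease("11.0.2", "27.0-jre")); // false
    }
}
```

Under this rule, a Guava jump like the one in 3.2.1 would be held for the next minor or major Hadoop release, while a Netty `4.1.x.Final` patch bump could still go into a subminor release.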

Re: [DISCUSS] Accelerate Hadoop dependency updates

2020-03-10 Thread Wei-Chiu Chuang
I'm not hearing any feedback so far, but I want to suggest using the hadoop-thirdparty repository to host any dependencies that are known to break compatibility. Candidate #1: Guava. Candidate #2: Netty. Candidate #3: Jetty. In fact, HBase shades these dependencies for the exact same reason. As an example
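The idea behind shading is package relocation: Hadoop's private copy of a library is moved under a Hadoop-owned package, so it can no longer clash with whatever version a downstream application puts on the classpath. The sketch below models only the name-rewriting rule (tools such as the Maven Shade Plugin apply it to bytecode via `pattern`/`shadedPattern` relocations). The class `Relocation` is hypothetical, and the shaded prefix `org.apache.hadoop.thirdparty.com.google.common` is an assumed, illustrative choice.

```java
/**
 * Illustrative model of a shading relocation rule (hypothetical class):
 * class names under the original package prefix are rewritten to a private
 * prefix; all other names are left untouched. Real shading tools rewrite
 * bytecode references, not just strings.
 */
public class Relocation {
    private final String pattern;        // original package prefix, e.g. "com.google.common"
    private final String shadedPattern;  // private prefix the classes are moved under

    Relocation(String pattern, String shadedPattern) {
        this.pattern = pattern;
        this.shadedPattern = shadedPattern;
    }

    /** Rewrites a fully qualified class name if it lies inside the relocated package. */
    String relocate(String className) {
        // Match the package itself or a sub-package, but not e.g. "com.google.commonx".
        if (className.equals(pattern) || className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // Assumed shaded prefix for Guava; Netty ("io.netty") or Jetty
        // ("org.eclipse.jetty") would get analogous rules.
        Relocation guava = new Relocation(
                "com.google.common",
                "org.apache.hadoop.thirdparty.com.google.common");
        System.out.println(guava.relocate("com.google.common.base.Preconditions"));
        System.out.println(guava.relocate("org.apache.hadoop.conf.Configuration"));
    }
}
```

Because Hadoop code would then reference only the relocated names, a downstream project such as Spark could keep its own Guava version on the classpath without a conflict.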

[DISCUSS] Accelerate Hadoop dependency updates

2020-03-07 Thread Wei-Chiu Chuang
Hi Hadoop devs, In the past, Hadoop has tended to be pretty far behind the latest versions of dependencies. Part of that is due to fear of the breaking changes brought in by dependency updates. However, things have changed dramatically over the past few years. With more focus on security vulner