That is unfortunately true.
Now that I recognize the impact of the Guava update in Hadoop 3.1/3.2, how can
we make this easier for downstream projects to consume? As I proposed, I think
a middle ground is to shade Guava in hadoop-thirdparty and include the
hadoop-thirdparty jar in the next Hadoop 3.1/3.2 releases.
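For downstream consumption, a rough sketch of what this could look like; the
artifact name, version, and relocated package prefix here are illustrative
assumptions, and the real coordinates would come from hadoop-thirdparty:

    <!-- Downstream pom.xml: depend on the shaded Guava published from
         hadoop-thirdparty instead of depending on Guava directly. -->
    <dependency>
      <groupId>org.apache.hadoop.thirdparty</groupId>
      <artifactId>hadoop-shaded-guava</artifactId>
      <version>1.0.0</version>
    </dependency>

    <!-- Code then imports the relocated classes, e.g.
         org.apache.hadoop.thirdparty.com.google.common.base.Preconditions,
         so Hadoop's Guava no longer collides with whatever Guava version the
         downstream project puts on its own classpath. -->

The point is that Hadoop's internal Guava usage stops being visible on the
downstream classpath, so Hadoop can move Guava versions without forcing
Spark/Hive/HBase to move in lock-step.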
How do you manage and version such dependency upgrades in subminor
Hadoop/Spark/Hive versions at Cloudera, then? I would imagine that some
upgrades will be breaking for customers and cannot be shipped in a subminor
CDH release. Or is this in preparation for the next major/minor release of
CDH?
FWIW, we are updating Guava in Spark and Hive at Cloudera. I don't know which
Apache versions they are going to land in, but we'll upstream them for sure.
The Guava change is debatable; it's not as critical as others. There are
critical vulnerabilities in other dependencies that we have no choice but to
update.
Generally I'm for updating dependencies, but I think that Hadoop should
stick with semantic versioning and not make major or minor dependency
updates in subminor releases.
For example, Hadoop 3.2.1 updated Guava to 27.0-jre, and because of this
Spark 3.0 stuck with Hadoop 3.2.0 - they use Hive…
I'm not hearing any feedback so far, but I want to suggest:
use the hadoop-thirdparty repository to host any dependencies that are known
to break compatibility.
Candidate #1: Guava
Candidate #2: Netty
Candidate #3: Jetty
In fact, HBase shades these dependencies for exactly the same reason.
As an example…
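A minimal sketch of what that kind of shading typically looks like with the
maven-shade-plugin; the relocation prefix org.apache.hadoop.thirdparty used
below is an assumption for illustration, not necessarily the exact prefix the
project would settle on:

    <!-- In the thirdparty module's pom.xml: repackage Guava under a
         Hadoop-owned package so it cannot clash with any Guava version on
         the user classpath. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>org.apache.hadoop.thirdparty.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>

Netty and Jetty would get their own <relocation> entries in the same way; the
cost is that Hadoop's own code has to import the relocated packages, which is
the one-time migration HBase already went through for its shaded dependencies.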
Hi Hadoop devs,
In the past, Hadoop has tended to be pretty far behind the latest versions of
its dependencies. Part of that is due to the fear of breaking changes
brought in by dependency updates.
However, things have changed dramatically over the past few years. With
more focus on security vulnerabilities…