Re: [DISCUSS] Shade guava into hadoop-thirdparty

Wei-Chiu Chuang Sat, 04 Apr 2020 15:39:28 -0700

Great question!

I can run the Java API Compliance Checker to detect any API changes. I guess
that's the only way to find out.
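
As a rough first pass before running the full checker, something like the
sketch below could list public methods whose signatures mention Guava types.
This is not the Java API Compliance Checker; it is a minimal reflection-based
illustration, and the class name ExposedTypeScanner and its method are
hypothetical. It only looks at raw return and parameter types, so generic type
arguments, arrays, fields, and superclasses are not covered.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists public methods whose signatures mention a given package. */
final class ExposedTypeScanner {

    /**
     * Returns "Class#method" for each public method declared by cls whose raw
     * return type or raw parameter types live under packagePrefix.
     */
    static List<String> exposedMethods(Class<?> cls, String packagePrefix) {
        List<String> hits = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) {
                continue; // only public methods are part of the client-facing API
            }
            boolean exposes = m.getReturnType().getName().startsWith(packagePrefix);
            for (Class<?> p : m.getParameterTypes()) {
                exposes |= p.getName().startsWith(packagePrefix);
            }
            if (exposes) {
                hits.add(cls.getSimpleName() + "#" + m.getName());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Arguments: <fully qualified class name> [package prefix]
        String className = args.length > 0 ? args[0] : "java.util.Collections";
        String prefix = args.length > 1 ? args[1] : "com.google.common.";
        exposedMethods(Class.forName(className), prefix).forEach(System.out::println);
    }
}
```

Pointing this at the public classes in the client jars, with the prefix
"com.google.common.", would give a quick list of candidates to look at more
closely.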


On Sat, Apr 4, 2020 at 1:19 PM Igor Dvorzhak <i...@google.com.invalid> wrote:

> How will this proposal impact public APIs? I.e., does Hadoop expose any
> Guava classes in the client APIs, so that all client applications would
> have to be recompiled to use the shaded Guava classes?
>
> On Sat, Apr 4, 2020 at 12:13 PM Wei-Chiu Chuang <weic...@apache.org>
> wrote:
>
>> Hi Hadoop devs,
>>
>> I spent a good part of the past 7 months working with a dozen colleagues
>> to update the Guava version in Cloudera's software (that includes Hadoop,
>> HBase, Spark, Hive, Cloudera Manager ... more than 20 projects).
>>
>> After 7 months, I finally came to a conclusion: updating to Hadoop 3.3 /
>> 3.2.1 / 3.1.3, even if you are only coming from Hadoop 3.0 / 3.1.0, is going
>> to be really hard because of Guava. Because of Guava, the amount of work to
>> certify a minor release update is almost equivalent to that of a major
>> release update.
>>
>> That is because:
>> (1) Going from Guava 11 to Guava 27 is a big jump. There are incompatible
>> API changes in many places. Too bad the Google developers are not
>> sympathetic to their users.
>> (2) guava is used in all Hadoop jars. Not just Hadoop servers but also
>> client jars and Hadoop common libs.
>> (3) The Hadoop library is used in practically all software at Cloudera.
>>
>> Here is my proposal:
>> (1) shade guava into hadoop-thirdparty, relocating its packages to
>> org.hadoop.thirdparty.com.google.common.*
>> (2) make a hadoop-thirdparty 1.1.0 release.
>> (3) update existing references to guava to the relocated path. There are
>> more than 2k imports that need an update.
>> (4) release Hadoop 3.3.1 / 3.2.2 that contains this change.
>>
>> In this way, we will be able to update guava in Hadoop in the future
>> without disrupting Hadoop applications.
>>
>> Note: HBase already did this, and this Guava update project would have been
>> much more difficult if HBase hadn't done so.
>>
>> Thoughts? Other options include:
>> (1) Force downstream applications to migrate to Hadoop client artifacts as
>> listed here:
>> https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html
>> but that's nearly impossible.
>> (2) Migrate from Guava to Java APIs. I suppose this is a big project and I
>> can't estimate how much work it's going to be.
>>
>> Weichiu
>>
>
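
For reference, the import update in step (3) of the proposal quoted above
could be sketched roughly as follows. This is only an illustration under
assumptions: the class name GuavaImportRelocator is hypothetical, the relocated
prefix is taken from the proposal text and may not match what is finally
released, and a real migration would also need to handle fully qualified
references outside import statements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites Guava imports in Java source text to a shaded package. */
final class GuavaImportRelocator {

    // Prefix as written in the proposal; the final shaded package name is an assumption.
    static final String SHADED_PREFIX = "org.hadoop.thirdparty.";

    // Matches "import com.google.common." and "import static com.google.common."
    // at the start of a line (multiline mode).
    private static final Pattern GUAVA_IMPORT =
        Pattern.compile("(?m)^(\\s*import\\s+(?:static\\s+)?)(com\\.google\\.common\\.)");

    /** Returns source with every Guava import pointed at the relocated package. */
    static String relocate(String source) {
        Matcher m = GUAVA_IMPORT.matcher(source);
        // Keep the "import [static] " part, then insert the shaded prefix before the Guava package.
        return m.replaceAll("$1" + Matcher.quoteReplacement(SHADED_PREFIX) + "$2");
    }

    public static void main(String[] args) {
        String before = "import com.google.common.base.Preconditions;\n"
            + "import java.util.List;\n";
        System.out.print(relocate(before));
    }
}
```

Already relocated imports no longer start with com.google.common, so running
the rewrite a second time leaves them unchanged.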
