Also, how about officially supporting MinIO? I know support for S3 exists, but it would be good to officially support MinIO as deep storage as well.
* Rajiv

From: Clint Wylie <cwy...@apache.org>
Date: Tuesday, June 8, 2021 at 1:08 AM
To: dev@druid.apache.org <dev@druid.apache.org>
Subject: [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x

Hi all,

I've been assisting with some experiments to see how we might want to migrate Druid to support Hadoop 3.x, and, more importantly, to see if we can finally be free of some of the dependency issues it has been causing for as long as I can remember working with Druid.

Hadoop 3 introduced shaded client jars, https://issues.apache.org/jira/browse/HADOOP-11804, whose purpose is to allow applications to talk to the Hadoop cluster without drowning in its transitive dependencies. The experimental branch that I have been helping with, which uses these new shaded client jars, can be seen in this PR: https://github.com/apache/druid/pull/11314. It is currently working with the HDFS integration tests as well as the Hadoop tutorial flow in the Druid docs (which is pretty much equivalent to the HDFS integration test). The cloud deep storages still need some further testing, and some minor cleanup still needs to be done for the docs and such.
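For context, consuming the shaded clients from HADOOP-11804 amounts to swapping the classic `hadoop-client` dependency (which drags in the full transitive tree) for the `hadoop-client-api` and `hadoop-client-runtime` artifacts. A rough Maven sketch follows; the version shown is illustrative, not necessarily what the PR uses:

```xml
<!-- Compile against the shaded API jar only -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.3.1</version> <!-- illustrative version -->
</dependency>
<!-- Shaded runtime jar carries Hadoop's (relocated) transitive deps -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.3.1</version>
  <scope>runtime</scope>
</dependency>
```

Because the runtime jar relocates Hadoop's third-party dependencies (Guava, Jackson, etc.) under shaded package names, they no longer collide with the versions the application itself pulls in.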
Additionally, we still need to figure out how to handle the Kerberos extension: because it extends some Hadoop classes, it can't use the shaded client jars in a straightforward manner, so it still has heavy dependencies and hasn't been tested. However, the experiment has started to pan out enough that I think it is worth starting this discussion, because it does have some implications. I think making this change will allow us to update our dependencies with a lot more freedom (I'm looking at you, Guava), but the catch is that once we make this change and start updating those dependencies, it will become hard, nearing impossible, to support Hadoop 2.x, since as far as I know there isn't an equivalent set of shaded client jars. I am also not certain how far back Hadoop's job classpath isolation support goes (mapreduce.job.classloader = true), which I think is required on Druid tasks for the shaded jars to work alongside updated Druid dependencies.

Is anyone opposed to or worried about dropping Hadoop 2.x support after the Druid 0.22 release?
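For anyone unfamiliar, the classpath isolation flag mentioned above can be set per task via jobProperties in the tuningConfig of a Hadoop ingestion spec. A trimmed-down sketch (dataSchema and ioConfig omitted for brevity) might look like:

```json
{
  "type": "index_hadoop",
  "spec": {
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "mapreduce.job.classloader": "true"
      }
    }
  }
}
```

With that property set, the MapReduce framework loads job classes in an isolated classloader, which is what keeps Druid's (newer) dependency versions from clashing with whatever the Hadoop cluster ships on its own classpath.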