Considering that Druid relies on many external components, I think we
should upgrade Druid somewhat conservatively. Dropping support for Hadoop 2
is not a good idea.
The ZooKeeper client upgrade in Druid would also keep me from adopting 0.22
for quite a while.

Although users could upgrade these dependencies first in order to use the
latest Druid releases, frankly speaking, such upgrades are not easy in
production and usually take a long time, which would keep users from
experiencing new features of Druid.
As for Hadoop 3, I have heard of some performance issues, which also leaves
me with little confidence to upgrade.

I think what Jihoon proposes is a good idea: separating Hadoop 2 from the
Druid core as an extension.
Since Hadoop 2 has not yet reached end of life, to strike a balance between
compatibility and long-term evolution, maybe we could provide two
extensions: one for Hadoop 2 and one for Hadoop 3.
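
If we went that route, choosing between them would presumably look like
loading any other extension. A hypothetical sketch of
common.runtime.properties (neither extension name exists today; both are
made up for illustration):

```properties
# Hypothetical: load exactly one of the two proposed Hadoop extensions.
# "druid-hadoop2-extension" / "druid-hadoop3-extension" are illustrative names.
druid.extensions.loadList=["druid-hdfs-storage", "druid-hadoop3-extension"]
```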



Will Lauer <wla...@verizonmedia.com.invalid> wrote on Wed, Jun 9, 2021 at 4:13 AM:

> Just to follow up on this, our main problem with hadoop3 right now has been
> instability in HDFS, to the extent that we put on hold any plans to deploy
> it to our production systems. I would claim Hadoop3 isn't mature enough yet
> to consider migrating Druid to it.
>
> Will
>
>
> Will Lauer
>
> Senior Principal Architect, Audience & Advertising Reporting
> Data Platforms & Systems Engineering
>
> M 508 561 6427
> 1908 S. First St
> Champaign, IL 61822
>
>
>
>
> On Tue, Jun 8, 2021 at 2:59 PM Will Lauer <wla...@verizonmedia.com> wrote:
>
> > Unfortunately, the migration to hadoop3 is a hard one (maybe not for
> > Druid, but certainly for big organizations running large hadoop2
> > workloads). If druid migrated to hadoop3 after 0.22, that would probably
> > prevent me from taking any new versions of Druid for at least the
> > remainder of the year and possibly longer.
> >
> > Will
> >
> >
> >
> >
> >
> > On Tue, Jun 8, 2021 at 3:08 AM Clint Wylie <cwy...@apache.org> wrote:
> >
> >> Hi all,
> >>
> >> I've been assisting with some experiments to see how we might want to
> >> migrate Druid to support Hadoop 3.x, and more importantly, see if
> >> maybe we can finally be free of some of the dependency issues it has
> >> been causing
> >> for as long as I can remember working with Druid.
> >>
> >> Hadoop 3 introduced shaded client jars
> >> (https://issues.apache.org/jira/browse/HADOOP-11804) with the purpose to
> >> allow applications to talk to the Hadoop cluster without drowning in its
> >> transitive dependencies. The experimental branch that I have been
> >> helping with, which is using these new shaded client jars, can be seen
> >> in this PR (https://github.com/apache/druid/pull/11314), and is
> >> currently working with
> >> the HDFS integration tests as well as the Hadoop tutorial flow in the
> >> Druid
> >> docs (which is pretty much equivalent to the HDFS integration test).
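
For anyone who hasn't looked at them, the shaded client artifacts from
HADOOP-11804 are published as hadoop-client-api and hadoop-client-runtime.
Depending on them instead of the classic hadoop-client looks roughly like
the following (the version shown is illustrative, not what the PR pins):

```xml
<!-- Shaded Hadoop 3 client artifacts (HADOOP-11804); version is illustrative -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.3.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.3.1</version>
  <scope>runtime</scope>
</dependency>
```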
> >>
> >> The cloud deep storages still need some further testing, and some minor
> >> cleanup still needs to be done for the docs and such. Additionally, we
> >> still need to figure out how to handle the Kerberos extension: because
> >> it extends some Hadoop classes, it isn't able to use the shaded client
> >> jars in a straightforward manner, and so it still has heavy dependencies
> >> and hasn't been tested. However, the experiment has started to pan out
> >> enough that I think it is worth starting this discussion, because it
> >> does have some implications.
> >>
> >> I think making this change will allow us to update our dependencies
> >> with a lot more freedom (I'm looking at you, Guava), but the catch is
> >> that once we make this change and start updating these dependencies, it
> >> will become hard, nearly impossible, to support Hadoop 2.x, since as far
> >> as I know there isn't an equivalent set of shaded client jars. I am also
> >> not certain how far back the Hadoop job classpath isolation stuff goes
> >> (mapreduce.job.classloader = true), which I think is required to be set
> >> on Druid tasks for this shaded stuff to work alongside updated Druid
> >> dependencies.
> >>
> >> Is anyone opposed to or worried about dropping Hadoop 2.x support after
> >> the Druid 0.22 release?
> >>
> >
>
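
On the classloader isolation Clint mentions: as far as I know,
mapreduce.job.classloader can be passed per job through the jobProperties
map in the tuningConfig of a Hadoop ingestion spec. A sketch of that
fragment (untested against the experimental branch):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "jobProperties": {
      "mapreduce.job.classloader": "true"
    }
  }
}
```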
