Re: [DISCUSS] Hadoop ingestion support

Abhishek Agarwal Wed, 02 Jul 2025 04:10:12 -0700

An alternate approach is to remove Hadoop ingestion in 35 entirely but allow
backports to the 34 release branch. Any bug fix of reasonable severity can be
backported there. Each time we make a release, we do both a major release
and a patch release for 34. I suggest we do this until Druid 36, and then
discontinue the 34 release line. From Druid 37 onwards, we have only one
release line.


On Wed, Jul 2, 2025 at 8:42 AM Karan Kumar <ka...@apache.org> wrote:

> I have not heard of any plans to deprecate `druid-hdfs-storage`, which
> serves as a deep storage implementation. This thread is strictly
> about Hadoop *ingestion* support.
>
> On Wed, Jul 2, 2025 at 8:28 AM Eyal Yurman <eyal.yur...@gmail.com> wrote:
>
> > Druid also includes druid-hdfs-storage core extension which bundles
> > hadoop-client-api.
> >
> > I assume there isn't a plan to deprecate this extension?
> >
> > On Tue, Jul 1, 2025 at 8:19 AM Gian Merlino <g...@apache.org> wrote:
> >
> > > We are in a tough situation where our hand is being forced on dropping
> > > Java 11 support by the Jetty 9 EOL situation. It isn't a good idea to
> > > continue using Jetty 9 given it's no longer receiving security updates,
> > > and Jetty 12 (the only currently-supported version) requires Java 17.
> > >
> > > However, the main thing pushing us to drop Hadoop-based ingestion support
> > > is the fact that Hadoop doesn't support Java 17. If we can find a way to
> > > keep it working even with the Jetty 12 + Java 17 upgrade, then we don't
> > > necessarily need to drop the support immediately. To that end I suggest
> > > the following approach:
> > >
> > > 1) In Druid 34 (next upcoming release, code freeze in a week or so)
> > > announce that Java 11 support and Hadoop-based ingestion are both
> > > deprecated. For Hadoop-based ingestion, provide guidance on migrating to
> > > SQL-based ingestion (as a Map/Reduce replacement) and the k8s task runner
> > > (as a YARN replacement).
> > >
> > > 2) In Druid 35 (~October) bump up the minimum Java version to Java 17,
> > > and upgrade to Jetty 12. Try to keep Hadoop-based ingestion working by
> > > continuing to target Java 11 when we compile our own code, and avoiding
> > > usage of Java-17-requiring libraries on the ingestion code path. (I
> > > believe Jetty is not used on the ingestion code path.) We may be able to
> > > continue supporting Hadoop-based ingestion in this way. If we can, great.
> > > If we can't, we would need to remove support for Hadoop-based ingestion
> > > in this version.
> > >
> > > If we manage to get Druid 35 working with Hadoop-based ingestion, that
> > > situation could continue for some time. At some point, something else may
> > > force our hand; perhaps a critical library on the ingestion path will
> > > begin to require Java 17. We would see how it plays out.
> > >
> > > Thoughts?
> > >
> > > Gian
> > >
> > > On 2025/06/23 20:14:39 Lucas Capistrant wrote:
> > > > Thanks for your input from the Roku user's point of view, Krishna. We are
> > > > definitely in a tough spot here because Hadoop support is preventing us
> > > > from dropping Java 11 support. The domino effect is that we can’t
> > > > upgrade off of EOL dependencies such as Jetty 9.
> > > >
> > > > In the Java 11 support discussion,
> > > > https://lists.apache.org/thread/bvkztwoyy35mvyqkccp87zrfd68sqqkw, we
> > > > discuss the risk of supporting Java 11 beyond Druid 34. I think the
> > > > biggest worry is that we are going to get caught in a situation where a
> > > > patch fix for a CVE could require dropping Java 11 and Hadoop support in
> > > > a patch release because resolving the CVE requires dependency upgrades
> > > > that don’t support 11. Delaying dropping support until Druid 36 makes it
> > > > all the more likely that we run into that situation.
> > > >
> > > > If we were to drop Hadoop ingest support in October as a part of Druid
> > > > 35, would there be a clear path forward for your Druid deployments,
> > > > assuming the community provides a solid migration plan for open source
> > > > users regarding Hadoop ingestion alternatives?
> > > >
> > > > Also, if there is a path to supporting Hadoop ingestion as a contrib
> > > > extension and someone in the community wanted to carry the torch on its
> > > > development, that is definitely a possibility as well. Though, I’m not
> > > > sure that anyone has scoped out how much work that would be, or if it’s
> > > > even possible to achieve.
> > > >
> > > > Thanks,
> > > > Lucas
> > > >
> > > > On Wed, Jun 18, 2025 at 6:50 PM Krishna Thirumalasetty <kthir...@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Adding to the voices from Netflix and Target: at Roku Inc., we also rely
> > > > > heavily on Hadoop-based batch ingestion for a significant portion of our
> > > > > Druid datasources. This approach allows us to leverage our existing
> > > > > Hadoop infrastructure efficiently and cost-effectively for large-scale
> > > > > batch processing.
> > > > >
> > > > > If the community decides to move forward with the removal of Hadoop
> > > > > ingestion support, it would likely force us to remain on an older
> > > > > version of Druid for some time. This is not ideal, as it would prevent
> > > > > us from benefiting from ongoing improvements, security updates, and
> > > > > newer features in the Druid ecosystem.
> > > > >
> > > > > That said, we fully understand and support the broader goals of
> > > > > modernizing the Druid platform, reducing tech debt, and enabling the use
> > > > > of more current Java features and dependency upgrades. Given these
> > > > > competing priorities, we believe the best path forward would be:
> > > > >
> > > > >    - *Clear deprecation communication* in Druid 32, discouraging new
> > > > >    adoption while giving teams time to react.
> > > > >    - *An official target removal date*, such as in Druid 36 (early
> > > > >    2026), which provides adequate lead time for organizations like ours
> > > > >    to evaluate alternatives and begin planning migrations.
> > > > >    - *Consideration of keeping the Hadoop ingestion module as a contrib
> > > > >    extension*, or at least providing a supported migration path with
> > > > >    documentation to MM-less ingestion or other batch ingestion
> > > > >    alternatives.
> > > > >
> > > > > This approach would help companies like Roku manage the transition in a
> > > > > predictable and structured way, while also empowering the Druid
> > > > > community to move forward with more agility.
> > > > >
> > > > > Thanks for raising this important discussion.
> > > > >
> > > > > Best,
> > > > > Krishna Thirumalasetty
> > > > > Roku Inc.
> > > > >
> > > > > On Tue, Jun 17, 2025 at 3:28 PM Eyal Yurman <eyal.yur...@gmail.com> wrote:
> > > > >
> > > > > > Sharing as another data point -
> > > > > >
> > > > > > We still use YARN to run Hadoop-based batch ingestion. It is very useful
> > > > > > on-premise for resource sharing, where autoscaling isn't always an
> > > > > > option. But we plan to move to Kubernetes for ingestion sometime next
> > > > > > year.
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 17, 2025 at 12:20 PM Gian Merlino <g...@apache.org> wrote:
> > > > > >
> > > > > > > I'm on board with this. I also think we should deprecate it ASAP,
> > > > > > > starting in the next major release. It'd be nice to also build a
> > > > > > > migration guide that helps people move from Hadoop ingestion to
> > > > > > > SQL/MSQ ingestion, and from YARN to K8S pod runners.
> > > > > > >
> > > > > > > Gian
> > > > > > >
> > > > > > > On 2025/06/09 20:10:03 Clint Wylie wrote:
> > > > > > > > Following up on this, I want to propose the first release of 2026
> > > > > > > > for removal, which I think would be Druid 36, to give some lead
> > > > > > > > time for those affected to prepare.
> > > > > > > >
> > > > > > > > On Wed, Apr 9, 2025 at 8:42 AM Frank Chen <frankc...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > We don't use Hadoop ingestion, so it's OK for us to drop support
> > > > > > > > > for Hadoop.
> > > > > > > > >
> > > > > > > > > We can make an announcement to deprecate it first (from 33?),
> > > > > > > > > remove it from the official distribution (but keep the ability to
> > > > > > > > > build it as suggested above, from 34?), and remove it completely
> > > > > > > > > at a proper time.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Apr 9, 2025 at 5:02 AM Maytas Monsereenusorn <mayt...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > I'm in favor of removing it too, but we should not rush the removal
> > > > > > > > > > and should make sure we give users enough time to migrate to other
> > > > > > > > > > types of ingestion. Similar to what Lucas said, if Hadoop is holding
> > > > > > > > > > back Druid then we should remove it. Druid also supports many more
> > > > > > > > > > types of ingestion than it did back when Hadoop ingestion was added.
> > > > > > > > > > For Netflix, we will be migrating to MM-less Druid ingestion in K8s.
> > > > > > > > > > I think MM-less Druid ingestion in K8s is probably the closest to
> > > > > > > > > > Hadoop ingestion as we do not have to maintain a dedicated
> > > > > > > > > > Druid-specific MM cluster (works well for companies with existing
> > > > > > > > > > large/shared compute clusters). Personally, I feel we should focus
> > > > > > > > > > our energy on things like MM-less Druid in K8s (which is still
> > > > > > > > > > marked as Experimental) rather than Hadoop.
> > > > > > > > > >
> > > > > > > > > > Best Regards,
> > > > > > > > > > Maytas
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 8, 2025 at 4:06 AM Lucas Capistrant <capistrant.lu...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Yes, I’m in favor of removing it from the core release and also in
> > > > > > > > > > > favor of officially announcing deprecation with a timeline for
> > > > > > > > > > > removal, if we have not yet. It stinks to lose the Hadoop ingest
> > > > > > > > > > > support, but if that project is going to hold back Druid, it seems
> > > > > > > > > > > we don’t have much choice.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Apr 8, 2025 at 4:27 AM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I like the plan of having a Hadoop profile and not shipping it as
> > > > > > > > > > > > part of the Apache release; we can then eventually remove it in a
> > > > > > > > > > > > release or two. Does that work for you folks, Maytas and Lucas?
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Apr 7, 2025 at 3:59 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hey,
> > > > > > > > > > > >>
> > > > > > > > > > > >> I also bumped into this while running dependency checks for Druid 33:
> > > > > > > > > > > >> * I encountered a CVE [1] in hadoop-runtime-3.3.6, which is a shaded jar
> > > > > > > > > > > >> * we have a PR to upgrade to 3.4.0, so I also checked 3.4.1, but both are
> > > > > > > > > > > >>   affected as well, since they ship with Jetty 9.4.53.v20231009 [2]
> > > > > > > > > > > >>
> > > > > > > > > > > >> ...so right now there is no normal way to solve this; the fact that it's
> > > > > > > > > > > >> a shaded jar further complicates things.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Note: Hadoop trunk uses Jetty 9.4.57 [3], which is good, so some future
> > > > > > > > > > > >> version might not be affected.
> > > > > > > > > > > >> I wanted to be thorough and dug into a few things to see how soon an
> > > > > > > > > > > >> updated version may come out:
> > > > > > > > > > > >> * there are 300+ tickets targeted for 3.5.0, so that doesn't look
> > > > > > > > > > > >>   promising
> > > > > > > > > > > >> * but even for 3.4.2 there is a huge jira [4] with 159 subtasks, out of
> > > > > > > > > > > >>   which 123 are unassigned...
> > > > > > > > > > > >>   if that's really needed for 3.4.2 then I doubt they'll be rolling out
> > > > > > > > > > > >>   a release soon...
> > > > > > > > > > > >> * I was also peeking into the jdk17 jiras, which will most likely arrive
> > > > > > > > > > > >>   in 3.5.0 [5]
> > > > > > > > > > > >>
> > > > > > > > > > > >> Keeping Hadoop like this:
> > > > > > > > > > > >> * holds us back from upgrading 3rd party deps
> > > > > > > > > > > >> * forces us to add security suppressions
> > > > > > > > > > > >> * slows down newer JDK adoption, as Hadoop officially only supports 11
> > > > > > > > > > > >>
> > > > > > > > > > > >> I think most of the companies using Hadoop are utilizing binaries which
> > > > > > > > > > > >> are built from forks, and they also have the ability and bandwidth to
> > > > > > > > > > > >> fix these 3rd party libraries...
> > > > > > > > > > > >> I would also guess that they might be using a custom-built Druid as
> > > > > > > > > > > >> well, and as a result they have more control over which features they
> > > > > > > > > > > >> have or not.
> > > > > > > > > > > >>
> > > > > > > > > > > >> So I was wondering about the following:
> > > > > > > > > > > >> * add a maven profile for hadoop support (defaults to off)
> > > > > > > > > > > >> * retain compatibility: during CI runs, build with jdk11 and run all
> > > > > > > > > > > >>   hadoop tests
> > > > > > > > > > > >> * future releases (>=34) would ship w/o hadoop ingestion
> > > > > > > > > > > >> * companies using hadoop-ingestion could turn on the profile and use it
> > > > > > > > > > > >>
> > > > > > > > > > > >> What do you guys think?
> > > > > > > > > > > >>
> > > > > > > > > > > >> cheers,
> > > > > > > > > > > >> Zoltan
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> [1] https://nvd.nist.gov/vuln/detail/cve-2024-22201
> > > > > > > > > > > >> [2] https://github.com/apache/hadoop/blob/626b227094027ed08883af97a0734d2db7863864/hadoop-project/pom.xml#L40
> > > > > > > > > > > >> [3] https://github.com/apache/hadoop/blob/3d2f4d669edcf321509ceacde58a8160aef06a8c/hadoop-project/pom.xml#L40
> > > > > > > > > > > >> [4] https://issues.apache.org/jira/browse/HADOOP-19353
> > > > > > > > > > > >> [5] https://issues.apache.org/jira/browse/HADOOP-17177
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> On 1/8/25 11:56, Abhishek Agarwal wrote:
> > > > > > > > > > > >> > @Adarsh - FYI since you are the release manager for 32.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > On Wed, Jan 8, 2025 at 11:53 AM Abhishek Agarwal <abhis...@apache.org> wrote:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> >> I don't want to kick that can too far down the road either :) We don't
> > > > > > > > > > > >> >> want to give false hope that it's going to remain around forever. But yes,
> > > > > > > > > > > >> >> let's deprecate both Hadoop and Java 11 support in the upcoming 32 release.
> > > > > > > > > > > >> >> It's unfortunate that Hadoop still doesn't support Java 17. We shouldn't
> > > > > > > > > > > >> >> let it hold us back. Jetty and pac4j are dropping Java 11 support and we
> > > > > > > > > > > >> >> would want to upgrade to newer versions of these dependencies soon. There
> > > > > > > > > > > >> >> are also nice language features in Java 17 such as pattern matching,
> > > > > > > > > > > >> >> multiline strings, and a lot more that we can't use if we have to be
> > > > > > > > > > > >> >> compile compatible with Java 11. If you need the resource elasticity that
> > > > > > > > > > > >> >> Hadoop provides or want to reuse shared infrastructure in the company,
> > > > > > > > > > > >> >> MM-less ingestion is a good alternative.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> So let's deprecate it in 32. We can decide on removal later, but hopefully
> > > > > > > > > > > >> >> it doesn't take too many releases to do that.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> On Tue, Jan 7, 2025 at 4:22 PM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >>> Okay, from what I can gather a few folks still need Hadoop ingestion. So
> > > > > > > > > > > >> >>> let's kick the can down the road regarding removal of that support, but
> > > > > > > > > > > >> >>> let's agree on the deprecation plan. Since Druid 32 is around the corner,
> > > > > > > > > > > >> >>> let's at least deprecate Hadoop ingestion so that new users are not
> > > > > > > > > > > >> >>> onboarded to this way of ingestion. Deprecation also becomes a forcing
> > > > > > > > > > > >> >>> function in internal company channels for prioritizing getting off Hadoop.
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> How does this plan look?
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> On Fri, Dec 13, 2024 at 1:11 AM Maytas Monsereenusorn <mayt...@apache.org> wrote:
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>>> We at Netflix are in a similar situation to Target Corporation (see Lucas
> > > > > > > > > > > >> >>>> C's email above). We currently rely on Hadoop ingestion for all our batch
> > > > > > > > > > > >> >>>> ingestion jobs. The main reason for this is that we already have a large
> > > > > > > > > > > >> >>>> Hadoop cluster supporting our Spark workloads that we can leverage for
> > > > > > > > > > > >> >>>> Druid ingestion. I imagine that the closest alternative for us would be
> > > > > > > > > > > >> >>>> moving to K8s / MiddleManager-less ingestion jobs.
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>> On Thu, Dec 12, 2024 at 10:56 PM Lucas Capistrant <capistrant.lu...@gmail.com> wrote:
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>>> Apologies for the empty email… fat fingers.
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> Just wanted to say that we at Target Corporation (USA) still rely heavily
> > > > > > > > > > > >> >>>>> on Hadoop ingest. We’d selfishly want support forever, but if forced to
> > > > > > > > > > > >> >>>>> pivot to a new ingestion style for our larger batch ingest jobs that
> > > > > > > > > > > >> >>>>> currently leverage the cheap compute on YARN, the longer the lead time
> > > > > > > > > > > >> >>>>> between the announcement by the community and the actual release with no
> > > > > > > > > > > >> >>>>> support, the better. Making these types of changes can be a slow process
> > > > > > > > > > > >> >>>>> for the slow-to-maneuver corporate cruise ship.
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> On Thu, Dec 12, 2024 at 9:46 AM Lucas Capistrant <capistrant.lu...@gmail.com> wrote:
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>>>
> > > > > > > > > > > >> >>>>>>
> > > > > > > > > > > >> >>>>>> On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > > > >> >>>>>>
> > > > > > > > > > > >> >>>>>>> +1 for removal of Hadoop-based ingestion. It's a maintenance overhead and
> > > > > > > > > > > >> >>>>>>> stops us from moving to Java 17.
> > > > > > > > > > > >> >>>>>>> I am not aware of any gaps in SQL-based ingestion that prevent users from
> > > > > > > > > > > >> >>>>>>> moving off Hadoop. If there are any, please feel free to reach out via
> > > > > > > > > > > >> >>>>>>> Slack/GitHub.
> > > > > > > > > > > >> >>>>>>>
> > > > > > > > > > > >> >>>>>>> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <cwy...@apache.org> wrote:
> > > > > > > > > > > >> >>>>>>>
> > > > > > > > > > > >> >>>>>>>> Hey everyone,
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> It is about that time again to take a pulse on how commonly Hadoop
> > > > > > > > > > > >> >>>>>>>> based ingestion is used with Druid in order to determine if we should
> > > > > > > > > > > >> >>>>>>>> keep supporting it or not going forward.
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> In my view, Hadoop based ingestion has unofficially been on life
> > > > > > > > > > > >> >>>>>>>> support for quite some time as we do not really go out of our way to
> > > > > > > > > > > >> >>>>>>>> add new features to it, and we perform very minimal testing to ensure
> > > > > > > > > > > >> >>>>>>>> everything keeps working. The most recent change to it I am aware of
> > > > > > > > > > > >> >>>>>>>> was to bump versions and require Hadoop 3, but that was primarily
> > > > > > > > > > > >> >>>>>>>> motivated by selfish reasons of wanting to use its contained client
> > > > > > > > > > > >> >>>>>>>> library and better isolation so that we could free up our own
> > > > > > > > > > > >> >>>>>>>> dependencies to be updated. This thread is motivated by a similar
> > > > > > > > > > > >> >>>>>>>> reason I guess; see the other thread I started recently discussing
> > > > > > > > > > > >> >>>>>>>> dropping support for Java 11. Hadoop does not yet support the Java 17
> > > > > > > > > > > >> >>>>>>>> runtime, and so the outcome of this discussion is involved in those
> > > > > > > > > > > >> >>>>>>>> plans.
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> I think SQL based ingestion with the multi-stage query engine is the
> > > > > > > > > > > >> >>>>>>>> future of batch ingestion, and the Kubernetes based task runner
> > > > > > > > > > > >> >>>>>>>> provides an alternative for task auto scaling capabilities. Because of
> > > > > > > > > > > >> >>>>>>>> this, I don't personally see a lot of compelling reasons to keep
> > > > > > > > > > > >> >>>>>>>> supporting Hadoop, so I would be in favor of just dropping support for
> > > > > > > > > > > >> >>>>>>>> it completely, though I see no harm in keeping HDFS deep storage
> > > > > > > > > > > >> >>>>>>>> around. In past discussions I think we had tied Hadoop removal to
> > > > > > > > > > > >> >>>>>>>> adding something like Spark to replace it, but I wonder if this still
> > > > > > > > > > > >> >>>>>>>> needs to be the case.
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> I do know from previous dev list discussions about this topic that
> > > > > > > > > > > >> >>>>>>>> there have classically been quite a lot of large Druid clusters in the
> > > > > > > > > > > >> >>>>>>>> wild still relying on Hadoop, so I wanted to check to see if this is
> > > > > > > > > > > >> >>>>>>>> still true and, if so, whether any of these clusters have plans to
> > > > > > > > > > > >> >>>>>>>> transition to newer ways of ingesting data like SQL based ingestion.
> > > > > > > > > > > >> >>>>>>>> While from a dev/maintenance perspective it would be best to just drop
> > > > > > > > > > > >> >>>>>>>> it completely, if there is still a large user base I think we need to
> > > > > > > > > > > >> >>>>>>>> be open to keeping it around for a while longer. If we do need to keep
> > > > > > > > > > > >> >>>>>>>> it, maybe it would be worth it to invest some time in moving it into a
> > > > > > > > > > > >> >>>>>>>> contrib extension so that it isn't bundled by default with Druid
> > > > > > > > > > > >> >>>>>>>> releases, to discourage new adoption and more accurately reflect its
> > > > > > > > > > > >> >>>>>>>> current status in Druid.
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>
> > > > > > > > > > >
> > > > > > >
> > > > > > > > > > > >> >>>>>>>> ---------------------------------------------------------------------
> > > > > > > > > > > >> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > > > > > > > > >> >>>>>>>> For additional commands, e-mail: dev-h...@druid.apache.org
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>
> > > > > > > > > > > >> >>>>>>
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Eyal Yurman
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
> > --
> >
> > Best regards,
> > Eyal Yurman
> >
>
