Re: [DISCUSS] Hadoop ingestion support

Karan Kumar Tue, 01 Jul 2025 20:11:53 -0700

I have not heard of any plans to deprecate `druid-hdfs-storage`, which
serves as a deep storage implementation. This thread is strictly
about Hadoop *ingestion* support.


On Wed, Jul 2, 2025 at 8:28 AM Eyal Yurman <eyal.yur...@gmail.com> wrote:

> Druid also includes the druid-hdfs-storage core extension, which bundles
> hadoop-client-api.
>
> I assume there isn't a plan to deprecate this extension?
>
> On Tue, Jul 1, 2025 at 8:19 AM Gian Merlino <g...@apache.org> wrote:
>
> > We are in a tough situation: the Jetty 9 EOL is forcing our hand on
> > dropping Java 11 support. It isn't a good idea to continue using Jetty 9
> > given that it's no longer receiving security updates, and Jetty 12 (the
> > only currently supported version) requires Java 17.
> >
> > However, the main thing pushing us to drop Hadoop-based ingestion support
> > is the fact that Hadoop doesn't support Java 17. If we can find a way to
> > keep it working even with the Jetty 12 + Java 17 upgrade, then we don't
> > necessarily need to drop the support immediately. To that end, I suggest
> > the following approach:
> >
> > 1) In Druid 34 (the next release, with code freeze in a week or so),
> > announce that Java 11 support and Hadoop-based ingestion are both
> > deprecated. For Hadoop-based ingestion, provide guidance on migrating to
> > SQL-based ingestion (as a Map/Reduce replacement) and the k8s task runner
> > (as a YARN replacement).
> >
> > 2) In Druid 35 (~October), bump the minimum Java version to Java 17 and
> > upgrade to Jetty 12. Try to keep Hadoop-based ingestion working by
> > continuing to target Java 11 when we compile our own code, and by avoiding
> > Java-17-requiring libraries on the ingestion code path. (I believe Jetty
> > is not used on the ingestion code path.) We may be able to continue
> > supporting Hadoop-based ingestion this way. If we can, great. If we
> > can't, we would need to remove support for Hadoop-based ingestion in this
> > version.
> >
> > If we manage to get Druid 35 working with Hadoop-based ingestion, that
> > situation could continue for some time. At some point, something else may
> > force our hand; perhaps a critical library on the ingestion path will
> > begin to require Java 17. We would see how it plays out.
> >
> > Thoughts?
> >
> > Gian
> >
> > On 2025/06/23 20:14:39 Lucas Capistrant wrote:
> > > Thanks for your input from a Roku user's point of view, Krishna. We are
> > > definitely in a tough spot here because Hadoop support prevents us from
> > > dropping Java 11 support, and the domino effect is that we can’t upgrade
> > > off EOL dependencies such as Jetty 9.
> > >
> > > In the Java 11 support discussion,
> > > https://lists.apache.org/thread/bvkztwoyy35mvyqkccp87zrfd68sqqkw, we
> > > discussed the risk of supporting Java 11 beyond Druid 34. I think the
> > > biggest worry is that we could get caught in a situation where a patch
> > > fix for a CVE requires dropping Java 11 and Hadoop support in a patch
> > > release, because resolving the CVE requires dependency upgrades that
> > > don’t support 11. Delaying dropping support until Druid 36 makes it all
> > > the more likely that we run into that situation.
> > >
> > > If we were to drop Hadoop ingest support in October as part of Druid
> > > 35, would there be a clear path forward for your Druid deployments,
> > > assuming the community provides a solid migration plan for open source
> > > users regarding Hadoop ingestion alternatives?
> > >
> > > Also, if there is a path to supporting Hadoop ingestion as a contrib
> > > extension and someone in the community wanted to carry the torch on its
> > > development, that is definitely a possibility as well. However, I’m not
> > > sure that anyone has scoped out how much work that would be, or whether
> > > it’s even possible to achieve.
> > >
> > > Thanks,
> > > Lucas
> > >
> > > On Wed, Jun 18, 2025 at 6:50 PM Krishna Thirumalasetty <kthir...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Adding to the voices from Netflix and Target: at Roku Inc., we also
> > > > rely heavily on Hadoop-based batch ingestion for a significant portion
> > > > of our Druid datasources. This approach allows us to leverage our
> > > > existing Hadoop infrastructure efficiently and cost-effectively for
> > > > large-scale batch processing.
> > > >
> > > > If the community decides to move forward with removing Hadoop
> > > > ingestion support, it would likely force us to remain on an older
> > > > version of Druid for some time. This is not ideal, as it would prevent
> > > > us from benefiting from ongoing improvements, security updates, and
> > > > newer features in the Druid ecosystem.
> > > >
> > > > That said, we fully understand and support the broader goals of
> > > > modernizing the Druid platform, reducing tech debt, and enabling the
> > > > use of more current Java features and dependency upgrades. Given these
> > > > competing priorities, we believe the best path forward would be:
> > > >
> > > >    - *Clear deprecation communication* in Druid 32, discouraging new
> > > >      adoption while giving teams time to react.
> > > >    - *An official target removal date*, such as Druid 36 (early 2026),
> > > >      which provides adequate lead time for organizations like ours to
> > > >      evaluate alternatives and begin planning migrations.
> > > >    - *Consideration of keeping the Hadoop ingestion module as a contrib
> > > >      extension*, or at least providing a supported migration path, with
> > > >      documentation, to MM-less ingestion or other batch ingestion
> > > >      alternatives.
> > > >
> > > > This approach would help companies like Roku manage the transition in a
> > > > predictable and structured way, while also empowering the Druid
> > > > community to move forward with more agility.
> > > >
> > > > Thanks for raising this important discussion.
> > > >
> > > > Best,
> > > > Krishna Thirumalasetty
> > > > Roku Inc.
> > > >
> > > > On Tue, Jun 17, 2025 at 3:28 PM Eyal Yurman <eyal.yur...@gmail.com> wrote:
> > > >
> > > > > Sharing as another data point:
> > > > >
> > > > > We still use YARN to run Hadoop-based batch ingestion. It's very useful
> > > > > on-premises for resource sharing, where autoscaling isn't always an
> > > > > option. But we plan to move to Kubernetes for ingestion sometime next
> > > > > year.
> > > > >
> > > > >
> > > > > On Tue, Jun 17, 2025 at 12:20 PM Gian Merlino <g...@apache.org> wrote:
> > > > >
> > > > > > I'm on board with this. I also think we should deprecate it ASAP,
> > > > > > starting in the next major release. It'd be nice to also build a
> > > > > > migration guide that helps people move from Hadoop ingestion to
> > > > > > SQL/MSQ ingestion, and from YARN to K8S pod runners.
> > > > > >
> > > > > > Gian
> > > > > >
> > > > > > On 2025/06/09 20:10:03 Clint Wylie wrote:
> > > > > > > Following up on this, I want to propose the first release of 2026
> > > > > > > for removal, which I think would be Druid 36, to give those
> > > > > > > affected some lead time to prepare.
> > > > > > >
> > > > > > > On Wed, Apr 9, 2025 at 8:42 AM Frank Chen <frankc...@apache.org> wrote:
> > > > > > > >
> > > > > > > > We don't use Hadoop ingestion, so it's OK for us to drop support
> > > > > > > > for Hadoop.
> > > > > > > >
> > > > > > > > We can announce its deprecation first (from 33?), then remove it
> > > > > > > > from the official distribution (from 34?) while keeping the
> > > > > > > > ability to build it, as suggested above, and remove it completely
> > > > > > > > at an appropriate time.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Apr 9, 2025 at 5:02 AM Maytas Monsereenusorn <mayt...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I'm in favor of removing it too, but we should not rush the
> > > > > > > > > removal, and we should make sure we give users enough time to
> > > > > > > > > migrate to other types of ingestion.
> > > > > > > > > Similar to what Lucas said, if Hadoop is holding back Druid, then
> > > > > > > > > we should remove it. Druid also supports many more types of
> > > > > > > > > ingestion now than it did back when Hadoop ingestion was added.
> > > > > > > > > At Netflix, we will be migrating to MM-less Druid ingestion in
> > > > > > > > > K8s. I think MM-less Druid ingestion in K8s is probably the
> > > > > > > > > closest thing to Hadoop ingestion, as we do not have to maintain a
> > > > > > > > > dedicated Druid-specific MM cluster (this works well for
> > > > > > > > > companies with existing large, shared compute clusters).
> > > > > > > > > Personally, I feel we should focus our energy on things like
> > > > > > > > > MM-less Druid in K8s (which is still marked as experimental)
> > > > > > > > > rather than Hadoop.
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Maytas
> > > > > > > > >
> > > > > > > > > On Tue, Apr 8, 2025 at 4:06 AM Lucas Capistrant <capistrant.lu...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Yes, I’m in favor of removing it from the core release, and
> > > > > > > > > > also in favor of officially announcing deprecation with a
> > > > > > > > > > timeline for removal, if we have not done so yet. It stinks to
> > > > > > > > > > lose the Hadoop ingest support, but if that project is going to
> > > > > > > > > > hold back Druid, it seems we don’t have much choice.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Lucas
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 8, 2025 at 4:27 AM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I like the plan of having a Hadoop profile, not shipping it
> > > > > > > > > > > as part of the Apache release, and then eventually removing
> > > > > > > > > > > it in a release or two.
> > > > > > > > > > > Does that work for you folks, Maytas, Lucas?
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Apr 7, 2025 at 3:59 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hey,
> > > > > > > > > > >>
> > > > > > > > > > > >> I also bumped into this while running dependency checks for
> > > > > > > > > > > >> Druid 33:
> > > > > > > > > > > >> * I encountered a CVE [1] in hadoop-runtime-3.3.6, which is a
> > > > > > > > > > > >>   shaded jar
> > > > > > > > > > > >> * we have a PR to upgrade to 3.4.0, so I also checked 3.4.1,
> > > > > > > > > > > >>   but both are affected, as they ship with Jetty
> > > > > > > > > > > >>   9.4.53.v20231009 [2]
> > > > > > > > > > > >>
> > > > > > > > > > > >> ...so right now there is no normal way to solve this; the fact
> > > > > > > > > > > >> that it's a shaded jar further complicates things.
> > > > > > > > > > >>
> > > > > > > > > > > >> Note: trunk Hadoop uses Jetty 9.4.57 [3], which is good, so
> > > > > > > > > > > >> there will be some future version that might not be affected.
> > > > > > > > > > > >> I wanted to be thorough and dug into a few things to see how
> > > > > > > > > > > >> soon an updated version may come out:
> > > > > > > > > > > >> * there are 300+ tickets targeted for 3.5.0, so that doesn't
> > > > > > > > > > > >>   look promising
> > > > > > > > > > > >> * but even for 3.4.2 there is a huge Jira [4] with 159
> > > > > > > > > > > >>   subtasks, of which 123 are unassigned;
> > > > > > > > > > > >>   if that's really needed for 3.4.2, then I doubt they'll be
> > > > > > > > > > > >>   rolling out a release soon
> > > > > > > > > > > >> * I was also peeking into the JDK 17 Jiras, which will most
> > > > > > > > > > > >>   likely arrive in 3.5.0 [5]
> > > > > > > > > > >>
> > > > > > > > > > > >> Keeping Hadoop like this will:
> > > > > > > > > > > >> * hold us back from upgrading 3rd-party deps
> > > > > > > > > > > >> * force us to add security suppressions
> > > > > > > > > > > >> * slow down newer JDK adoption, as officially Hadoop only
> > > > > > > > > > > >>   supports 11
> > > > > > > > > > >>
> > > > > > > > > > > >> I think most of the companies using Hadoop are utilizing
> > > > > > > > > > > >> binaries built from forks, and they also have the ability and
> > > > > > > > > > > >> bandwidth to fix these 3rd-party libraries.
> > > > > > > > > > > >> I would also guess that they might be using a custom-built
> > > > > > > > > > > >> Druid as well, and as a result they have more control over
> > > > > > > > > > > >> which features they have.
> > > > > > > > > > >>
> > > > > > > > > > > >> So I was wondering about the following:
> > > > > > > > > > > >> * add a Maven profile for Hadoop support (off by default)
> > > > > > > > > > > >> * retain compatibility: during CI runs, build with JDK 11 and
> > > > > > > > > > > >>   run all Hadoop tests
> > > > > > > > > > > >> * future releases (>=34) would ship without Hadoop ingestion
> > > > > > > > > > > >> * companies using Hadoop ingestion could turn on the profile
> > > > > > > > > > > >>   and use it
> > > > > > > > > > >>
> > > > > > > > > > >> What do you guys think?
> > > > > > > > > > >>
> > > > > > > > > > >> cheers,
> > > > > > > > > > >> Zoltan
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > > >> [1] https://nvd.nist.gov/vuln/detail/cve-2024-22201
> > > > > > > > > > > >> [2] https://github.com/apache/hadoop/blob/626b227094027ed08883af97a0734d2db7863864/hadoop-project/pom.xml#L40
> > > > > > > > > > > >> [3] https://github.com/apache/hadoop/blob/3d2f4d669edcf321509ceacde58a8160aef06a8c/hadoop-project/pom.xml#L40
> > > > > > > > > > > >> [4] https://issues.apache.org/jira/browse/HADOOP-19353
> > > > > > > > > > > >> [5] https://issues.apache.org/jira/browse/HADOOP-17177
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> On 1/8/25 11:56, Abhishek Agarwal wrote:
> > > > > > > > > > >> > @Adarsh - FYI since you are the release manager for
> > 32.
> > > > > > > > > > >> >
> > > > > > > > > > > >> > On Wed, Jan 8, 2025 at 11:53 AM Abhishek Agarwal <abhis...@apache.org>
> > > > > > > > > > > >> > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > > >> >> I don't want to kick that can too far down the road either :)
> > > > > > > > > > > >> >> We don't want to give false hope that it's going to remain
> > > > > > > > > > > >> >> around forever. But yes, let's deprecate both Hadoop and Java
> > > > > > > > > > > >> >> 11 support in the upcoming 32 release.
> > > > > > > > > > > >> >> It's unfortunate that Hadoop still doesn't support Java 17. We
> > > > > > > > > > > >> >> shouldn't let it hold us back. Jetty and pac4j are dropping
> > > > > > > > > > > >> >> Java 11 support, and we would want to upgrade to newer
> > > > > > > > > > > >> >> versions of these dependencies soon. There are also nice
> > > > > > > > > > > >> >> language features in Java 17, such as pattern matching,
> > > > > > > > > > > >> >> multiline strings, and a lot more, that we can't use if we
> > > > > > > > > > > >> >> have to be compile-compatible with Java 11. If you need the
> > > > > > > > > > > >> >> resource elasticity that Hadoop provides or want to reuse
> > > > > > > > > > > >> >> shared infrastructure in the company, MM-less ingestion is a
> > > > > > > > > > > >> >> good alternative.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> So let's deprecate it in 32. We can decide on removal later,
> > > > > > > > > > > >> >> but hopefully it doesn't take too many releases to do that.
> > > > > > > > > > >> >>
> > > > > > > > > > > >> >> On Tue, Jan 7, 2025 at 4:22 PM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > > >> >>
> > > > > > > > > > > >> >>> Okay, from what I can gather, a few folks still need Hadoop
> > > > > > > > > > > >> >>> ingestion. So let's kick the can down the road regarding
> > > > > > > > > > > >> >>> removal of that support, but let's agree on the deprecation
> > > > > > > > > > > >> >>> plan. Since Druid 32 is around the corner, let's at least
> > > > > > > > > > > >> >>> deprecate Hadoop ingestion so that no new users are onboarded
> > > > > > > > > > > >> >>> to this way of ingestion. Deprecation also becomes a forcing
> > > > > > > > > > > >> >>> function in internal company channels for prioritizing
> > > > > > > > > > > >> >>> getting off Hadoop.
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> How does this plan look?
> > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> On Fri, Dec 13, 2024 at 1:11 AM Maytas Monsereenusorn <mayt...@apache.org>
> > > > > > > > > > > >> >>> wrote:
> > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>>> We at Netflix are in a similar situation to Target
> > > > > > > > > > > >> >>>> Corporation (Lucas C.'s email above).
> > > > > > > > > > > >> >>>> We currently rely on Hadoop ingestion for all our batch
> > > > > > > > > > > >> >>>> ingestion jobs. The main reason for this is that we already
> > > > > > > > > > > >> >>>> have a large Hadoop cluster supporting our Spark workloads
> > > > > > > > > > > >> >>>> that we can leverage for Druid ingestion. I imagine that the
> > > > > > > > > > > >> >>>> closest alternative for us would be moving to K8s /
> > > > > > > > > > > >> >>>> MiddleManager-less ingestion jobs.
> > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>> On Thu, Dec 12, 2024 at 10:56 PM Lucas Capistrant <capistrant.lu...@gmail.com>
> > > > > > > > > > > >> >>>> wrote:
> > > > > > > > > > >> >>>>
> > > > > > > > > > >> >>>>> Apologies for the empty email… fat fingers.
> > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> Just wanted to say that we at Target Corporation (USA) still
> > > > > > > > > > > >> >>>>> rely heavily on Hadoop ingest. We’d selfishly want support
> > > > > > > > > > > >> >>>>> forever, but if we are forced to pivot to a new ingestion
> > > > > > > > > > > >> >>>>> style for our larger batch ingest jobs that currently
> > > > > > > > > > > >> >>>>> leverage the cheap compute on YARN, then the longer the lead
> > > > > > > > > > > >> >>>>> time between the community's announcement and the actual
> > > > > > > > > > > >> >>>>> release without support, the better. Making these types of
> > > > > > > > > > > >> >>>>> changes can be a slow process for the slow-to-maneuver
> > > > > > > > > > > >> >>>>> corporate cruise ship.
> > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> On Thu, Dec 12, 2024 at 9:46 AM Lucas Capistrant <capistrant.lu...@gmail.com>
> > > > > > > > > > > >> >>>>> wrote:
> > > > > > > > > > >> >>>>>
> > > > > > > > > > >> >>>>>>
> > > > > > > > > > >> >>>>>>
> > > > > > > > > > > >> >>>>>> On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > > >> >>>>>>
> > > > > > > > > > > >> >>>>>>> +1 for removal of Hadoop-based ingestion. It's a maintenance
> > > > > > > > > > > >> >>>>>>> overhead and stops us from moving to Java 17.
> > > > > > > > > > > >> >>>>>>> I am not aware of any gaps in SQL-based ingestion that
> > > > > > > > > > > >> >>>>>>> prevent users from moving off Hadoop. If there are any,
> > > > > > > > > > > >> >>>>>>> please feel free to reach out via Slack/GitHub.
> > > > > > > > > > >> >>>>>>>
> > > > > > > > > > > >> >>>>>>> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <cwy...@apache.org> wrote:
> > > > > > > > > > >> >>>>>>>
> > > > > > > > > > >> >>>>>>>> Hey everyone,
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> It is about that time again to take a pulse on how commonly
> > > > > > > > > > > >> >>>>>>>> Hadoop-based ingestion is used with Druid, in order to
> > > > > > > > > > > >> >>>>>>>> determine whether we should keep supporting it going forward.
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> In my view, Hadoop-based ingestion has unofficially been on
> > > > > > > > > > > >> >>>>>>>> life support for quite some time, as we do not really go out
> > > > > > > > > > > >> >>>>>>>> of our way to add new features to it, and we perform very
> > > > > > > > > > > >> >>>>>>>> minimal testing to ensure everything keeps working. The most
> > > > > > > > > > > >> >>>>>>>> recent change to it that I am aware of was to bump versions
> > > > > > > > > > > >> >>>>>>>> and require Hadoop 3, but that was primarily motivated by
> > > > > > > > > > > >> >>>>>>>> selfish reasons: wanting to use its contained client library
> > > > > > > > > > > >> >>>>>>>> and better isolation so that we could free up our own
> > > > > > > > > > > >> >>>>>>>> dependencies to be updated. This thread is motivated by a
> > > > > > > > > > > >> >>>>>>>> similar reason, I guess; see the other thread I started
> > > > > > > > > > > >> >>>>>>>> recently about dropping support for Java 11. Hadoop does not
> > > > > > > > > > > >> >>>>>>>> yet support the Java 17 runtime, so the outcome of this
> > > > > > > > > > > >> >>>>>>>> discussion affects those plans.
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> I think SQL-based ingestion with the multi-stage query engine
> > > > > > > > > > > >> >>>>>>>> is the future of batch ingestion, and the Kubernetes-based
> > > > > > > > > > > >> >>>>>>>> task runner provides an alternative for task autoscaling
> > > > > > > > > > > >> >>>>>>>> capabilities. Because of this, I don't personally see a lot
> > > > > > > > > > > >> >>>>>>>> of compelling reasons to keep supporting Hadoop, so I would be
> > > > > > > > > > > >> >>>>>>>> in favor of dropping support for it completely, though I see
> > > > > > > > > > > >> >>>>>>>> no harm in keeping HDFS deep storage around. In past
> > > > > > > > > > > >> >>>>>>>> discussions, I think we had tied Hadoop removal to adding
> > > > > > > > > > > >> >>>>>>>> something like Spark to replace it, but I wonder whether this
> > > > > > > > > > > >> >>>>>>>> still needs to be the case.
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> I do know that, in previous dev list discussions about this
> > > > > > > > > > > >> >>>>>>>> topic, quite a lot of large Druid clusters in the wild were
> > > > > > > > > > > >> >>>>>>>> still relying on Hadoop, so I wanted to check whether this is
> > > > > > > > > > > >> >>>>>>>> still true and, if so, whether any of these clusters have
> > > > > > > > > > > >> >>>>>>>> plans to transition to newer ways of ingesting data, such as
> > > > > > > > > > > >> >>>>>>>> SQL-based ingestion. From a dev/maintenance perspective it
> > > > > > > > > > > >> >>>>>>>> would be best to drop it completely, but if there is still a
> > > > > > > > > > > >> >>>>>>>> large user base, I think we need to be open to keeping it
> > > > > > > > > > > >> >>>>>>>> around for a while longer. If we do need to keep it, it might
> > > > > > > > > > > >> >>>>>>>> be worth investing some time in moving it into a contrib
> > > > > > > > > > > >> >>>>>>>> extension, so that it isn't bundled by default with Druid
> > > > > > > > > > > >> >>>>>>>> releases. That would discourage new adoption and more
> > > > > > > > > > > >> >>>>>>>> accurately reflect its current status in Druid.
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > > >> >>>>>>>> ---------------------------------------------------------------------
> > > > > > > > > > > >> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > > > > > > > > >> >>>>>>>> For additional commands, e-mail: dev-h...@druid.apache.org
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>
> > > > > > > > > > >> >>>>>>
> > > > > > > > > > >> >>>>>
> > > > > > > > > > >> >>>>
> > > > > > > > > > >> >>>
> > > > > > > > > > >> >>
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Eyal Yurman
> > > > >
> > > >
> > >
> >
> >
> >
>
> --
>
> Best regards,
> Eyal Yurman
>
