Re: [DISCUSS] Hadoop ingestion support

Eyal Yurman Tue, 01 Jul 2025 20:07:15 -0700

Druid also includes the druid-hdfs-storage core extension, which bundles
hadoop-client-api.


I assume there isn't a plan to deprecate this extension?

On Tue, Jul 1, 2025 at 8:19 AM Gian Merlino <g...@apache.org> wrote:

> We are in a tough situation where our hand is being forced on dropping
> Java 11 support by the Jetty 9 EOL situation. It isn't a good idea to
> continue using Jetty 9 given it's no longer receiving security updates, and
> Jetty 12 (the only currently-supported version) requires Java 17.
>
> However, the main thing pushing us to drop Hadoop-based ingestion support
> is the fact that Hadoop doesn't support Java 17. If we can find a way to
> keep it working even with the Jetty 12 + Java 17 upgrade, then we don't
> necessarily need to drop the support immediately. To that end I suggest the
> following approach:
>
> 1) In Druid 34 (next upcoming release, code freeze in a week or so)
> announce that Java 11 support and Hadoop-based ingestion are both
> deprecated. For Hadoop-based ingestion, provide guidance on migrating to
> SQL-based ingestion (as a Map/Reduce replacement) and the k8s task runner
> (as a YARN replacement).
>
> 2) In Druid 35 (~October) bump up the minimum Java version to Java 17, and
> upgrade to Jetty 12. Try to keep Hadoop-based ingestion working by
> continuing to target Java 11 when we compile our own code, and avoiding
> usage of Java-17-requiring libraries on the ingestion code path. (I believe
> Jetty is not used on the ingestion code path.) We may be able to continue
> supporting Hadoop-based ingestion in this way. If we can, great. If we
> can't, we would need to remove support for Hadoop-based ingestion in this
> version.
>
> If we manage to get Druid 35 working with Hadoop-based ingestion, that
> situation could continue for some time. At some point, something else may
> force our hand: perhaps a critical library on the ingestion path will begin
> to require Java 17. We would see how it plays out.
>
> Thoughts?
>
> Gian
>
> On 2025/06/23 20:14:39 Lucas Capistrant wrote:
> > Thanks for your input from Roku user point of view, Krishna. We are
> > definitely in a tough spot here because of Hadoop support preventing us
> > from dropping Java 11 support. And then the domino effect being we can’t
> > upgrade off of EOL dependencies such as Jetty 9.
> >
> > In the Java 11 support discussion,
> > https://lists.apache.org/thread/bvkztwoyy35mvyqkccp87zrfd68sqqkw, we
> > discuss the risk of supporting Java 11 beyond Druid 34. I think the biggest
> > worry is that we are going to get caught in a situation where a patch fix
> > for a CVE could require dropping Java 11 and Hadoop support in a patch
> > release because resolving the CVE requires dependency upgrades that don’t
> > support 11. Delaying dropping support until Druid 36 makes it all the more
> > likely that we run into that situation.
> >
> > If we were to drop Hadoop ingest support in October as a part of Druid 35,
> > would there be a clear path forward for your Druid deployments, assuming
> > the community provides a solid migration plan for open source users
> > regarding Hadoop ingestion alternatives?
> >
> > Also, if there is a path to supporting Hadoop ingestion as a contrib
> > extension and someone in the community wanted to carry the torch on its
> > development, that is definitely a possibility as well. Though, I’m not sure
> > that anyone has scoped out how much work that would be, or if it’s even
> > possible to achieve.
> >
> > Thanks,
> > Lucas
> >
> > On Wed, Jun 18, 2025 at 6:50 PM Krishna Thirumalasetty <kthir...@gmail.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Adding to the voices from Netflix and Target: at Roku Inc., we also rely
> > > heavily on Hadoop-based batch ingestion for a significant portion of our
> > > Druid datasources. This approach allows us to leverage our existing Hadoop
> > > infrastructure efficiently and cost-effectively for large-scale batch
> > > processing.
> > >
> > > If the community decides to move forward with the removal of Hadoop
> > > ingestion support, it would likely force us to remain on an older version
> > > of Druid for some time. This is not ideal, as it would prevent us from
> > > benefiting from ongoing improvements, security updates, and newer features
> > > in the Druid ecosystem.
> > >
> > > That said, we fully understand and support the broader goals of modernizing
> > > the Druid platform, reducing tech debt, and enabling the use of more
> > > current Java features and dependency upgrades. Given these competing
> > > priorities, we believe the best path forward would be:
> > >
> > >    - *Clear deprecation communication* in Druid 32, discouraging new
> > >    adoption while giving teams time to react.
> > >    - *An official target removal date*, such as in Druid 36 (early 2026),
> > >    which provides adequate lead time for organizations like ours to
> > >    evaluate alternatives and begin planning migrations.
> > >    - *Consideration of keeping the Hadoop ingestion module as a contrib
> > >    extension*, or at least providing a supported migration path with
> > >    documentation to MM-less ingestion or other batch ingestion
> > >    alternatives.
> > >
> > > This approach would help companies like Roku manage the transition in a
> > > predictable and structured way, while also empowering the Druid community
> > > to move forward with more agility.
> > >
> > > Thanks for raising this important discussion.
> > >
> > > Best,
> > > Krishna Thirumalasetty
> > > Roku Inc.
> > >
> > > On Tue, Jun 17, 2025 at 3:28 PM Eyal Yurman <eyal.yur...@gmail.com> wrote:
> > >
> > > > Sharing as another data point -
> > > >
> > > > We still use YARN to run Hadoop-based batch ingestion. Very useful
> > > > on-premise for resource sharing, where autoscaling isn't always an option.
> > > > But we plan to move to Kubernetes for ingestion sometime next year.
> > > >
> > > >
> > > > On Tue, Jun 17, 2025 at 12:20 PM Gian Merlino <g...@apache.org> wrote:
> > > >
> > > > > I'm on board with this. I also think we should deprecate it ASAP,
> > > > > starting in the next major release. It'd be nice to also build a
> > > > > migration guide that helps people move from Hadoop ingestion to SQL/MSQ
> > > > > ingestion, and from YARN to K8S pod runners.
> > > > >
> > > > > Gian
> > > > >
> > > > > On 2025/06/09 20:10:03 Clint Wylie wrote:
> > > > > > Following up on this, I want to propose the first release of 2026 for
> > > > > > removal, which I think would be Druid 36, to give some lead time for
> > > > > > those affected to prepare.
> > > > > >
> > > > > > On Wed, Apr 9, 2025 at 8:42 AM Frank Chen <frankc...@apache.org> wrote:
> > > > > > >
> > > > > > > We don't use Hadoop ingestion, so it's OK for us to drop Hadoop
> > > > > > > support.
> > > > > > >
> > > > > > > We can make an announcement to deprecate it first (from 33?), remove
> > > > > > > it from the official distribution (but keep the ability to build it,
> > > > > > > as suggested above, from 34?), and remove it completely at a proper
> > > > > > > time.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Apr 9, 2025 at 5:02 AM Maytas Monsereenusorn <mayt...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I'm in favor of removing it too, but we should not rush the removal,
> > > > > > > > and we should make sure we give users enough time to migrate to
> > > > > > > > other types of ingestion.
> > > > > > > > Similar to what Lucas said, if Hadoop is holding back Druid then we
> > > > > > > > should remove it. Druid also supports many more types of ingestion
> > > > > > > > than it did back when Hadoop ingestion was added.
> > > > > > > > For Netflix, we will be migrating to MM-less Druid ingestion in K8s.
> > > > > > > > I think MM-less Druid ingestion in K8s is probably the closest to
> > > > > > > > Hadoop ingestion, as we do not have to maintain a dedicated
> > > > > > > > Druid-specific MM cluster (works well for companies with existing
> > > > > > > > large/shared compute clusters). Personally, I feel we should focus
> > > > > > > > our energy on things like MM-less Druid in K8s (which is still
> > > > > > > > marked as Experimental) rather than Hadoop.
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Maytas
> > > > > > > >
> > > > > > > > On Tue, Apr 8, 2025 at 4:06 AM Lucas Capistrant <capistrant.lu...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Yes, I’m in favor of removing it from the core release and also in
> > > > > > > > > favor of officially announcing deprecation with a timeline for
> > > > > > > > > removal, if we have not yet. It stinks to lose the Hadoop ingest
> > > > > > > > > support, but if that project is going to hold back Druid, it seems
> > > > > > > > > we don’t have much choice.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Lucas
> > > > > > > > >
> > > > > > > > > On Tue, Apr 8, 2025 at 4:27 AM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I like the plan of having a hadoop profile and not shipping it as
> > > > > > > > > > part of the Apache release; then we can eventually remove it in a
> > > > > > > > > > release or two.
> > > > > > > > > > Does that work for you folks, Maytas and Lucas?
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 7, 2025 at 3:59 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey,
> > > > > > > > > >>
> > > > > > > > > > >> I also bumped into this while running dependency checks for Druid-33:
> > > > > > > > > > >> * I've encountered a CVE [1] in hadoop-runtime-3.3.6, which is a
> > > > > > > > > > >>   shaded jar
> > > > > > > > > > >> * we have a PR to upgrade to 3.4.0, so I also checked 3.4.1, but
> > > > > > > > > > >>   they are also affected as they ship with jetty 9.4.53.v20231009 [2]
> > > > > > > > > > >> So right now there is no normal way to solve this, and the fact
> > > > > > > > > > >> that it's a shaded jar further complicates things.
> > > > > > > > > >>
> > > > > > > > > > >> Note: the trunk Hadoop uses jetty 9.4.57 [3], which is good, so
> > > > > > > > > > >> there will be some future version which might not be affected.
> > > > > > > > > > >> I wanted to be thorough and dug into a few things to see how soon
> > > > > > > > > > >> an updated version may come out:
> > > > > > > > > > >> * there are 300+ tickets targeted for 3.5.0, so that doesn't look
> > > > > > > > > > >>   promising
> > > > > > > > > > >> * but even for 3.4.2 there is a huge jira [4] with 159 subtasks, out
> > > > > > > > > > >>   of which 123 are unassigned;
> > > > > > > > > > >>   if that's really needed for 3.4.2 then I doubt they'll be rolling
> > > > > > > > > > >>   out a release soon
> > > > > > > > > > >> * I was also peeking into the jdk17 jiras, which will most likely
> > > > > > > > > > >>   arrive in 3.5.0 [5]
> > > > > > > > > >>
> > > > > > > > > > >> Keeping Hadoop like this:
> > > > > > > > > > >> * holds us back from upgrading 3rd party deps
> > > > > > > > > > >> * forces us to add security suppressions
> > > > > > > > > > >> * slows down newer jdk adoption, as officially hadoop only supports 11
> > > > > > > > > >>
> > > > > > > > > > >> I think most of the companies using Hadoop are utilizing binaries
> > > > > > > > > > >> which are being built from forks, and they also have the ability and
> > > > > > > > > > >> bandwidth to fix these 3rd party libraries.
> > > > > > > > > > >> I would also guess that they might also be using a custom built
> > > > > > > > > > >> Druid, and as a result they have more control over what kind of
> > > > > > > > > > >> features they have or not.
> > > > > > > > > >>
> > > > > > > > > > >> So I was wondering about the following:
> > > > > > > > > > >> * add a maven profile for hadoop support (defaults to off)
> > > > > > > > > > >> * retain compatibility: during CI runs, build with jdk11 and run
> > > > > > > > > > >>   all hadoop tests
> > > > > > > > > > >> * future releases (>=34) would ship w/o hadoop ingestion
> > > > > > > > > > >> * companies using hadoop-ingestion could turn on the profile and
> > > > > > > > > > >>   use it
> > > > > > > > > >>
> > > > > > > > > >> What do you guys think?
> > > > > > > > > >>
> > > > > > > > > >> cheers,
> > > > > > > > > >> Zoltan
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > > >> [1] https://nvd.nist.gov/vuln/detail/cve-2024-22201
> > > > > > > > > > >> [2] https://github.com/apache/hadoop/blob/626b227094027ed08883af97a0734d2db7863864/hadoop-project/pom.xml#L40
> > > > > > > > > > >> [3] https://github.com/apache/hadoop/blob/3d2f4d669edcf321509ceacde58a8160aef06a8c/hadoop-project/pom.xml#L40
> > > > > > > > > > >> [4] https://issues.apache.org/jira/browse/HADOOP-19353
> > > > > > > > > > >> [5] https://issues.apache.org/jira/browse/HADOOP-17177
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On 1/8/25 11:56, Abhishek Agarwal wrote:
> > > > > > > > > > >> > @Adarsh - FYI since you are the release manager for 32.
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Wed, Jan 8, 2025 at 11:53 AM Abhishek Agarwal <abhis...@apache.org>
> > > > > > > > > > >> > wrote:
> > > > > > > > > >> >
> > > > > > > > > > >> >> I don't want to kick that can too far down the road either :) We
> > > > > > > > > > >> >> don't want to give a false hope that it's going to remain around
> > > > > > > > > > >> >> forever. But yes, let's deprecate both Hadoop and Java 11 support
> > > > > > > > > > >> >> in the upcoming 32 release.
> > > > > > > > > > >> >> It's unfortunate that Hadoop still doesn't support Java 17. We
> > > > > > > > > > >> >> shouldn't let it hold us back. Jetty and pac4j are dropping Java 11
> > > > > > > > > > >> >> support and we would want to upgrade to newer versions of these
> > > > > > > > > > >> >> dependencies soon. There are also nice language features in Java 17
> > > > > > > > > > >> >> such as pattern matching, multiline strings, and a lot more that we
> > > > > > > > > > >> >> can't use if we have to be compile compatible with Java 11. If you
> > > > > > > > > > >> >> need the resource elasticity that Hadoop provides or want to reuse
> > > > > > > > > > >> >> shared infrastructure in the company, MM-less ingestion is a good
> > > > > > > > > > >> >> alternative.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> So let's deprecate it in 32. We can decide on removal later but
> > > > > > > > > > >> >> hopefully, it doesn't take too many releases to do that.
> > > > > > > > > >> >>
> > > > > > > > > > >> >> On Tue, Jan 7, 2025 at 4:22 PM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>> Okay, from what I can gather a few folks still need hadoop
> > > > > > > > > > >> >>> ingestion. So let's kick the can down the road regarding removal
> > > > > > > > > > >> >>> of that support, but let's agree on the deprecation plan. Since
> > > > > > > > > > >> >>> druid 32 is around the corner, let's at least deprecate hadoop
> > > > > > > > > > >> >>> ingestion so that new users are not onboarded to this way of
> > > > > > > > > > >> >>> ingestion. Deprecation also becomes a forcing function in
> > > > > > > > > > >> >>> internal company channels for prioritizing getting off hadoop.
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> How does this plan look?
> > > > > > > > > >> >>>
> > > > > > > > > > >> >>> On Fri, Dec 13, 2024 at 1:11 AM Maytas Monsereenusorn <mayt...@apache.org>
> > > > > > > > > > >> >>> wrote:
> > > > > > > > > >> >>>
> > > > > > > > > > >> >>>> We at Netflix are in a similar situation to Target Corporation
> > > > > > > > > > >> >>>> (see Lucas C's email above).
> > > > > > > > > > >> >>>> We currently rely on Hadoop ingestion for all our batch ingestion
> > > > > > > > > > >> >>>> jobs. The main reason for this is that we already have a large
> > > > > > > > > > >> >>>> Hadoop cluster supporting our Spark workloads that we can leverage
> > > > > > > > > > >> >>>> for Druid ingestion. I imagine that the closest alternative for us
> > > > > > > > > > >> >>>> would be moving to K8s / MiddleManager-less ingestion jobs.
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> On Thu, Dec 12, 2024 at 10:56 PM Lucas Capistrant <
> > > > > > > > > >> >>>> capistrant.lu...@gmail.com> wrote:
> > > > > > > > > >> >>>>
> > > > > > > > > > >> >>>>> Apologies for the empty email… fat fingers.
> > > > > > > > > > >> >>>>>
> > > > > > > > > > >> >>>>> Just wanted to say that we at Target Corporation (USA) still rely
> > > > > > > > > > >> >>>>> heavily on Hadoop ingest. We’d selfishly want support forever, but
> > > > > > > > > > >> >>>>> if forced to pivot to a new ingestion style for our larger batch
> > > > > > > > > > >> >>>>> ingest jobs that currently leverage the cheap compute on YARN, the
> > > > > > > > > > >> >>>>> longer the lead time between the community's announcement and the
> > > > > > > > > > >> >>>>> actual release with no support, the better. Making these types of
> > > > > > > > > > >> >>>>> changes can be a slow process for the slow-to-maneuver corporate
> > > > > > > > > > >> >>>>> cruise ship.
> > > > > > > > > >> >>>>>
> > > > > > > > > >> >>>>> On Thu, Dec 12, 2024 at 9:46 AM Lucas Capistrant <
> > > > > > > > > >> >>>>> capistrant.lu...@gmail.com>
> > > > > > > > > >> >>>>> wrote:
> > > > > > > > > >> >>>>>
> > > > > > > > > >> >>>>>>
> > > > > > > > > >> >>>>>>
> > > > > > > > > > >> >>>>>> On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <ka...@apache.org> wrote:
> > > > > > > > > >> >>>>>>
> > > > > > > > > > >> >>>>>>> +1 for removal of Hadoop based ingestion. It's a maintenance
> > > > > > > > > > >> >>>>>>> overhead and stops us from moving to java 17.
> > > > > > > > > > >> >>>>>>> I am not aware of any gaps in sql based ingestion that keep
> > > > > > > > > > >> >>>>>>> users from moving off hadoop. If there are any, please feel free
> > > > > > > > > > >> >>>>>>> to reach out via slack/github.
> > > > > > > > > >> >>>>>>>
> > > > > > > > > > >> >>>>>>> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <cwy...@apache.org> wrote:
> > > > > > > > > >> >>>>>>>
> > > > > > > > > >> >>>>>>>> Hey everyone,
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>> It is about that time again to take a pulse on how commonly
> > > > > > > > > > >> >>>>>>>> Hadoop based ingestion is used with Druid in order to determine
> > > > > > > > > > >> >>>>>>>> if we should keep supporting it or not going forward.
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>> In my view, Hadoop based ingestion has unofficially been on life
> > > > > > > > > > >> >>>>>>>> support for quite some time as we do not really go out of our way
> > > > > > > > > > >> >>>>>>>> to add new features to it, and we perform very minimal testing to
> > > > > > > > > > >> >>>>>>>> ensure everything keeps working. The most recent change to it I
> > > > > > > > > > >> >>>>>>>> am aware of was to bump versions and require Hadoop 3, but that
> > > > > > > > > > >> >>>>>>>> was primarily motivated by selfish reasons of wanting to use its
> > > > > > > > > > >> >>>>>>>> contained client library and better isolation so that we could
> > > > > > > > > > >> >>>>>>>> free up our own dependencies to be updated. This thread is
> > > > > > > > > > >> >>>>>>>> motivated by a similar reason I guess; see the other thread I
> > > > > > > > > > >> >>>>>>>> started recently discussing dropping support for Java 11. Hadoop
> > > > > > > > > > >> >>>>>>>> does not yet support the Java 17 runtime, and so the outcome of
> > > > > > > > > > >> >>>>>>>> this discussion is involved in those plans.
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>> I think SQL based ingestion with the multi-stage query engine is
> > > > > > > > > > >> >>>>>>>> the future of batch ingestion, and the Kubernetes based task
> > > > > > > > > > >> >>>>>>>> runner provides an alternative for task auto scaling
> > > > > > > > > > >> >>>>>>>> capabilities. Because of this, I don't personally see a lot of
> > > > > > > > > > >> >>>>>>>> compelling reasons to keep supporting Hadoop, so I would be in
> > > > > > > > > > >> >>>>>>>> favor of just dropping support for it completely, though I see no
> > > > > > > > > > >> >>>>>>>> harm in keeping HDFS deep storage around. In past discussions I
> > > > > > > > > > >> >>>>>>>> think we had tied Hadoop removal to adding something like Spark
> > > > > > > > > > >> >>>>>>>> to replace it, but I wonder if this still needs to be the case.
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > > >> >>>>>>>> I do know from previous dev list discussions about this topic
> > > > > > > > > > >> >>>>>>>> that quite a lot of large Druid clusters in the wild have
> > > > > > > > > > >> >>>>>>>> classically relied on Hadoop, so I wanted to check to see if this
> > > > > > > > > > >> >>>>>>>> is still true and, if so, whether any of these clusters have
> > > > > > > > > > >> >>>>>>>> plans to transition to newer ways of ingesting data like SQL
> > > > > > > > > > >> >>>>>>>> based ingestion. While from a dev/maintenance perspective it
> > > > > > > > > > >> >>>>>>>> would be best to just drop it completely, if there is still a
> > > > > > > > > > >> >>>>>>>> large user base I think we need to be open to keeping it around
> > > > > > > > > > >> >>>>>>>> for a while longer. If we do need to keep it, maybe it would be
> > > > > > > > > > >> >>>>>>>> worth it to invest some time in moving it into a contrib
> > > > > > > > > > >> >>>>>>>> extension so that it isn't bundled by default with Druid
> > > > > > > > > > >> >>>>>>>> releases, to discourage new adoption and more accurately reflect
> > > > > > > > > > >> >>>>>>>> its current status in Druid.
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > >> >>>>
> > > > > > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > > > > > >> >>>>>>>> To unsubscribe, e-mail:
> > > > > dev-unsubscr...@druid.apache.org
> > > > > > > > > >> >>>>>>>> For additional commands, e-mail:
> > > > > dev-h...@druid.apache.org
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > >> >>>>>>>>
> > > > > > > > > >> >>>>>>>
> > > > > > > > > >> >>>>>>
> > > > > > > > > >> >>>>>
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>
> > > > > > > > > >> >>
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Eyal Yurman
> > > >
> > >
> >
>
>
>

-- 

Best regards,
Eyal Yurman
