We at Netflix are in a similar situation to Target Corporation (see Lucas C's email quoted below). We currently rely on Hadoop ingestion for all of our batch ingestion jobs, mainly because we already run a large Hadoop cluster for our Spark workloads and can leverage it for Druid ingestion. I imagine the closest alternative for us would be moving to Kubernetes-based (MiddleManager-less) ingestion jobs.

On Thu, Dec 12, 2024 at 10:56 PM Lucas Capistrant <capistrant.lu...@gmail.com> wrote:

> Apologies for the empty email… fat fingers.
>
> Just wanted to say that we at Target Corporation (USA), still rely heavily
> on Hadoop ingest. We’d selfishly want support forever, but if forced to
> pivot to a new ingestion style for our larger batch ingest jobs that
> currently leverage the cheap compute on YARN, the longer the lead time
> between announcement by the community to the actual release with no
> support, the better. Making these types of changes can be a slow process
> for the slow to maneuver corporate cruise ship.
>
> On Thu, Dec 12, 2024 at 9:46 AM Lucas Capistrant <capistrant.lu...@gmail.com>
> wrote:
>
> > On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <ka...@apache.org> wrote:
> >
> >> +1 for removal of Hadoop based ingestion. It's a maintenance overhead and
> >> stops us from moving to java 17.
> >> I am not aware of any gaps in sql based ingestion which limits users to
> >> move off from hadoop. If there are any, please feel free to reach out via
> >> slack/github.
> >>
> >> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <cwy...@apache.org> wrote:
> >>
> >> > Hey everyone,
> >> >
> >> > It is about that time again to take a pulse on how commonly Hadoop
> >> > based ingestion is used with Druid in order to determine if we should
> >> > keep supporting it or not going forward.
> >> >
> >> > In my view, Hadoop based ingestion has unofficially been on life
> >> > support for quite some time as we do not really go out of our way to
> >> > add new features to it, and we perform very minimal testing to ensure
> >> > everything keeps working. The most recent changes to it I am aware of
> >> > was to bump versions and require Hadoop 3, but that was primarily
> >> > motivated by selfish reasons of wanting to use its contained client
> >> > library and better isolation so that we could free up our own
> >> > dependencies to be updated. This thread is motivated by a similar
> >> > reason I guess, see the other thread I started recently discussing
> >> > dropping support for Java 11 where Hadoop does not yet support Java 17
> >> > runtime, and so the outcome of this discussion is involved in those
> >> > plans.
> >> >
> >> > I think SQL based ingestion with the multi-stage query engine is the
> >> > future of batch ingestion, and the Kubernetes based task runner
> >> > provides an alternative for task auto scaling capabilities. Because of
> >> > this, I don't personally see a lot of compelling reasons to keep
> >> > supporting Hadoop, so I would be in favor of just dropping support for
> >> > it completely, though I see no harm in keeping HDFS deep storage
> >> > around. In past discussions I think we had tied Hadoop removal to
> >> > adding something like Spark to replace it, but I wonder if this still
> >> > needs to be the case.
> >> >
> >> > I do know that classically there have been quite a lot of large Druid
> >> > clusters in the wild still relying on Hadoop in previous dev list
> >> > discussions about this topic, so I wanted to check to see if this is
> >> > still true and if so if any of these clusters have plans to transition
> >> > to newer ways of ingesting data like SQL based ingestion. While from a
> >> > dev/maintenance perspective it would be best to just drop it
> >> > completely, if there is still a large user base I think we need to be
> >> > open to keeping it around for a while longer. If we do need to keep
> >> > it, maybe it would be worth it to invest some time in moving it into a
> >> > contrib extension so that it isn't bundled by default with Druid
> >> > releases to discourage new adoption and more accurately reflect its
> >> > current status in Druid.