I interpreted your post as saying "when our consumer gets stuck, Kafka's
automatic partition reassignment kicks in and that's problematic for us."
Hence I suggested not using the automatic partition assignment, which per
my interpretation would address your issue.
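
For reference, by "not using the automatic partition assignment" I mean
calling assign() instead of subscribe(). A minimal sketch of that approach
(the broker address, group id, topic name, and partitions below are just
placeholders):

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignmentSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        // group.id is still used for storing committed offsets, but with
        // assign() the consumer never joins the group, so there are no
        // rebalances and no max.poll.interval.ms eviction.
        props.put("group.id", "long-running-workers");    // placeholder group
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Pick the partitions yourself instead of subscribing to the topic.
            consumer.assign(Arrays.asList(
                    new TopicPartition("foo", 0),         // placeholder topic/partitions
                    new TopicPartition("foo", 1)));

            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // process the record, however long it takes
                }
                consumer.commitSync();                    // commit once the work is done
            }
        }
    }
}

The tradeoff is that you own failover: if this process dies, nothing
automatically reassigns its partitions to another consumer.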

Chris

On Fri, May 8, 2020 at 2:19 AM Ali Nazemian <alinazem...@gmail.com> wrote:

> Thanks, Chris. So the consumer getting stuck is a side effect of the
> built-in partition assignment in Kafka, and by overriding that behaviour I
> should be able to address the long-running job issue, is that right? Can
> you please elaborate on this?
>
> Regards,
> Ali
>
> On Fri, May 8, 2020 at 1:09 PM Chris Toomey <ctoo...@gmail.com> wrote:
>
> > You really have to decide what behavior you want when one of your
> > consumers gets "stuck". If you don't like the way the group protocol
> > dynamically manages topic partition assignments, or can't figure out an
> > appropriate set of configuration settings that achieves your goal, you
> > can always elect not to use the group protocol and instead manage topic
> > partition assignment yourself. As I just replied to another post, there's
> > a nice writeup of this under "Manual Partition Assignment" in
> > https://kafka.apache.org/24/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
> > .
> >
> > Chris
> >
> >
> > On Thu, May 7, 2020 at 12:37 AM Ali Nazemian <alinazem...@gmail.com>
> > wrote:
> >
> > > To help explain my case in more detail, the error I see constantly is
> > > the consumer losing its heartbeat, after which the group apparently
> > > gets rebalanced, based on the log I can see on the Kafka side:
> > >
> > > GroupCoordinator 11]: Member
> > > consumer-3-f46e14b4-5998-4083-b7ec-bed4e3f374eb in group foo has
> > > failed, removing it from the group
> > >
> > > Thanks,
> > > Ali
> > >
> > > On Thu, May 7, 2020 at 2:38 PM Ali Nazemian <alinazem...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > With the emergence of Apache Kafka for event-driven architecture, one
> > > > thing that has become important is how to tune the Apache Kafka
> > > > consumer to manage long-running jobs. The main issue arises when we
> > > > set a relatively large value for "max.poll.interval.ms". Setting this
> > > > value will, of course, resolve the issue of repetitive rebalances, but
> > > > it creates another operational issue. I am looking for some sort of
> > > > golden strategy to deal with long-running jobs with Apache Kafka.
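> > > >
> > > > For context, the sort of consumer settings I have been experimenting
> > > > with look roughly like the following (the values are illustrative
> > > > only, not a recommendation):
> > > >
> > > > # Allow up to 30 minutes between poll() calls before the consumer is
> > > > # considered failed and the group rebalances.
> > > > max.poll.interval.ms=1800000
> > > > # Hand out one record per poll so a slow record only delays itself.
> > > > max.poll.records=1
> > > > # Liveness is otherwise governed by the background heartbeat thread.
> > > > session.timeout.ms=30000
> > > > heartbeat.interval.ms=10000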
> > > >
> > > > If the consumer hangs for whatever reason, there is no easy way of
> > > > getting past that stage. It can easily block the pipeline, and you
> > > > cannot do much about it. Therefore, it came to my mind that I am
> > > > probably missing something here. What are the expectations? Is it not
> > > > valid to use Apache Kafka for long-lived jobs? Are there any other
> > > > parameters that need to be set, and is the issue of a consumer being
> > > > stuck caused by misconfiguration?
> > > >
> > > > I can see that a lot of similar issues have been raised regarding
> > > > "the consumer is stuck", and usually the answer has been "yeah, that's
> > > > because you have a long-running job, etc.". I have seen different
> > > > suggestions:
> > > >
> > > > - Avoid long-running jobs. Read the message, submit it to another
> > > > thread and let the consumer move on (sketched after this list).
> > > > Obviously this can cause data loss, and it would be a difficult
> > > > problem to handle. It might be better to avoid using Kafka in the
> > > > first place for these types of requests.
> > > >
> > > > - Avoid using Apache Kafka for long-running requests.
> > > >
> > > > - Workaround-based approaches: if the consumer is blocked, try to use
> > > > another consumer group and set the offset to the current value for
> > > > the new consumer group, etc.
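> > > >
> > > > For reference, my rough understanding of the first suggestion (hand
> > > > the record to a worker thread, keep polling, and pause the consumer
> > > > until the work finishes) is sketched below. This is only a sketch:
> > > > the broker, group, topic and process() are placeholders, and
> > > > rebalance and error handling are omitted.
> > > >
> > > > import java.time.Duration;
> > > > import java.util.Collections;
> > > > import java.util.Properties;
> > > > import java.util.concurrent.ExecutorService;
> > > > import java.util.concurrent.Executors;
> > > > import java.util.concurrent.Future;
> > > > import org.apache.kafka.clients.consumer.ConsumerRecord;
> > > > import org.apache.kafka.clients.consumer.ConsumerRecords;
> > > > import org.apache.kafka.clients.consumer.KafkaConsumer;
> > > >
> > > > public class PauseResumeSketch {
> > > >     public static void main(String[] args) {
> > > >         Properties props = new Properties();
> > > >         props.put("bootstrap.servers", "localhost:9092"); // placeholder
> > > >         props.put("group.id", "foo-workers");             // placeholder
> > > >         props.put("enable.auto.commit", "false");
> > > >         props.put("max.poll.records", "1");
> > > >         props.put("key.deserializer",
> > > >                 "org.apache.kafka.common.serialization.StringDeserializer");
> > > >         props.put("value.deserializer",
> > > >                 "org.apache.kafka.common.serialization.StringDeserializer");
> > > >
> > > >         ExecutorService worker = Executors.newSingleThreadExecutor();
> > > >
> > > >         try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
> > > >             consumer.subscribe(Collections.singletonList("foo")); // placeholder topic
> > > >             Future<?> inFlight = null;
> > > >
> > > >             while (true) {
> > > >                 ConsumerRecords<String, String> records =
> > > >                         consumer.poll(Duration.ofSeconds(1));
> > > >
> > > >                 if (inFlight == null && !records.isEmpty()) {
> > > >                     ConsumerRecord<String, String> record = records.iterator().next();
> > > >                     // Hand the slow work to another thread and stop fetching,
> > > >                     // but keep calling poll() so the group does not evict us.
> > > >                     consumer.pause(consumer.assignment());
> > > >                     inFlight = worker.submit(() -> process(record));
> > > >                 }
> > > >
> > > >                 if (inFlight != null && inFlight.isDone()) {
> > > >                     // A failed task would still be committed here; real code
> > > >                     // needs error handling and a rebalance listener.
> > > >                     consumer.commitSync();
> > > >                     consumer.resume(consumer.paused());
> > > >                     inFlight = null;
> > > >                 }
> > > >             }
> > > >         }
> > > >     }
> > > >
> > > >     private static void process(ConsumerRecord<String, String> record) {
> > > >         // the long-running work goes here
> > > >     }
> > > > }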
> > > >
> > > > There might be other suggestions I have missed here, but that is not
> > > > the point of this email. What I am looking for is the best practice
> > > > for dealing with long-running jobs with Apache Kafka. I cannot easily
> > > > avoid using Kafka because it plays a critical part in our application
> > > > and data pipeline. On the other hand, we have had many challenges
> > > > keeping long-running jobs operationally stable. So I would appreciate
> > > > it if someone could help me understand what approach can be taken to
> > > > deal with these jobs with Apache Kafka as a message broker.
> > > >
> > > > Thanks,
> > > > Ali
> > > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
>
>
> --
> A.Nazemian
>
