Re: Rebalancing during the long-running tasks

Damian Guy Tue, 16 Feb 2016 08:38:02 -0800

Hi,

I had the same issue and managed to work around it by simulating a
heartbeat to kafka. It works really well, i.e., we have had zero issues
since it was implemented


I have somthing like this:


void process() {
   records = consumer.poll(timeout)
   dispatcher.dispatch(records)
   while(!dispatcher.isDone(heartbeatInterval, TimeUnit.MILLISECONDS) {
     heartbeat()
   }
}

void heartbeat() {
  consumer.pause(getCurrentAssignment())
  consumer.poll(10)
  consumer.resume(getCurrentAssignment())
}

TopicPartition[] getCurrentAssignment() {
   return consumer.assignment().toArray(new TopicPartition[0])
}


On 16 February 2016 at 16:20, Ben Stopford <[email protected]> wrote:

> I think you’ll find some useful context in this KIP Jason wrote. It’s
> pretty good.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-41:+KafkaConsumer+Max+Records
> >
>
>
> > On 16 Feb 2016, at 07:15, Насыров Ренат <[email protected]> wrote:
> >
> > Hello!
> >
> > I'm trying to use kafka for long-running tasks processing. The tasks can
> be very short (less than a second) or very long (about 10 minutes). I've
> got one consumer group for the single queue, and one or more consumers.
> Sometimes consumers manage to commit their offsets before rebalancing,
> sometimes not (and fail). Accordning to this document (
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> ), in worst case (when all the consumers are on very long tasks) it goes as
> follows:
> >
> > 1) Consumers get long tasks from the queue.
> > 2) Consumers performing their long-running tasks.
> > 3) Session timeout happens.
> > 4) Group coordinator performs a rebalance; the current generation number
> is increased.
> > 5) Consumers complete their long-running tasks and commit.
> > 6) GroupCoordinator returns IllegalGeneration errors to consumers and
> does not allow to commit the offsets.
> > 7) Consumers reconnect and get the very messages from the previous
> generation, thus stucking in forever loop.
> >
> > Suggestions:
> >
> > 1) Commit first, then process. Inacceptable in my case because it leads
> to at-most-once semantics.
> > 2) Increase session timeout limit. Not desired because task duration can
> negatively affect the effectiveness of rebalance.
> >
> > Is there any proper way to complete long-running tasks?
>
>

Re: Rebalancing during the long-running tasks

Reply via email to