0.10.1.0 - commitSync() doesn't contribute to "aliveness" of a consumer?

Jaikiran Pai Tue, 01 Nov 2016 07:10:26 -0700

We are using Kafka 0.10.1.0 (server) and the Java client API (the new API) for consumers. One of the issues we have been running into is that the consumer is considered "dead" by the coordinator because of a lack of activity within a specific period of time. In reality, the consumer is still alive. We see exceptions like these:

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

I understand what that exception means and what we could potentially do to address it (setting a low value for max.poll.records is one option). Before changing the max.poll.records value in our setup, I would like to understand a bit more about this, so that I know it is the right fix given the way we have implemented our consumers. Essentially, our consumer code is this:


    while (!stopped) {
        try {
            final ConsumerRecords<K, V> consumerRecords = consumer.poll(someValue);
            for (final TopicPartition topicPartition : consumerRecords.partitions()) {
                if (stopped) {
                    break;
                }
                for (final ConsumerRecord<K, V> consumerRecord : consumerRecords.records(topicPartition)) {
                    final long previousOffset = consumerRecord.offset();
                    // commit the offset and then pass on the message for
                    // processing (in a separate thread)
                    consumer.commitSync(Collections.singletonMap(topicPartition,
                            new OffsetAndMetadata(previousOffset + 1)));

                    this.executor.execute(new Runnable() {
                        @Override
                        public void run() {
                            // process the ConsumerRecord
                        }
                    });
                }
            }
        } catch (Exception e) {
            // log the error and continue
            continue;
        }
    }

As you can see, the only thing that happens in the main thread on which the consumer polls is a commitSync for each record returned in that batch. I understand commitSync is blocking, so the time taken by each commitSync invocation can add up and lengthen the time between successive poll() calls. One option is using commitAsync, but we need to evaluate whether it has other issues in our use case.
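
To make the cost concrete: with one blocking commit per record, a batch of N records means N round trips between polls. A minimal sketch of an alternative is below, assuming it is acceptable to commit once per partition for the whole polled batch before handing its records to the executor (that is a coarser guarantee than the per-record commit in our loop). The class name BatchCommitSketch and the use of plain strings and longs are stand-ins for illustration only; in real code the resulting map would correspond to the Map<TopicPartition, OffsetAndMetadata> passed to commitSync.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in sketch: String keys replace TopicPartition and Long
// values replace OffsetAndMetadata, so the example needs no Kafka dependency.
final class BatchCommitSketch {

    // For each partition, return the single offset to commit for the batch:
    // the offset of the last polled record plus one.
    static Map<String, Long> offsetsToCommit(Map<String, List<Long>> polledOffsets) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> entry : polledOffsets.entrySet()) {
            List<Long> offsets = entry.getValue();
            if (offsets.isEmpty()) {
                continue; // nothing was polled for this partition
            }
            long lastOffset = offsets.get(offsets.size() - 1);
            result.put(entry.getKey(), lastOffset + 1);
        }
        return result;
    }
}
```

With this shape, the number of blocking commits per poll is bounded by the number of assigned partitions rather than by max.poll.records.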

But what I was wondering is: why doesn't commitSync contribute to the logic that decides whether the consumer is alive? If it did, then I see no reason why this consumer would ever be considered dead and the above message logged. Does anyone see a problem with the code above?

P.S.: We use the default session timeout value in the consumer configs (i.e. we don't set any specific value).
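
For reference, the two settings named in the exception message could be overridden along the lines of the sketch below. The property keys are the ones quoted in the exception; the values, the class name ConsumerTimeoutSketch and the method name consumerOverrides are illustrative assumptions, not what we currently run. The session timeout is deliberately left unset here, matching our current setup.

```java
import java.util.Properties;

// Hypothetical helper; the values below are placeholders for illustration.
final class ConsumerTimeoutSketch {

    static Properties consumerOverrides() {
        Properties props = new Properties();
        // Fewer records per poll() means fewer blocking commits between polls.
        props.setProperty("max.poll.records", "50");
        // Allow more time between poll() calls before the member is evicted.
        props.setProperty("max.poll.interval.ms", "600000");
        return props;
    }
}
```

These overrides would be merged into the Properties object that is passed to the consumer's constructor, alongside the usual bootstrap server, group id and deserializer settings.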



-Jaikiran
