We have noticed that the Kafka offset auto-commit functionality seems to stop
working after it encounters a timeout. It appears in the logs like this:

2018-03-04 07:02:54,779 INFO  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking the coordinator kafka06:9092 (id: 2147483641 rack: null) dead for group consumergroup01
2018-03-04 07:02:54,780 WARN  org.apache.kafka.clients.consumer.internals.ConsumerCoordinator  - Auto-commit of offsets {topic01-24=OffsetAndMetadata{offset=153237895, metadata=''}} failed for group consumergroup01: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: The request timed out.

After this message is logged, the job commits no more offsets until it is
restarted (and if the Flink process ends abnormally, the offsets are never
committed).

This is with Flink 1.4.0, which uses kafka-clients 0.11.0.2. We are using
the default Kafka client settings for enable.auto.commit (true) and
auto.commit.interval.ms (5000). We are not using Flink checkpointing, so the
Kafka client offset commit mode is OffsetCommitMode.KAFKA_PERIODIC (not
OffsetCommitMode.ON_CHECKPOINTS).
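
For reference, here is a minimal standalone sketch of how I understand the
commit mode to be chosen from these settings. It is a paraphrase for
illustration, not the actual Flink source: the class OffsetCommitModeSketch,
the method selectMode and its parameters are hypothetical names, and the
DISABLED mode is included on the assumption that it exists alongside the two
modes named above.

```java
import java.util.Properties;

public class OffsetCommitModeSketch {

    /** Local stand-in for Flink's OffsetCommitMode; DISABLED is an assumption. */
    enum Mode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    /**
     * Hypothetical helper: picks a commit mode from the Kafka client
     * properties and the job's checkpointing settings.
     */
    static Mode selectMode(Properties kafkaProps,
                           boolean checkpointingEnabled,
                           boolean commitOnCheckpoints) {
        // enable.auto.commit defaults to true when it is not set explicitly.
        boolean autoCommit = Boolean.parseBoolean(
                kafkaProps.getProperty("enable.auto.commit", "true"));

        if (checkpointingEnabled) {
            // With checkpointing on, the Kafka client's own auto-commit is not used.
            return commitOnCheckpoints ? Mode.ON_CHECKPOINTS : Mode.DISABLED;
        }
        // Without checkpointing, committing is left to the Kafka client.
        return autoCommit ? Mode.KAFKA_PERIODIC : Mode.DISABLED;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("group.id", "consumergroup01");
        props.setProperty("auto.commit.interval.ms", "5000");

        // Our current setup: no checkpointing, default auto-commit.
        System.out.println(selectMode(props, false, true)); // KAFKA_PERIODIC
        // The setup we are asking about: checkpointing enabled.
        System.out.println(selectMode(props, true, true));  // ON_CHECKPOINTS
    }
}
```

If this reading is right, our job relies entirely on the Kafka client's
periodic auto-commit, which is the path that appears to stop after the
timeout.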

Has anyone else encountered this?

If so, does enabling checkpointing resolve the issue, given that offsets are
then committed from the Flink code via
Kafka09Fetcher.doCommitInternalOffsetsToKafka rather than by the Kafka
client's auto-commit?


