Hi all,

We are running Apache Kafka 2.1 on JVM 1.8_163 on Ubuntu 18 LTS, on AWS r5a.2xl instances. The log directory resides on an LVM volume built from 6 gp2 EBS volumes.
After upgrading from 1.1 to 2.1 we started to see strange behavior on consumers: a consumer fails because the last offset it reads is lower than the previous one. Looking at the logs, we found the following entries, both on the same broker, in server.log:

[2019-04-18 07:03:41,016] INFO [Log partition=adunit_events-103, dir=/var/lib/kafka/data] Rolled new log segment at offset 3605813651 in 1 ms. (kafka.log.Log)
...
[2019-04-18 07:09:32,580] INFO [Log partition=adunit_events-103, dir=/var/lib/kafka/data] Incrementing log start offset to 3560783759 (kafka.log.Log)

As you can see, 6 minutes after a new log segment was rolled, the offset went back from 3605813651 to 3560783759. This happened during a cluster rolling restart, but not immediately after the broker restart.

JVM config:

java -Xms12g -Xmx12g -XX:+ExplicitGCInvokesConcurrent -XX:MetaspaceSize=96m \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 \
  -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 \
  -XX:NewSize=9G -XX:MaxNewSize=9G -XX:InitiatingHeapOccupancyPercent=3 \
  -XX:G1MixedGCCountTarget=1 -XX:G1HeapWastePercent=1 \
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation \
  -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100M

Server config:

log.dirs=/var/lib/kafka/data
num.io.threads=3
num.network.threads=32
socket.send.buffer.bytes=33554432
socket.receive.buffer.bytes=33554432
socket.request.max.bytes=104857600
num.partitions=2
log.retention.hours=24
log.segment.bytes=536870912
log.retention.check.interval.ms=60000
zookeeper.connection.timeout.ms=1000000
controlled.shutdown.enable=true
auto.leader.rebalance.enable=true
log.cleaner.enable=true
log.cleaner.min.cleanable.ratio=0.1
log.cleaner.threads=1
log.cleanup.policy=delete
log.cleaner.delete.retention.ms=86400000
log.cleaner.io.max.bytes.per.second=1.7976931348623157E308
log.message.format.version=2.1
inter.broker.protocol.version=2.1
num.recovery.threads.per.data.dir=1
log.flush.interval.messages=9223372036854775807
message.max.bytes=10000000
replica.fetch.max.bytes=10000000
default.replication.factor=2
delete.topic.enable=true
unclean.leader.election.enable=false
compression.type=snappy

Any ideas? Thanks in advance!

Seva Feldman
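P.S. To show what we checked: the earliest and latest offsets for the affected partition can be compared with the GetOffsetShell tool that ships with Kafka (the broker address below is a placeholder):

```shell
# Earliest available offset (log start offset) for partition 103; --time -2 = earliest
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 \
  --topic adunit_events --partitions 103 --time -2

# Latest offset (log end offset) for the same partition; --time -1 = latest
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 \
  --topic adunit_events --partitions 103 --time -1
```

This requires a live broker, so run it against the cluster where the issue appears.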