Tomislav Rajakovic created KAFKA-8764: -----------------------------------------
Summary: LogCleanerManager endless loop while compacting/cleaning segments Key: KAFKA-8764 URL: https://issues.apache.org/jira/browse/KAFKA-8764 Project: Kafka Issue Type: Bug Components: log cleaner Affects Versions: 2.2.1 Environment: docker base image: openjdk:8-jre-alpine base image, kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz Reporter: Tomislav Rajakovic Attachments: log-cleaner-bug-reproduction.zip {{LogCleanerManager stuck in endless loop while clearing segments for one partition resulting with many log outputs and heavy disk read/writes/IOPS.}} Issue appeared on follower brokers, and it happens on every (new) broker if partition assignment is changed. Original issue setup: * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB * 5 zookeepers * topic created with config: ** name = "backup_br_domain_squad" partitions = 36 replication_factor = 3 config = { "cleanup.policy" = "compact" "min.compaction.lag.ms" = "86400000" "min.cleanable.dirty.ratio" = "0.3" } Log excerpt: {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:53,895] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,896] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,896] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:53,964] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,964] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,964] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:54,032] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,032] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,032] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:54,101] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,101] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,101] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:54,173] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,173] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,173] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)}} And such log keeps repeating forever. I've been able to extract segment files from (running) leader broker, and replicated same behaviour locally. h1. h1. *Reproduction setup:* * start single broker kafka_2.12-2.2.1 * create topic ** {{./bin/kafka-topics.sh --bootstrap-server *__BOOTSTRAP_SERVER__*:9092 --create --topic backup_br_domain_squad --partitions 1 --replication-factor 1 --config cleanup.policy=compact --config min.compaction.lag.ms=86400000 --config min.cleanable.dirty.ratio=0.3}} * stop broker * copy segment files (attachment content, under segments) to backup_br_domain_squad-0 log folder * rerun broker again After rerun, fun starts with endless repeating logging outputs: {{[2019-08-07 13:05:37,610] DEBUG Checking if log segment may be cleaned: log='backup_br_domain_squad-0' segment.baseOffset=0 segment.largestTimestamp=1563203503987; now - compactionLag=1565096737609; is uncleanable=false (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,610] DEBUG Finding range of cleanable offsets for log=backup_br_domain_squad-0 topicPartition=backup_br_domain_squad-0. Last clean offset=Some(0) now=1565183137609 => firstDirtyOffset=0 firstUncleanableOffset=233 activeSegment.baseOffset=233 (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,613] INFO Cleaner 0: Beginning cleaning of log backup_br_domain_squad-0. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,613] INFO Cleaner 0: Building offset map for backup_br_domain_squad-0... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,628] INFO Cleaner 0: Building offset map for log backup_br_domain_squad-0 for 1 segments in offset range [0, 233). (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,643] INFO Cleaner 0: Offset map for log backup_br_domain_squad-0 complete. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,644] INFO Cleaner 0: Cleaning log backup_br_domain_squad-0 (cleaning prior to Wed Aug 07 13:05:22 GMT 2019, discarding tombstones prior to Thu Jan 01 00:00:00 GMT 1970)... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,645] DEBUG Loaded index file /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.index.cleaned with maxEntries = 1310720, maxIndexSize = 10485760, entries = 0, lastOffset = 0, file position = 0 (kafka.log.OffsetIndex)}} {{[2019-08-07 13:05:37,646] DEBUG Loaded index file /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.timeindex.cleaned with maxEntries = 873813, maxIndexSize = 10485760, entries = 0, lastOffset = TimestampOffset(-1,0), file position = 0 (kafka.log.TimeIndex)}} {{[2019-08-07 13:05:37,647] INFO Cleaner 0: Cleaning segment 0 in log backup_br_domain_squad-0 (largest timestamp Mon Jul 15 15:11:43 GMT 2019) into 0, retaining deletes. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,656] DEBUG Resized /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.index.cleaned to 32, position is 32 and limit is 32 (kafka.log.OffsetIndex)}} {{[2019-08-07 13:05:37,657] DEBUG Resized /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.timeindex.cleaned to 36, position is 36 and limit is 36 (kafka.log.TimeIndex)}} {{[2019-08-07 13:05:37,669] INFO Cleaner 0: Swapping in cleaned segment LogSegment(baseOffset=0, size=1044021) for segment(s) List(LogSegment(baseOffset=0, size=1044021)) in log Log(/runtime/logs/kafka_1/backup_br_domain_squad-0) (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,672] INFO [kafka-log-cleaner-thread-0]:}} {{ Log cleaner thread 0 cleaned log backup_br_domain_squad-0 (dirty section = [0, 0])}} {{ 1.0 MB of log processed in 0.1 seconds (17.2 MB/sec).}} {{ Indexed 1.0 MB in 0.0 seconds (33.2 Mb/sec, 51.7% of total time)}} {{ Buffer utilization: 0.0%}} {{ Cleaned 1.0 MB in 0.0 seconds (35.6 Mb/sec, 48.3% of total time)}} {{ Start size: 1.0 MB (231 messages)}} {{ End size: 1.0 MB (231 messages)}} {{ 0.0% size reduction (0.0% fewer messages)}} {{ (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,676] DEBUG Checking if log segment may be cleaned: log='backup_br_domain_squad-0' segment.baseOffset=0 segment.largestTimestamp=1563203503987; now - compactionLag=1565096737676; is uncleanable=false (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,677] DEBUG Finding range of cleanable offsets for log=backup_br_domain_squad-0 topicPartition=backup_br_domain_squad-0. Last clean offset=Some(232) now=1565183137676 => firstDirtyOffset=232 firstUncleanableOffset=233 activeSegment.baseOffset=233 (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,677] INFO Cleaner 0: Beginning cleaning of log backup_br_domain_squad-0. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,678] INFO Cleaner 0: Building offset map for backup_br_domain_squad-0... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,680] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(null,9082,ListenerName(INSIDE_DOCKER),PLAINTEXT) (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,680] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(null,9082,ListenerName(INSIDE_DOCKER),PLAINTEXT) (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,681] INFO Awaiting socket connections on s0.0.0.0:9092. (kafka.network.Acceptor)}} {{[2019-08-07 13:05:37,681] INFO Awaiting socket connections on s0.0.0.0:9092. (kafka.network.Acceptor)}} {{[2019-08-07 13:05:37,700] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(null,9092,ListenerName(OUTSIDE_DOCKER),PLAINTEXT) (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,700] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(null,9092,ListenerName(OUTSIDE_DOCKER),PLAINTEXT) (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,701] INFO Awaiting socket connections on s0.0.0.0:19092. (kafka.network.Acceptor)}} {{[2019-08-07 13:05:37,701] INFO Cleaner 0: Building offset map for log backup_br_domain_squad-0 for 1 segments in offset range [232, 233). (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,701] INFO Awaiting socket connections on s0.0.0.0:19092. (kafka.network.Acceptor)}} {{[2019-08-07 13:05:37,702] INFO Cleaner 0: Offset map for log backup_br_domain_squad-0 complete. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,702] INFO Cleaner 0: Cleaning log backup_br_domain_squad-0 (cleaning prior to Wed Aug 07 13:05:22 GMT 2019, discarding tombstones prior to Tue Aug 06 13:05:22 GMT 2019)... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,703] INFO [kafka-log-cleaner-thread-0]:}} {{ Log cleaner thread 0 cleaned log backup_br_domain_squad-0 (dirty section = [232, 232])}} {{ 0.0 MB of log processed in 0.0 seconds (0.0 MB/sec).}} {{ Indexed 0.0 MB in 0.0 seconds (0.3 Mb/sec, 96.2% of total time)}} {{ Buffer utilization: 0.0%}} {{ Cleaned 0.0 MB in 0.0 seconds (0.0 Mb/sec, 3.8% of total time)}} {{ Start size: 0.0 MB (0 messages)}} {{ End size: 0.0 MB (0 messages)}} {{ NaN% size reduction (NaN% fewer messages)}} {{ (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,707] DEBUG Checking if log segment may be cleaned: log='backup_br_domain_squad-0' segment.baseOffset=0 segment.largestTimestamp=1563203503987; now - compactionLag=1565096737707; is uncleanable=false (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,708] DEBUG Finding range of cleanable offsets for log=backup_br_domain_squad-0 topicPartition=backup_br_domain_squad-0. Last clean offset=Some(0) now=1565183137707 => firstDirtyOffset=0 firstUncleanableOffset=233 activeSegment.baseOffset=233 (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,708] INFO Cleaner 0: Beginning cleaning of log backup_br_domain_squad-0. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,708] INFO Cleaner 0: Building offset map for backup_br_domain_squad-0... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,719] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(null,19092,ListenerName(MAC_OS_DOCKER),PLAINTEXT) (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,719] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(null,19092,ListenerName(MAC_OS_DOCKER),PLAINTEXT) (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,721] INFO Cleaner 0: Building offset map for log backup_br_domain_squad-0 for 1 segments in offset range [0, 233). (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,722] INFO [SocketServer brokerId=1001] Started 3 acceptor threads for data-plane (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,722] INFO [SocketServer brokerId=1001] Started 3 acceptor threads for data-plane (kafka.network.SocketServer)}} {{[2019-08-07 13:05:37,730] INFO Cleaner 0: Offset map for log backup_br_domain_squad-0 complete. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,730] INFO Cleaner 0: Cleaning log backup_br_domain_squad-0 (cleaning prior to Wed Aug 07 13:05:22 GMT 2019, discarding tombstones prior to Thu Jan 01 00:00:00 GMT 1970)... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,732] DEBUG Loaded index file /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.index.cleaned with maxEntries = 1310720, maxIndexSize = 10485760, entries = 0, lastOffset = 0, file position = 0 (kafka.log.OffsetIndex)}} {{[2019-08-07 13:05:37,733] DEBUG Loaded index file /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.timeindex.cleaned with maxEntries = 873813, maxIndexSize = 10485760, entries = 0, lastOffset = TimestampOffset(-1,0), file position = 0 (kafka.log.TimeIndex)}} {{[2019-08-07 13:05:37,733] INFO Cleaner 0: Cleaning segment 0 in log backup_br_domain_squad-0 (largest timestamp Mon Jul 15 15:11:43 GMT 2019) into 0, retaining deletes. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,739] DEBUG Resized /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.index.cleaned to 32, position is 32 and limit is 32 (kafka.log.OffsetIndex)}} {{[2019-08-07 13:05:37,739] DEBUG Resized /runtime/logs/kafka_1/backup_br_domain_squad-0/00000000000000000000.timeindex.cleaned to 36, position is 36 and limit is 36 (kafka.log.TimeIndex)}} {{[2019-08-07 13:05:37,764] INFO Cleaner 0: Swapping in cleaned segment LogSegment(baseOffset=0, size=1044021) for segment(s) List(LogSegment(baseOffset=0, size=1044021)) in log Log(/runtime/logs/kafka_1/backup_br_domain_squad-0) (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,767] INFO [kafka-log-cleaner-thread-0]:}} {{ Log cleaner thread 0 cleaned log backup_br_domain_squad-0 (dirty section = [0, 0])}} {{ 1.0 MB of log processed in 0.1 seconds (17.2 MB/sec).}} {{ Indexed 1.0 MB in 0.0 seconds (45.3 Mb/sec, 37.9% of total time)}} {{ Buffer utilization: 0.0%}} {{ Cleaned 1.0 MB in 0.0 seconds (27.7 Mb/sec, 62.1% of total time)}} {{ Start size: 1.0 MB (231 messages)}} {{ End size: 1.0 MB (231 messages)}} {{ 0.0% size reduction (0.0% fewer messages)}} {{ (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,772] DEBUG Checking if log segment may be cleaned: log='backup_br_domain_squad-0' segment.baseOffset=0 segment.largestTimestamp=1563203503987; now - compactionLag=1565096737771; is uncleanable=false (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,772] DEBUG Finding range of cleanable offsets for log=backup_br_domain_squad-0 topicPartition=backup_br_domain_squad-0. Last clean offset=Some(232) now=1565183137771 => firstDirtyOffset=232 firstUncleanableOffset=233 activeSegment.baseOffset=233 (kafka.log.LogCleanerManager$)}} {{[2019-08-07 13:05:37,773] INFO Cleaner 0: Beginning cleaning of log backup_br_domain_squad-0. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,774] INFO Cleaner 0: Building offset map for backup_br_domain_squad-0... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,795] INFO [ExpirationReaper-1001-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,795] INFO [ExpirationReaper-1001-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,800] INFO [ExpirationReaper-1001-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,800] INFO [ExpirationReaper-1001-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,803] INFO [ExpirationReaper-1001-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,803] INFO [ExpirationReaper-1001-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,805] INFO [ExpirationReaper-1001-ElectPreferredLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,805] INFO [ExpirationReaper-1001-ElectPreferredLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)}} {{[2019-08-07 13:05:37,806] INFO Cleaner 0: Building offset map for log backup_br_domain_squad-0 for 1 segments in offset range [232, 233). (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,807] INFO Cleaner 0: Offset map for log backup_br_domain_squad-0 complete. (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,808] INFO Cleaner 0: Cleaning log backup_br_domain_squad-0 (cleaning prior to Wed Aug 07 13:05:22 GMT 2019, discarding tombstones prior to Tue Aug 06 13:05:22 GMT 2019)... (kafka.log.LogCleaner)}} {{[2019-08-07 13:05:37,809] INFO [kafka-log-cleaner-thread-0]:}} {{ Log cleaner thread 0 cleaned log backup_br_domain_squad-0 (dirty section = [232, 232])}} {{ 0.0 MB of log processed in 0.0 seconds (0.0 MB/sec).}} {{ Indexed 0.0 MB in 0.0 seconds (0.2 Mb/sec, 97.1% of total time)}} {{ Buffer utilization: 0.0%}} {{ Cleaned 0.0 MB in 0.0 seconds (0.0 Mb/sec, 2.9% of total time)}} {{ Start size: 0.0 MB (0 messages)}} {{ End size: 0.0 MB (0 messages)}} {{ NaN% size reduction (NaN% fewer messages)}} {{ (kafka.log.LogCleaner)}} -- This message was sent by Atlassian JIRA (v7.6.14#76016)