Hi, I've noticed some interesting behaviour that I hope someone can explain. I have a 3-node Kafka cluster with log.retention.hours=168 (7 days) and log.segment.bytes=536870912.
I recently restarted one of the nodes, so its uptime is now 3 days less than that of the other 2. After about 7 days I noticed that the other 2 nodes cleared out an equal amount of stale logs/data, but the restarted node didn't clear out the same amount. The restarted node only cleared out a similar amount 3 days later. In general, the restarted node now seems to be 3 days behind the other 2 in terms of free space.

I also noticed that certain partition log and index files on the restarted node differ from those on the other 2. Below is an example (Node C is the restarted one).

Node A - Topic Z Partition 12
[      488 Sep 26 11:47] 00000000000000046460.index
[  1781829 Sep 24 12:09] 00000000000000046460.log
[ 10485760 Sep 28 22:23] 00000000000000046522.index
[  1536693 Sep 28 22:23] 00000000000000046522.log

Node B - Topic Z Partition 12
[      488 Sep 26 11:47] 00000000000000046460.index
[  1781829 Sep 24 12:09] 00000000000000046460.log
[ 10485760 Sep 28 22:23] 00000000000000046522.index
[  1536693 Sep 28 22:23] 00000000000000046522.log

Node C - Topic Z Partition 12
[ 10485760 Sep 28 22:23] 00000000000000046485.index
[  2277311 Sep 28 22:23] 00000000000000046485.log

I can see that Node C's base offset (the log file name prefix) falls between the base offsets of the two segments on Nodes A and B. Does that mean some partition 12 messages are on Nodes A and B but not on Node C?

I was hoping someone could help me figure out what's happening.

Thanks,
Dayo
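P.S. To make the offset comparison concrete, here is a minimal Python sketch of the reasoning. It assumes that a segment file name is the zero-padded base offset of the first message in that segment and that the topic is not compacted. The function names are hypothetical, and the file names are copied from the listing above.

```python
def base_offset(filename: str) -> int:
    """Parse the base offset from a segment file name such as 00000000000000046460.log."""
    return int(filename.split(".", 1)[0])


def offsets_missing_on(reference_files, other_files):
    """Return (first, last) offsets that the reference node still holds but that
    fall before the other node's earliest segment, or None if there are none."""
    ref_start = min(base_offset(f) for f in reference_files)
    other_start = min(base_offset(f) for f in other_files)
    if other_start <= ref_start:
        return None
    return ref_start, other_start - 1


# Segment files for Topic Z, partition 12 (Node B matches Node A).
node_a = ["00000000000000046460.log", "00000000000000046522.log"]
node_c = ["00000000000000046485.log"]

gap = offsets_missing_on(node_a, node_c)
print(gap)  # (46460, 46484)
```

If those assumptions hold, up to 25 offsets (46460 to 46484) are still held in the older segment on Nodes A and B but fall before the start of Node C's log.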