Hi,

I've noticed some interesting behaviour that I hope someone can explain.
I have a 3-node Kafka cluster with log.retention.hours=168 (7
days) and log.segment.bytes=536870912.
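
For reference, here is a minimal sketch of what those two settings work out to in more familiar units. It is plain arithmetic in Python; the variable names are only illustrative and are not Kafka APIs.

```python
# Values copied from the broker settings quoted above.
log_retention_hours = 168
log_segment_bytes = 536870912

# Convert to days and MiB for easier reading.
retention_days = log_retention_hours / 24
segment_mib = log_segment_bytes / (1024 * 1024)

print(retention_days, segment_mib)
```

So the retention window is 7 days and the segment size limit is 512 MiB.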

I recently restarted one of the nodes, and its uptime is now 3 days behind
the other 2.

After about 7 days I noticed that the other 2 nodes cleared out an equal
amount of stale logs/data, but the restarted node didn't clear out the same
amount. The restarted node only cleared out a similar amount 3 days later.
In general, the restarted node now seems to be 3 days behind the other 2
in terms of free space.

I also noticed that certain partition log and index files on the restarted
node differ from those on the other 2.
Below is an example (Node C is the restarted one):


Node A - Topic Z Partition 12
[      488  Sep 26 11:47]  00000000000000046460.index
[  1781829  Sep 24 12:09]  00000000000000046460.log
[ 10485760  Sep 28 22:23]  00000000000000046522.index
[  1536693  Sep 28 22:23]  00000000000000046522.log

Node B - Topic Z Partition 12
[      488  Sep 26 11:47]  00000000000000046460.index
[  1781829  Sep 24 12:09]  00000000000000046460.log
[ 10485760  Sep 28 22:23]  00000000000000046522.index
[  1536693  Sep 28 22:23]  00000000000000046522.log

Node C - Topic Z Partition 12
[ 10485760  Sep 28 22:23]  00000000000000046485.index
[  2277311  Sep 28 22:23]  00000000000000046485.log

I can see that Node C's base offset (the log file name prefix) lies between
the base offsets of the two segments on Node A & B. Does that suggest that
some partition 12 messages are on Node A & B but not on Node C?
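
To make the comparison concrete, here is a minimal sketch that reads the base offsets out of the file names listed above. It assumes that the numeric prefix of each segment file is the offset of the first message in that segment; the helper name base_offset is my own and not part of any Kafka tooling.

```python
def base_offset(filename: str) -> int:
    """Return the numeric prefix of a segment file name as an integer."""
    return int(filename.split(".")[0])

# Segment file names copied from the listings for partition 12.
node_ab = ["00000000000000046460.log", "00000000000000046522.log"]
node_c = ["00000000000000046485.log"]

first_ab = min(base_offset(f) for f in node_ab)
first_c = min(base_offset(f) for f in node_c)

# Offsets in [first_ab, first_c) fall in a segment that exists on A and B
# but starts before the earliest segment present on C.
gap = range(first_ab, first_c)

print(first_ab, first_c, len(gap))
```

Under that assumption, the earliest segment on Node A & B starts 25 offsets before the earliest segment on Node C.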

I was hoping someone could help me figure out what's happening.

Thanks
Dayo
