Hi all,

We are using Kafka 0.8.2.2 in our SIT environment and have hit a data loss case when all brokers (2 brokers) go down and are restarted. I am trying to understand the log management and recovery mechanism in Kafka, and I found a useful document: KIP-101 - Alter Replication Protocol to use Leader Epoch rather than High Watermark for Truncation ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation ).

However, I am having difficulty understanding the "Scenario 1: High Watermark Truncation followed by Immediate Leader Election" description in that article. In this scenario, leader B has updated its HW to m2, while follower A has just received m2 but has not yet updated its local HW to m2: "the follower (A) has message m2, but has not yet got confirmation from the leader (B) that m2 has been committed (the second round of replication, which lets (A) move forward its high watermark past m2, has yet to happen)".
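To make the sequence concrete, here is a minimal Python sketch of the scenario as I read it from the KIP. It is only a model under my own assumptions, not Kafka's actual code: the `Replica`, `Leader`, and `Follower` classes and the `handle_fetch` and `apply` methods are hypothetical names, and I assume the leader advances its HW only when it sees the follower's next fetch offset.

```python
# Hypothetical model of the KIP-101 scenario; not Kafka's actual code.
# Offsets start at 0; hw is the offset of the first uncommitted message.

class Replica:
    def __init__(self, log, hw):
        self.log = list(log)
        self.hw = hw

    @property
    def leo(self):
        # Log end offset: the offset the next appended message would get.
        return len(self.log)


class Leader(Replica):
    def handle_fetch(self, fetch_offset):
        # Assumption: with a single follower in the ISR, the follower's fetch
        # offset tells the leader that everything before it is replicated.
        self.hw = max(self.hw, min(fetch_offset, self.leo))
        return self.log[fetch_offset:], self.hw


class Follower(Replica):
    def apply(self, response):
        # Append the fetched messages, then adopt the HW carried in the response.
        messages, leader_hw = response
        self.log.extend(messages)
        self.hw = max(self.hw, min(leader_hw, self.leo))


b = Leader(["m1", "m2"], hw=1)  # leader B: m1 committed, m2 just appended
a = Follower(["m1"], hw=1)      # follower A: has only m1 so far

# Round 1: A fetches from offset 1 and receives m2; the response carries HW 1.
a.apply(b.handle_fetch(a.leo))
print(a.log, a.hw, b.hw)

# Round 2: A fetches from offset 2, so B advances its HW to 2 ...
response = b.handle_fetch(a.leo)
# ... but A restarts before it processes this response.
print(a.log, a.hw, b.hw)
```

In this model, A ends up holding m2 with its HW still pointing at m2 (offset 1), while B's HW has already moved past m2.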
Is that possible? Since there are only 2 brokers, I think leader B updates its HW to m2 only once follower A has fetched m2. Also, when follower A fetches m2 and leader B updates its HW to m2, follower A would get the updated HW (m2) in the fetchMessage response for m2 (in the HighwaterMarkOffset field), so it would not need any confirmation in a second round as the article mentions. I am confused about this part.

I think that if there were another follower, broker C, which fetched m2 later than follower A, then follower A would have to wait for a second round of replication to confirm HW (m2). But there is no broker C in this description. Am I missing some step, or is there a flaw in the description of this case?

Any explanations are appreciated. Thanks to all Kafka developers for bringing us such a great product.

Alex.Chen