divijvaidya commented on PR #14242:
URL: https://github.com/apache/kafka/pull/14242#issuecomment-1778956865

   Hey @ocadaruma 
   This is an important change, so I despite low engagement on this from me, I 
do believe that this change is critical. Let's think carefully about this.
   
   I would like to discuss the changes individually.
   
   Let's start with (1)
   
   The side-effect of moving producer snapshot flush to async thread is that it 
while earlier it was guaranteed that producer snapshot is present and 
consistent when segment (and others) are flushed. If for some reason, the 
producer fsync failed, we would not have scheduled a flush for segment and 
friends. But now, since we are flushing snapshot async & quietly, it might be 
possible that we have segment and indexes on disk but we don't have a producer 
snapshot. 
   
   This is ok for server restart, because on restart, we will rebuild the 
snapshot by scanning last few segments.
   This is ok even if the server doesn't restart because we will be using 
in-memory value of producer state and we might take another snapshot associated 
with next segment later.
   
   To summarize, Kafka does not expect producer snapshot on disk to be strongly 
consistent with rest of files such as log segment and transaction index. 
@jolshan (as an expert on trx index and producer snapshot) do you agree with 
this statement?
   
   If we agree on this, then (1) is a safe change IMO.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to