ocadaruma commented on PR #14543: URL: https://github.com/apache/kafka/pull/14543#issuecomment-1761436502
@ctrlaltluc As @divijvaidya pointed out, flushing (i.e. calling `fsync`) while holding UnifiedLog#lock could be a serious performance issue, especially when disk latency is high (e.g. when using HDDs or when the disk is overloaded). Several patches have been proposed to address this (#13782, #14242).

> if the broker fails until the next flush

To be precise, the condition for data loss is "the broker server (not just the broker process) fails at the OS/hardware level before the OS has written the change to the device", which is considered fairly rare if the Kafka cluster is deployed properly (i.e. replicas are located in different failure domains).

Also, even if we flush the directory, data loss can still happen on server failure unless we flush the segment on every message append (which is not common practice in Kafka). So, in my understanding, relying on replication rather than fsync for data durability is Kafka's design decision (as [Jack Vanlightly](https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-doesnt-need-fsync-to-be-safe) recently summarized).

Given that, I'm not sure we should fsync inside the lock at the cost of the performance impact.
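To make the trade-off concrete, here is a minimal Java sketch of the two flushing strategies. It is not Kafka's code: `SegmentSketch`, `flushUnderLock`, `flushOutsideLock`, and the offset fields are hypothetical names for illustration, and the only real API assumed is `FileChannel.force`, which asks the OS to write the file's data to the device. Variant A stalls every appender for the duration of the fsync; variant B takes the lock only to snapshot the end offset.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a log segment; this is NOT Kafka's UnifiedLog.
class SegmentSketch implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();
    private final FileChannel channel;
    private long endOffset = 0;                              // bytes appended, guarded by lock
    private final AtomicLong flushedOffset = new AtomicLong(); // bytes known to be fsynced

    SegmentSketch(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Appends go to the page cache only; nothing is fsynced here.
    long append(byte[] record) throws IOException {
        lock.lock();
        try {
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            endOffset += record.length;
            return endOffset;
        } finally {
            lock.unlock();
        }
    }

    // Variant A: fsync while holding the lock. Every appender waits for the disk.
    void flushUnderLock() throws IOException {
        lock.lock();
        try {
            channel.force(true);
            flushedOffset.accumulateAndGet(endOffset, Math::max);
        } finally {
            lock.unlock();
        }
    }

    // Variant B: snapshot the end offset under the lock, then fsync outside it.
    void flushOutsideLock() throws IOException {
        long target;
        lock.lock();
        try {
            target = endOffset;
        } finally {
            lock.unlock();
        }
        channel.force(true); // slow I/O; appenders are not blocked meanwhile
        // Bytes up to `target` were written before force() was called.
        flushedOffset.accumulateAndGet(target, Math::max);
    }

    long flushedOffset() {
        return flushedOffset.get();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segment-sketch", ".log");
        try (SegmentSketch segment = new SegmentSketch(path)) {
            segment.append("hello".getBytes(StandardCharsets.UTF_8));
            segment.flushOutsideLock();
            System.out.println("flushed up to byte " + segment.flushedOffset());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

In either variant, bytes appended after the last flush exist only in the page cache, so an OS- or hardware-level failure can still lose them. That is the reason the comment treats replication, rather than fsync, as the durability mechanism.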