Hi.

By default, Kafka returns the ack without waiting for an fsync to disk. You
can change this behavior with the log.flush.interval.messages config (for
example, setting it to 1 forces a flush after every message).
For data durability, Kafka mainly relies on replication instead.
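
To make the flush settings concrete, here is a minimal sketch in Python that
only renders them as server.properties lines. The config keys are real Kafka
broker settings; the values and the helper name to_properties are illustrative
assumptions, not recommendations.

```python
# Illustrative values only; to_properties is a hypothetical helper.
flush_config = {
    # Flush to disk after this many messages on a log partition.
    # 1 means every message; the default is Long.MAX_VALUE, which in
    # practice leaves flushing to the OS.
    "log.flush.interval.messages": 1,
    # Upper bound in ms that a message may stay in memory before a flush.
    "log.flush.interval.ms": 1000,
}


def to_properties(config):
    """Render a dict as key=value lines in server.properties style."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


print(to_properties(flush_config))
```

Forcing a flush on every message usually costs a lot of throughput, so treat
this as a last resort rather than a default.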

> then there is potential for message loss if the node crashes before

On the crashed node, that's true. However, as long as you configure
replicas to span multiple AZs, data loss would be very rare
because a simultaneous multi-AZ power failure is unlikely to happen.
FYI, Jack Vanlightly wrote a nice article about this topic:
https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-doesnt-need-fsync-to-be-safe
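
As a rough sketch of the replication side, these are the settings usually
combined for durability, written out in Python. min.insync.replicas and acks
are real Kafka settings; the values and the helper tolerated_replica_losses
are illustrative assumptions.

```python
# Illustrative values only; tolerated_replica_losses is a hypothetical helper.
replication_factor = 3              # chosen when the topic is created
topic_config = {"min.insync.replicas": 2}
producer_config = {"acks": "all"}   # wait for all in-sync replicas


def tolerated_replica_losses(replication_factor, min_insync_replicas):
    """Replicas that can be down while acks=all writes are still accepted."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


losses = tolerated_replica_losses(
    replication_factor, topic_config["min.insync.replicas"]
)
print(
    f"acked writes are on at least {topic_config['min.insync.replicas']} replicas; "
    f"{losses} replica(s) can be lost without blocking producers"
)
```

Setting broker.rack to the AZ name on each broker lets Kafka spread a
partition's replicas across AZs when it assigns them.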

> But the upside is reduced latency as writes to the pagecache

True. That's why Kafka performs well even on HDDs.
Also, relying on the page cache is a good compromise between latency and
durability because the data still survives an application crash (e.g. a JVM
crash).

On Thu, Mar 14, 2024 at 21:37, Sreyan Chakravarty <sreya...@gmail.com> wrote:

> I am trying to understand when does Kafka signal to the producer that the
> message was successfully accepted into Kafka.
>
> Does Kafka:
>
> 1) Write to the pagecache of the node's OS and then return back an ACK ?
>  If so, then there is potential for message loss if the node crashes before
> fsync to disk. But the upside is reduced latency as writes to the pagecache
> are very fast compared to a fsync to disk.
>
> 2) Wait for an fsync to happen on each message ?
> If so, then there is increased latency but guarantees each message is
> written to disk
>
> 3) Or is this a purely configurable option between the two ?
>
> --
> Regards,
> Sreyan Chakravarty
>


-- 
========================
Okada Haruki
ocadar...@gmail.com
========================
