Hi Daniel, Your understanding is mostly correct, a few notes to add:
1. fsync is done in a backend thread asynchronously, so event setting log.flush.interval.messages = 1 will not guarantee that when the producer request ack back, the messages is definitely on the disk now. But since you are using RF = 3 and acks = -1, as long as all replicas are still in ISR the produce response guarantees that the message has been replicated on all replicas. 2. If your case has a high produce load which may easily cause follower replicas to drop out of ISR, you may want to tune some replica.lag.* configs (details can be found below): http://kafka.apache.org/documentation.html#brokerconfigs Guozhang On Tue, Jul 15, 2014 at 6:35 PM, Daniel Compton <d...@danielcompton.net> wrote: > I think I know the answer to this already but I wanted to check my > assumptions before proceeding. > > We are using Kafka as a queueing mechanism for receiving messages from > stateless producers. We are operating in a legal framework where we can > never lose a committed message, but we can reject a write if Kafka is > unavailable and it will be retried in the future. We are operating all of > our servers in one rack so we are vulnerable if a whole rack goes out. We > will have 3-4 Kafka brokers and have RF=3 > > To guarantee that we never (to the greatest extent possible) lose a > message that we have acknowledged, it seems like we need to have > request.required.acks=-1 and log.flush.interval.messages = 1, i.e. fsync on > every message and wait for all brokers in ISR to reply before returning > successfully. This would guard against the failure scenario where all > servers in our rack go down simultaneously. > > Is my understanding correct? > > Thanks, Daniel. > > -- -- Guozhang