Yes, that’s correct. While a broker is down:

- all topic partitions assigned to that broker will be under-replicated
- topic partitions with an unmet minimum ISR count will be offline
- leadership of partitions meeting the minimum ISR count will move to the next in-sync replica in the replica list
- if no in-sync replica exists for a topic partition, it will be offline

Setting unclean.leader.election.enable=true will allow an out-of-sync replica to become a leader. If topic partition availability is more important to you than data integrity, you should allow unclean leader election.
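For reference, that setting can be applied per topic with the stock kafka-configs.sh tool. A sketch only; the topic name my-topic and the bootstrap address are placeholders for your own cluster:

```shell
# Allow an out-of-sync replica to become leader for this one topic.
# This trades durability (possible data loss) for availability.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config unclean.leader.election.enable=true

# Verify the override took effect.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name my-topic
```

The same property can also be set cluster-wide in the broker configuration, but a per-topic override keeps the durability trade-off scoped to the topics that actually need it.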
> On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
>
> Hi, Peter, following what we talked about before, I want to understand
> what will happen when one broker goes down. I would say it will be very
> similar to what happens under disk failure, except that the rules apply
> to all the partitions on that broker instead of only one malfunctioned
> disk. Am I right? Thanks.
>
> 张祥 <xiangzhang1...@gmail.com> wrote on Thu, Mar 5, 2020 at 9:25 AM:
>
>> Thanks Peter, really appreciate it.
>>
>> Peter Bukowinski <pmb...@gmail.com> wrote on Wed, Mar 4, 2020 at 11:50 PM:
>>
>>> Yes, you should restart the broker. I don’t believe there’s any code to
>>> check if a log directory previously marked as failed has returned to
>>> healthy.
>>>
>>> I always restart the broker after a hardware repair. I treat broker
>>> restarts as a normal, non-disruptive operation in my clusters. I use a
>>> minimum of 3x replication.
>>>
>>> -- Peter (from phone)
>>>
>>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>>>
>>>> Another question: as I recall, the broker needs to be restarted after
>>>> replacing a disk in order to recover. Is that correct? If so, I take it
>>>> that Kafka cannot detect by itself that the disk has been replaced, and
>>>> a manual restart is necessary.
>>>>
>>>> 张祥 <xiangzhang1...@gmail.com> wrote on Wed, Mar 4, 2020 at 2:48 PM:
>>>>
>>>>> Thanks Peter, it makes a lot of sense.
>>>>>
>>>>> Peter Bukowinski <pmb...@gmail.com> wrote on Tue, Mar 3, 2020 at 11:56 AM:
>>>>>
>>>>>> Whether your brokers have a single data directory or multiple data
>>>>>> directories on separate disks, when a disk fails, the topic partitions
>>>>>> located on that disk become unavailable. What happens next depends on
>>>>>> how your cluster and topics are configured.
>>>>>>
>>>>>> If the topics on the affected broker have replicas and the minimum ISR
>>>>>> (in-sync replicas) count is met, then all topic partitions will remain
>>>>>> online and leaders will move to another broker. Producers and
>>>>>> consumers will continue to operate as usual.
>>>>>>
>>>>>> If the topics don’t have replicas or the minimum ISR count is not met,
>>>>>> then the topic partitions on the failed disk will be offline.
>>>>>> Producers can still send data to the affected topics; it will just go
>>>>>> to the online partitions. Consumers can still consume data from the
>>>>>> online partitions.
>>>>>>
>>>>>> -- Peter
>>>>>>
>>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi community,
>>>>>>>
>>>>>>> I ran into a disk failure when using Kafka, and fortunately it did
>>>>>>> not crash the entire cluster. So I am wondering how Kafka handles
>>>>>>> multiple disks and how it manages to keep working in case of a
>>>>>>> single disk failure. The more detailed, the better. Thanks!
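For anyone finding this thread later: the under-replicated and offline states discussed above can be observed with the stock kafka-topics.sh tool. A sketch only; the bootstrap address is a placeholder for your own cluster:

```shell
# List partitions whose ISR has fallen below the full replica set,
# which is expected while a disk or broker is down.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions

# List partitions that currently have no active leader, i.e. those
# whose replica or minimum ISR requirements are not met.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --unavailable-partitions
```

Running these before and after a broker restart is a quick way to confirm the cluster has returned to a fully replicated state.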