Thanks, very helpful!

Peter Bukowinski <pmb...@gmail.com> wrote on Thu, Mar 12, 2020 at 5:48 AM:
> Yes, that's correct. While a broker is down:
>
> - all topic partitions assigned to that broker will be under-replicated
> - topic partitions with an unmet minimum ISR count will be offline
> - leadership of partitions meeting the minimum ISR count will move to the
>   next in-sync replica in the replica list
> - if no in-sync replica exists for a topic partition, it will be offline
>
> Setting unclean.leader.election.enable=true will allow an out-of-sync
> replica to become a leader. If topic partition availability is more
> important to you than data integrity, you should allow unclean leader
> election.
>
>> On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>
>> Hi Peter, following up on what we talked about before, I want to
>> understand what will happen when one broker goes down. I would say it
>> will be very similar to what happens under disk failure, except that the
>> rules apply to all the partitions on that broker instead of only one
>> malfunctioning disk. Am I right? Thanks.
>>
>> 张祥 <xiangzhang1...@gmail.com> wrote on Thu, Mar 5, 2020 at 9:25 AM:
>>
>>> Thanks Peter, really appreciate it.
>>>
>>> Peter Bukowinski <pmb...@gmail.com> wrote on Wed, Mar 4, 2020 at 11:50 PM:
>>>
>>>> Yes, you should restart the broker. I don't believe there's any code
>>>> to check if a log directory previously marked as failed has returned
>>>> to healthy.
>>>>
>>>> I always restart the broker after a hardware repair. I treat broker
>>>> restarts as a normal, non-disruptive operation in my clusters. I use
>>>> a minimum of 3x replication.
>>>>
>>>> -- Peter (from phone)
>>>>
>>>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>>>>
>>>>> Another question: as I remember, the broker needs to be restarted
>>>>> after replacing the disk in order to recover. Is that correct? If
>>>>> so, I take it that Kafka cannot know by itself that the disk has
>>>>> been replaced, so a manual restart is necessary.
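The failover rules Peter lists above can be sketched as a small decision function. This is only a toy illustration of the described behavior, not Kafka's actual controller code; the function name `partition_state` and its signature are hypothetical:

```python
def partition_state(replicas, isr, min_isr, unclean_election=False):
    """Toy model of partition leadership after a broker failure.

    replicas: broker ids in assignment (preference) order
    isr:      set of broker ids currently in sync
    min_isr:  the topic's min.insync.replicas setting
    Returns (leader, online) where leader is a broker id or None.
    """
    # Leadership moves to the next in-sync replica in the replica list.
    in_sync = [b for b in replicas if b in isr]
    if in_sync:
        leader = in_sync[0]
        # With fewer in-sync replicas than min.insync.replicas, the
        # partition is treated as offline (unmet minimum ISR count).
        online = len(in_sync) >= min_isr
        return leader, online
    # No in-sync replica exists: offline, unless unclean leader election
    # is enabled, in which case an out-of-sync replica may take over
    # (trading data integrity for availability).
    if unclean_election and replicas:
        return replicas[0], True
    return None, False

# Broker 1 is down; replicas [1, 2, 3], ISR shrinks to {2, 3}, min ISR 2.
leader, online = partition_state([1, 2, 3], {2, 3}, min_isr=2)
# → leadership moves to broker 2 and the partition stays online
```

The same function also shows the unclean-election trade-off: with an empty ISR it returns no leader unless `unclean_election=True`.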
>>>>>
>>>>> 张祥 <xiangzhang1...@gmail.com> wrote on Wed, Mar 4, 2020 at 2:48 PM:
>>>>>
>>>>>> Thanks Peter, it makes a lot of sense.
>>>>>>
>>>>>> Peter Bukowinski <pmb...@gmail.com> wrote on Tue, Mar 3, 2020 at 11:56 AM:
>>>>>>
>>>>>>> Whether your brokers have a single data directory or multiple data
>>>>>>> directories on separate disks, when a disk fails, the topic
>>>>>>> partitions located on that disk become unavailable. What happens
>>>>>>> next depends on how your cluster and topics are configured.
>>>>>>>
>>>>>>> If the topics on the affected broker have replicas and the minimum
>>>>>>> ISR (in-sync replicas) count is met, then all topic partitions
>>>>>>> will remain online and leaders will move to another broker.
>>>>>>> Producers and consumers will continue to operate as usual.
>>>>>>>
>>>>>>> If the topics don't have replicas or the minimum ISR count is not
>>>>>>> met, then the topic partitions on the failed disk will be offline.
>>>>>>> Producers can still send data to the affected topics; it will just
>>>>>>> go to the online partitions. Consumers can still consume data from
>>>>>>> the online partitions.
>>>>>>>
>>>>>>> -- Peter
>>>>>>>
>>>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi community,
>>>>>>>>
>>>>>>>> I ran into a disk failure when using Kafka, and fortunately it
>>>>>>>> did not crash the entire cluster. So I am wondering how Kafka
>>>>>>>> handles multiple disks and how it manages to keep working in case
>>>>>>>> of a single disk failure. The more detailed, the better. Thanks!
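For reference, the multiple-data-directory setup discussed in this thread is configured through the broker's `log.dirs` property. The mount points below are placeholders, not paths from the thread:

```properties
# server.properties (example only; disk mount points are hypothetical)
# Kafka spreads partitions across these directories. If one disk fails,
# only the partitions stored on it are affected; the broker keeps
# serving partitions on the remaining directories.
log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs,/data/disk3/kafka-logs
```

After replacing a failed disk, restarting the broker (as Peter advises above) lets it reopen the repaired log directory.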