Yes, that’s correct. While a broker is down:

- all topic partitions assigned to that broker will be under-replicated
- topic partitions with an unmet minimum ISR count will be offline
- leadership of partitions meeting the minimum ISR count will move to the next in-sync replica in the replica list
- if no in-sync replica exists for a topic partition, it will be offline

Setting unclean.leader.election.enable=true will allow an out-of-sync replica to become a leader. If topic partition availability is more important to you than data integrity, you should allow unclean leader election.
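For reference, that setting can be applied per topic with the stock kafka-configs.sh tool. A sketch only; the topic name my-topic and the bootstrap address are placeholders for your own cluster:

```shell
# Allow an out-of-sync replica to become leader for this one topic.
# This trades durability (possible data loss) for availability.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config unclean.leader.election.enable=true

# Verify the override took effect.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name my-topic
```

The same property can also be set cluster-wide in the broker configuration, but a per-topic override keeps the durability trade-off scoped to the topics that actually need it.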
> On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
>
> Hi, Peter, following what we talked about before, I want to understand
> what will happen when one broker goes down. I would say it will be very
> similar to what happens under disk failure, except that the rules apply
> to all the partitions on that broker instead of only one malfunctioned
> disk. Am I right? Thanks.
>
> 张祥 <xiangzhang1...@gmail.com> wrote on Thu, Mar 5, 2020 at 9:25 AM:
>
>> Thanks Peter, really appreciate it.
>>
>> Peter Bukowinski <pmb...@gmail.com> wrote on Wed, Mar 4, 2020 at 11:50 PM:
>>
>>> Yes, you should restart the broker. I don’t believe there’s any code to
>>> check if a log directory previously marked as failed has returned to
>>> healthy.
>>>
>>> I always restart the broker after a hardware repair. I treat broker
>>> restarts as a normal, non-disruptive operation in my clusters. I use a
>>> minimum of 3x replication.
>>>
>>> -- Peter (from phone)
>>>
>>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>>>
>>>> Another question: as I recall, the broker needs to be restarted after
>>>> replacing a disk in order to recover. Is that correct? If so, I take it
>>>> that Kafka cannot detect by itself that the disk has been replaced, and
>>>> a manual restart is necessary.
>>>>
>>>> 张祥 <xiangzhang1...@gmail.com> wrote on Wed, Mar 4, 2020 at 2:48 PM:
>>>>
>>>>> Thanks Peter, it makes a lot of sense.
>>>>>
>>>>> Peter Bukowinski <pmb...@gmail.com> wrote on Tue, Mar 3, 2020 at 11:56 AM:
>>>>>
>>>>>> Whether your brokers have a single data directory or multiple data
>>>>>> directories on separate disks, when a disk fails, the topic partitions
>>>>>> located on that disk become unavailable. What happens next depends on
>>>>>> how your cluster and topics are configured.
>>>>>>
>>>>>> If the topics on the affected broker have replicas and the minimum ISR
>>>>>> (in-sync replicas) count is met, then all topic partitions will remain
>>>>>> online and leaders will move to another broker. Producers and
>>>>>> consumers will continue to operate as usual.
>>>>>>
>>>>>> If the topics don’t have replicas or the minimum ISR count is not met,
>>>>>> then the topic partitions on the failed disk will be offline.
>>>>>> Producers can still send data to the affected topics; it will just go
>>>>>> to the online partitions. Consumers can still consume data from the
>>>>>> online partitions.
>>>>>>
>>>>>> -- Peter
>>>>>>
>>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi community,
>>>>>>>
>>>>>>> I ran into a disk failure when using Kafka, and fortunately it did
>>>>>>> not crash the entire cluster. So I am wondering how Kafka handles
>>>>>>> multiple disks and how it manages to keep working in case of a
>>>>>>> single disk failure. The more detailed, the better. Thanks!
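For anyone finding this thread later: the under-replicated and offline states discussed above can be observed with the stock kafka-topics.sh tool. A sketch only; the bootstrap address is a placeholder for your own cluster:

```shell
# List partitions whose ISR has fallen below the full replica set,
# which is expected while a disk or broker is down.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions

# List partitions that currently have no active leader, i.e. those
# whose replica or minimum ISR requirements are not met.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --unavailable-partitions
```

Running these before and after a broker restart is a quick way to confirm the cluster has returned to a fully replicated state.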