Please see the following link from Confluent. Also, if you register with Confluent Technical Talks, they are running quite a few nice, simplified webinars this month on the Fundamentals of Kafka.
https://www.youtube.com/watch?v=ibozaujze9k

I thought the two-part presentation was quite good (but I don't work for
Confluent :), so a disclaimer in advance). There is also an upcoming
webinar on how Kafka integrates with your application/architecture.

I hope it helps. Below the quoted thread I have also pasted a few
illustrative config and CLI sketches for the points Peter raised.

Regards,
M. MAnna

On Thu, 12 Mar 2020 at 00:51, 张祥 <xiangzhang1...@gmail.com> wrote:

> Thanks, very helpful !
>
> On Thu, Mar 12, 2020 at 5:48 AM, Peter Bukowinski <pmb...@gmail.com> wrote:
>
> > Yes, that's correct. While a broker is down:
> >
> > - all topic partitions assigned to that broker will be under-replicated
> > - topic partitions with an unmet minimum ISR count will be offline
> > - leadership of partitions meeting the minimum ISR count will move to
> >   the next in-sync replica in the replica list
> > - if no in-sync replica exists for a topic partition, it will be offline
> >
> > Setting unclean.leader.election.enable=true will allow an out-of-sync
> > replica to become a leader. If topic partition availability is more
> > important to you than data integrity, you should allow unclean leader
> > election.
> >
> > > On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >
> > > Hi Peter, following what we talked about before, I want to understand
> > > what will happen when one broker goes down. I would say it will be
> > > very similar to what happens under disk failure, except that the
> > > rules apply to all the partitions on that broker instead of only one
> > > malfunctioning disk. Am I right? Thanks.
> > >
> > > On Thu, Mar 5, 2020 at 9:25 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >
> > >> Thanks Peter, really appreciate it.
> > >>
> > >> On Wed, Mar 4, 2020 at 11:50 PM, Peter Bukowinski <pmb...@gmail.com> wrote:
> > >>
> > >>> Yes, you should restart the broker. I don't believe there's any
> > >>> code to check whether a log directory previously marked as failed
> > >>> has returned to healthy.
> > >>>
> > >>> I always restart the broker after a hardware repair. I treat broker
> > >>> restarts as a normal, non-disruptive operation in my clusters. I
> > >>> use a minimum of 3x replication.
> > >>>
> > >>> -- Peter (from phone)
> > >>>
> > >>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >>>>
> > >>>> Another question: as I recall, the broker needs to be restarted
> > >>>> after replacing the disk for it to recover. Is that correct? If
> > >>>> so, I take it that Kafka cannot know by itself that the disk has
> > >>>> been replaced, and a manual restart is necessary.
> > >>>>
> > >>>> On Wed, Mar 4, 2020 at 2:48 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >>>>
> > >>>>> Thanks Peter, it makes a lot of sense.
> > >>>>>
> > >>>>> On Tue, Mar 3, 2020 at 11:56 AM, Peter Bukowinski <pmb...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Whether your brokers have a single data directory or multiple
> > >>>>>> data directories on separate disks, when a disk fails, the
> > >>>>>> topic partitions located on that disk become unavailable. What
> > >>>>>> happens next depends on how your cluster and topics are
> > >>>>>> configured.
> > >>>>>>
> > >>>>>> If the topics on the affected broker have replicas and the
> > >>>>>> minimum ISR (in-sync replicas) count is met, then all topic
> > >>>>>> partitions will remain online and leaders will move to another
> > >>>>>> broker. Producers and consumers will continue to operate as
> > >>>>>> usual.
> > >>>>>>
> > >>>>>> If the topics don't have replicas or the minimum ISR count is
> > >>>>>> not met, then the topic partitions on the failed disk will be
> > >>>>>> offline.
> > >>>>>> Producers can still send data to the affected topics; it will
> > >>>>>> just go to the online partitions. Consumers can still consume
> > >>>>>> data from the online partitions.
> > >>>>>>
> > >>>>>> -- Peter
> > >>>>>>
> > >>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi community,
> > >>>>>>>
> > >>>>>>> I ran into a disk failure when using Kafka, and fortunately it
> > >>>>>>> did not crash the entire cluster. So I am wondering how Kafka
> > >>>>>>> handles multiple disks and how it manages to keep working in
> > >>>>>>> the case of a single disk failure. The more detailed, the
> > >>>>>>> better. Thanks!
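
As promised, here are the sketches. First, while a broker or disk is
down you can see exactly which partitions are affected from the CLI.
This is a minimal sketch, assuming a reachable broker at localhost:9092
(host and port are placeholders):

    # partitions missing a replica from the ISR
    bin/kafka-topics.sh --bootstrap-server localhost:9092 \
      --describe --under-replicated-partitions

    # partitions with no available leader (i.e. fully offline)
    bin/kafka-topics.sh --bootstrap-server localhost:9092 \
      --describe --unavailable-partitions

The first command should list every partition hosted on the down
broker; the second only those left without any eligible leader.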
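
Second, the availability-versus-integrity tradeoff Peter describes maps
to a handful of settings. A sketch with illustrative values, not
recommendations:

    # server.properties (broker-wide defaults)
    default.replication.factor=3          # Peter's "minimum of 3x replication"
    min.insync.replicas=2                 # acks=all writes need 2 in-sync replicas
    unclean.leader.election.enable=false  # false favours integrity, true availability

    # the same knobs, applied per topic at creation time
    bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
      --topic my-topic --partitions 6 --replication-factor 3 \
      --config min.insync.replicas=2 \
      --config unclean.leader.election.enable=false

(my-topic and the partition count are made up.) With acks=all, a
producer write fails with NotEnoughReplicasException once the ISR
shrinks below min.insync.replicas; that is the integrity side of the
tradeoff. Flipping unclean.leader.election.enable to true keeps the
partition writable at the risk of losing whatever the out-of-sync
replica never copied.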
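
Finally, the multiple-data-directory layout from Peter's first reply is
just a comma-separated log.dirs list, one directory per physical disk.
The paths below are made up:

    # server.properties -- JBOD layout; Kafka spreads partitions across
    # the directories, and since KIP-112 (Kafka 1.0) a failed directory
    # is taken offline individually instead of crashing the whole broker
    log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs,/data/disk3/kafka-logs

That per-directory failure handling is why 张祥's disk failure did not
take down the cluster.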