Thanks, that helps a lot. For a long time I have been used to reading the documentation on the official Kafka site; you made me realize there are also a lot of resources on Confluent.
M. Manna <manme...@gmail.com> wrote on Thu, Mar 12, 2020 at 9:06 PM:

> Please see the following link from Confluent. Also, if you register with
> Confluent Technical Talks, they are running quite a lot of nice and
> simplified webinars this month on the Fundamentals of Kafka.
>
> https://www.youtube.com/watch?v=ibozaujze9k
>
> I thought the 2-part presentation was quite good (but I don't work for
> Confluent :), so a disclaimer in advance).
>
> There is also an upcoming webinar on how Kafka is integrated in your
> application/architecture.
>
> I hope it helps.
>
> Regards,
> M. Manna
>
> On Thu, 12 Mar 2020 at 00:51, 张祥 <xiangzhang1...@gmail.com> wrote:
>
> > Thanks, very helpful!
> >
> > Peter Bukowinski <pmb...@gmail.com> wrote on Thu, Mar 12, 2020 at 5:48 AM:
> >
> > > Yes, that's correct. While a broker is down:
> > >
> > > - All topic partitions assigned to that broker will be under-replicated.
> > > - Topic partitions with an unmet minimum ISR count will be offline.
> > > - Leadership of partitions meeting the minimum ISR count will move to
> > >   the next in-sync replica in the replica list.
> > > - If no in-sync replica exists for a topic partition, it will be offline.
> > >
> > > Setting unclean.leader.election.enable=true will allow an out-of-sync
> > > replica to become a leader. If topic partition availability is more
> > > important to you than data integrity, you should allow unclean leader
> > > election.
> > >
> > > > On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > > >
> > > > Hi Peter, following up on what we talked about before, I want to
> > > > understand what will happen when one broker goes down. I would say it
> > > > will be very similar to what happens under a disk failure, except that
> > > > the rules apply to all the partitions on that broker instead of only
> > > > the one malfunctioning disk. Am I right? Thanks.
> > > >
> > > > 张祥 <xiangzhang1...@gmail.com> wrote on Thu, Mar 5, 2020 at 9:25 AM:
> > > >
> > > >> Thanks Peter, really appreciate it.
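Peter's leader-election rules above can be sketched as a small simulation. This is a toy model with a hypothetical helper name, not Kafka's actual controller code: the leader is the first replica in the assigned replica list that is still in the ISR, and an out-of-sync replica is eligible only when unclean leader election is enabled.

```python
def elect_leader(replicas, isr, unclean_enabled=False):
    """Pick a leader for one partition, per the rules in the thread above.

    replicas: assigned replica list (broker ids, in preference order)
    isr: set of broker ids currently in sync
    """
    for broker in replicas:
        if broker in isr:
            return broker        # next in-sync replica in replica-list order
    if unclean_enabled and replicas:
        return replicas[0]       # out-of-sync replica may lead (possible data loss)
    return None                  # no eligible leader: partition goes offline

# Replica list [1, 2, 3]; broker 1 is down, so the ISR has shrunk to {2, 3}.
print(elect_leader([1, 2, 3], {2, 3}))       # 2 -- leadership moves to broker 2
print(elect_leader([1, 2, 3], set()))        # None -- partition offline
print(elect_leader([1, 2, 3], set(), True))  # 1 -- unclean election picks an out-of-sync replica
```

This mirrors the availability-versus-integrity trade-off Peter mentions: the third call keeps the partition online at the cost of any records the out-of-sync replica never received.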
> > > >>
> > > >> Peter Bukowinski <pmb...@gmail.com> wrote on Wed, Mar 4, 2020 at 11:50 PM:
> > > >>
> > > >>> Yes, you should restart the broker. I don't believe there's any
> > > >>> code to check whether a log directory previously marked as failed
> > > >>> has returned to healthy.
> > > >>>
> > > >>> I always restart the broker after a hardware repair. I treat broker
> > > >>> restarts as a normal, non-disruptive operation in my clusters. I
> > > >>> use a minimum of 3x replication.
> > > >>>
> > > >>> -- Peter (from phone)
> > > >>>
> > > >>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > > >>>>
> > > >>>> Another question: as I recall, the broker needs to be restarted
> > > >>>> after replacing the disk to recover. Is that correct? If so, I
> > > >>>> take it that Kafka cannot detect by itself that the disk has been
> > > >>>> replaced, so a manual restart is necessary.
> > > >>>>
> > > >>>> 张祥 <xiangzhang1...@gmail.com> wrote on Wed, Mar 4, 2020 at 2:48 PM:
> > > >>>>
> > > >>>>> Thanks Peter, it makes a lot of sense.
> > > >>>>>
> > > >>>>> Peter Bukowinski <pmb...@gmail.com> wrote on Tue, Mar 3, 2020 at 11:56 AM:
> > > >>>>>
> > > >>>>>> Whether your brokers have a single data directory or multiple
> > > >>>>>> data directories on separate disks, when a disk fails, the
> > > >>>>>> topic partitions located on that disk become unavailable. What
> > > >>>>>> happens next depends on how your cluster and topics are
> > > >>>>>> configured.
> > > >>>>>>
> > > >>>>>> If the topics on the affected broker have replicas and the
> > > >>>>>> minimum ISR (in-sync replicas) count is met, then all topic
> > > >>>>>> partitions will remain online and leaders will move to another
> > > >>>>>> broker. Producers and consumers will continue to operate as
> > > >>>>>> usual.
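For reference, the multiple-data-directory (JBOD) setup discussed above is configured by listing one directory per disk in the broker's log.dirs property; Kafka spreads partitions across these directories, and since the JBOD improvements in Kafka 1.0 a failed directory takes only its own partitions offline rather than the whole broker. The paths below are illustrative, not defaults:

```
# server.properties (illustrative paths): one log directory per physical disk
log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs,/data/disk3/kafka-logs
```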
> > > >>>>>>
> > > >>>>>> If the topics don't have replicas or the minimum ISR count is
> > > >>>>>> not met, then the topic partitions on the failed disk will be
> > > >>>>>> offline. Producers can still send data to the affected topics --
> > > >>>>>> it will just go to the online partitions. Consumers can still
> > > >>>>>> consume data from the online partitions.
> > > >>>>>>
> > > >>>>>> -- Peter
> > > >>>>>>
> > > >>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi community,
> > > >>>>>>>
> > > >>>>>>> I ran into a disk failure when using Kafka, and fortunately it
> > > >>>>>>> did not crash the entire cluster. So I am wondering how Kafka
> > > >>>>>>> handles multiple disks and how it manages to keep working in
> > > >>>>>>> case of a single disk failure. The more detailed, the better.
> > > >>>>>>> Thanks!
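The "producers keep writing to the online partitions" behavior described in the thread can be sketched with a toy routing model. This is an illustration only (hypothetical helper name; real clients consult cluster metadata to learn which partitions have a live leader):

```python
import itertools

def writable_partitions(partition_online):
    """Partitions a producer can still target: those whose leader is online."""
    return [p for p, online in partition_online.items() if online]

# Topic with 3 partitions; partition 1 lived on the failed disk and is offline.
status = {0: True, 1: False, 2: True}
available = writable_partitions(status)
print(available)  # [0, 2]

# An unkeyed round-robin producer would then cycle over the online partitions only:
rr = itertools.cycle(available)
print([next(rr) for _ in range(4)])  # [0, 2, 0, 2]
```

Note the flip side: keyed records hashed to an offline partition cannot be redirected without breaking key ordering, which is why replication (and meeting min ISR) matters more than this fallback.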