Please see the following link from Confluent. Also, if you register with Confluent Technical Talks, they are running quite a few nice, simplified webinars this month on the Fundamentals of Kafka.
https://www.youtube.com/watch?v=ibozaujze9k

I thought the two-part presentation was quite good (but I don't work for
Confluent :), so a disclaimer in advance). There is also an upcoming
webinar on how Kafka integrates with your application/architecture.

I hope it helps. Below the quoted thread I have also pasted a few
illustrative config and CLI sketches for the points Peter raised.

Regards,
M. MAnna

On Thu, 12 Mar 2020 at 00:51, 张祥 <xiangzhang1...@gmail.com> wrote:

> Thanks, very helpful !
>
> On Thu, Mar 12, 2020 at 5:48 AM, Peter Bukowinski <pmb...@gmail.com> wrote:
>
> > Yes, that's correct. While a broker is down:
> >
> > - all topic partitions assigned to that broker will be under-replicated
> > - topic partitions with an unmet minimum ISR count will be offline
> > - leadership of partitions meeting the minimum ISR count will move to
> >   the next in-sync replica in the replica list
> > - if no in-sync replica exists for a topic partition, it will be offline
> >
> > Setting unclean.leader.election.enable=true will allow an out-of-sync
> > replica to become a leader. If topic partition availability is more
> > important to you than data integrity, you should allow unclean leader
> > election.
> >
> > > On Mar 11, 2020, at 6:11 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >
> > > Hi Peter, following what we talked about before, I want to understand
> > > what will happen when one broker goes down. I would say it will be
> > > very similar to what happens under disk failure, except that the
> > > rules apply to all the partitions on that broker instead of only one
> > > malfunctioning disk. Am I right? Thanks.
> > >
> > > On Thu, Mar 5, 2020 at 9:25 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >
> > >> Thanks Peter, really appreciate it.
> > >>
> > >> On Wed, Mar 4, 2020 at 11:50 PM, Peter Bukowinski <pmb...@gmail.com> wrote:
> > >>
> > >>> Yes, you should restart the broker. I don't believe there's any
> > >>> code to check whether a log directory previously marked as failed
> > >>> has returned to healthy.
> > >>>
> > >>> I always restart the broker after a hardware repair. I treat broker
> > >>> restarts as a normal, non-disruptive operation in my clusters. I
> > >>> use a minimum of 3x replication.
> > >>>
> > >>> -- Peter (from phone)
> > >>>
> > >>>> On Mar 4, 2020, at 12:46 AM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >>>>
> > >>>> Another question: as I recall, the broker needs to be restarted
> > >>>> after replacing the disk for it to recover. Is that correct? If
> > >>>> so, I take it that Kafka cannot know by itself that the disk has
> > >>>> been replaced, and a manual restart is necessary.
> > >>>>
> > >>>> On Wed, Mar 4, 2020 at 2:48 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >>>>
> > >>>>> Thanks Peter, it makes a lot of sense.
> > >>>>>
> > >>>>> On Tue, Mar 3, 2020 at 11:56 AM, Peter Bukowinski <pmb...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Whether your brokers have a single data directory or multiple
> > >>>>>> data directories on separate disks, when a disk fails, the
> > >>>>>> topic partitions located on that disk become unavailable. What
> > >>>>>> happens next depends on how your cluster and topics are
> > >>>>>> configured.
> > >>>>>>
> > >>>>>> If the topics on the affected broker have replicas and the
> > >>>>>> minimum ISR (in-sync replicas) count is met, then all topic
> > >>>>>> partitions will remain online and leaders will move to another
> > >>>>>> broker. Producers and consumers will continue to operate as
> > >>>>>> usual.
> > >>>>>>
> > >>>>>> If the topics don't have replicas or the minimum ISR count is
> > >>>>>> not met, then the topic partitions on the failed disk will be
> > >>>>>> offline.
> > >>>>>> Producers can still send data to the affected topics; it will
> > >>>>>> just go to the online partitions. Consumers can still consume
> > >>>>>> data from the online partitions.
> > >>>>>>
> > >>>>>> -- Peter
> > >>>>>>
> > >>>>>>> On Mar 2, 2020, at 7:00 PM, 张祥 <xiangzhang1...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi community,
> > >>>>>>>
> > >>>>>>> I ran into a disk failure when using Kafka, and fortunately it
> > >>>>>>> did not crash the entire cluster. So I am wondering how Kafka
> > >>>>>>> handles multiple disks and how it manages to keep working in
> > >>>>>>> the case of a single disk failure. The more detailed, the
> > >>>>>>> better. Thanks!
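
As promised, here are the sketches. First, while a broker or disk is
down you can see exactly which partitions are affected from the CLI.
This is a minimal sketch, assuming a reachable broker at localhost:9092
(host and port are placeholders):

    # partitions missing a replica from the ISR
    bin/kafka-topics.sh --bootstrap-server localhost:9092 \
      --describe --under-replicated-partitions

    # partitions with no available leader (i.e. fully offline)
    bin/kafka-topics.sh --bootstrap-server localhost:9092 \
      --describe --unavailable-partitions

The first command should list every partition hosted on the down
broker; the second only those left without any eligible leader.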
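
Second, the availability-versus-integrity tradeoff Peter describes maps
to a handful of settings. A sketch with illustrative values, not
recommendations:

    # server.properties (broker-wide defaults)
    default.replication.factor=3          # Peter's "minimum of 3x replication"
    min.insync.replicas=2                 # acks=all writes need 2 in-sync replicas
    unclean.leader.election.enable=false  # false favours integrity, true availability

    # the same knobs, applied per topic at creation time
    bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
      --topic my-topic --partitions 6 --replication-factor 3 \
      --config min.insync.replicas=2 \
      --config unclean.leader.election.enable=false

(my-topic and the partition count are made up.) With acks=all, a
producer write fails with NotEnoughReplicasException once the ISR
shrinks below min.insync.replicas; that is the integrity side of the
tradeoff. Flipping unclean.leader.election.enable to true keeps the
partition writable at the risk of losing whatever the out-of-sync
replica never copied.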
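
Finally, the multiple-data-directory layout from Peter's first reply is
just a comma-separated log.dirs list, one directory per physical disk.
The paths below are made up:

    # server.properties -- JBOD layout; Kafka spreads partitions across
    # the directories, and since KIP-112 (Kafka 1.0) a failed directory
    # is taken offline individually instead of crashing the whole broker
    log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs,/data/disk3/kafka-logs

That per-directory failure handling is why 张祥's disk failure did not
take down the cluster.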