Hello everybody,

Thank you for the detailed answers. My issue is partly answered here:

*This rule also applies to disk-level, which means that when a set of
partitions is assigned to a specific broker, each of the disks will get
the same number of partitions without considering the load of disks at
that time.*

I admit I didn't provide enough info either. My problem is that an
existing topic got a huge surge of events this week. I knew that would
happen, so I increased the partition count. Unfortunately, it occurred to
me only a bit later that I would likely need some extra disk space, so I
added an extra disk to each broker. What I didn't know was that Kafka
won't evenly distribute the partitions across the disks.

So the question still remains: is there any way to have Kafka evenly
distribute data on its disks? Also, what options do I have *after* I'm in
the situation I described above (preferably without deleting the topic)?

Thanks!

On Fri, Aug 7, 2020 at 12:00 PM Yingshuan Song <songyingsh...@gmail.com>
wrote:

> Hi Peter,
> Agreed with Manoj and Vinicius; I think these rules led to this result:
>
> 1) The partition count N and replication factor R of a topic determine
>    the real partition-replica count of the topic, which is N * R.
> 2) Kafka can distribute partitions evenly among brokers, but this is
>    based on the broker count when the topic was created; this is
>    important.
>    If we create a topic (N = 4, R = 3) in a Kafka cluster that contains
>    3 brokers, then 4 * 3 / 3 = 4 partitions will be assigned to each
>    broker.
>    But if a new broker is added to this cluster and another topic
>    (N = 4, R = 3) needs to be created, then 4 * 3 / 4 = 3 partitions
>    will be assigned to each broker.
>    Kafka will not assign all those partitions to the newly added broker
>    even though it is idle, and I think this is a shortcoming of Kafka.
>    This rule also applies to disk-level, which means that when a set of
>    partitions is assigned to a specific broker, each of the disks will
>    get the same number of partitions without considering the load of
>    disks at that time.
> 3) When a producer sends records to a topic, the partition is chosen
>    as follows:
>    3-1) if a record has a key, the partition number is calculated from
>         the key;
>    3-2) if records have no keys, they are sent to each partition in
>         turn.
>    So, if there are lots of records with the same key, those records
>    will all be sent to the same partition and may take up a lot of disk
>    space.
>
> Hope this helps.
>
> Vinicius Scheidegger <vinicius.scheideg...@gmail.com> wrote on
> Fri, Aug 7, 2020 at 6:10 AM:
>
> > Hi Peter,
> >
> > AFAIK, everything depends on:
> >
> > 1) How you have configured your topic
> >    a) number of partitions (here I understand you have 15 partitions)
> >    b) partition replication configuration (each partition necessarily
> >       has a leader, which is primarily responsible for holding the
> >       data and for reads and writes); you can configure the topic to
> >       have a number of replicas
> > 2) How you publish messages to the topic
> >    a) The publisher is responsible for choosing the partition. This
> >       can be done consciously (by setting the partition id while
> >       sending the message to the topic) or unconsciously (by using
> >       the DefaultPartitioner or any other partitioner scheme).
> >
> > All messages sent to a specific partition will be written first to the
> > leader (meaning that the disk configured for the partition leader will
> > receive the load) and then replicated to the replicas (followers).
> > Kafka does not automatically distribute the data equally to the
> > different brokers - you need to think about your architecture having
> > that in mind.
> >
> > I hope it helps
> >
> > On Thu, Aug 6, 2020 at 10:23 PM Péter Nagykátai <st4r.f1...@gmail.com>
> > wrote:
> >
> > > I initially started with one data disk (mounted solely to hold Kafka
> > > data) and recently added a new one.
> > >
> > > On Thu, Aug 6, 2020 at 10:13 PM <manoj.agraw...@cognizant.com> wrote:
> > >
> > > > What do you mean by older disk?
> > > >
> > > > On 8/6/20, 12:05 PM, "Péter Nagykátai" <st4r.f1...@gmail.com>
> > > > wrote:
> > > >
> > > > [External]
> > > >
> > > > Yeah, but it doesn't do that. My "older" disks have ~70 partitions,
> > > > the newer ones ~5 partitions. That's why I'm asking what went
> > > > wrong.
> > > >
> > > > On Thu, Aug 6, 2020 at 8:35 PM <manoj.agraw...@cognizant.com>
> > > > wrote:
> > > >
> > > > > Kafka evenly distributes the number of partitions on each disk,
> > > > > so in your case every disk should have 3/2 topic partitions.
> > > > > It is the producer's job to evenly produce data by partition key
> > > > > to the topic partitions.
> > > > > How is the partition key set: is it auto-generated, or does the
> > > > > producer send a key along with the message?
> > > > >
> > > > > On 8/6/20, 7:29 AM, "Péter Nagykátai" <st4r.f1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > [External]
> > > > >
> > > > > Hello,
> > > > >
> > > > > I have a Kafka cluster with 3 brokers (v2.3.0) and each broker
> > > > > has 2 disks attached. I added a new topic (heavyweight) and was
> > > > > surprised that even if the topic has 15 partitions, those weren't
> > > > > distributed evenly on the disks. Thus I got one disk that's
> > > > > almost empty and the other almost filled up. Is there any way to
> > > > > have Kafka evenly distribute data on its disks?
> > > > >
> > > > > Thank you!
> > > > >
> > > > > This e-mail and any files transmitted with it are for the sole
> > > > > use of the intended recipient(s) and may contain confidential and
> > > > > privileged information. If you are not the intended recipient(s),
> > > > > please reply to the sender and destroy all copies of the original
> > > > > message.
> > > > > Any unauthorized review, use, disclosure, dissemination,
> > > > > forwarding, printing or copying of this email, and/or any action
> > > > > taken in reliance on the contents of this e-mail is strictly
> > > > > prohibited and may be unlawful. Where permitted by applicable
> > > > > law, this e-mail and other e-mail communications sent to and from
> > > > > Cognizant e-mail addresses may be monitored.
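
For anyone reading this thread later, rules 1) and 2) quoted above can be
sketched roughly as follows. This is only an illustration, not Kafka's
actual code: the function names and the /data1, /data2 paths are made up,
and the "fewest partitions wins, bytes on disk are ignored" behaviour is
taken from the rule as stated in the thread. It does seem to reproduce the
~70 vs ~5 split reported earlier, because existing partitions stay where
they are and only new ones land on the new disk.

```python
from typing import Dict


def replicas_per_broker(partitions: int, replication: int, brokers: int) -> float:
    """Average partition replicas per broker: N * R / broker count (rules 1 and 2)."""
    return partitions * replication / brokers


def place_by_count(existing: Dict[str, int], new_partitions: int) -> Dict[str, int]:
    """Place each new partition on the disk that holds the fewest partitions.

    Assumption: only partition counts are compared, never disk usage, and
    partitions that already exist are not moved. Ties go to the first disk
    in the dict's insertion order (an arbitrary choice for this sketch).
    """
    counts = dict(existing)  # copy so the caller's dict is not modified
    for _ in range(new_partitions):
        target = min(counts, key=counts.get)
        counts[target] += 1
    return counts


if __name__ == "__main__":
    # The two examples from rule 2: 4 partitions, replication factor 3.
    print(replicas_per_broker(4, 3, 3))  # 4.0 with 3 brokers
    print(replicas_per_broker(4, 3, 4))  # 3.0 with 4 brokers

    # Hypothetical broker: old disk already holds 70 partitions, new disk is empty.
    print(place_by_count({"/data1": 70, "/data2": 0}, 5))
    # {'/data1': 70, '/data2': 5}
```

Under these assumptions the new disk only catches up as new partitions are
created; nothing in this sketch moves data that is already on the old disk.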