Thanks Liam, I will try my best, but due to some prod shenanigans I won't be able to test this until next week. I will reply once I have more info. Thanks for your help!
On Wed, Nov 10, 2021 at 3:22 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> Hi David, those log messages are INFO level, logged to controller.log when the cluster starts up and selects a broker to act as a controller, or when a new controller is elected.
>
> The reason I'm asking about those log messages is that they reflect the cached state of "alive" brokers that the controller knows about. When a topic is created, this cached state is used to assign replicas in a rather straightforward (when there's no rack awareness involved) round-robin fashion across all brokers the controller knows about.
>
> But when you run a replica reassignment, it requires you to explicitly identify which broker id a replica should move to, and looking at the code, this forcibly updates the cache of broker metadata for each broker id you specify. So I'm wondering if the cached "alive" broker state when you initially created the topic doesn't reflect all the actual brokers in your cluster.
>
> So, if you are able to a) set the logging level for kafka.controller.KafkaController (at the very least) to INFO and b) stop then restart your entire cluster, those logging messages would confirm or eliminate the question of that cached broker state being a factor.
>
> Admittedly I could be barking up an entirely wrong tree, and if anyone who understands the replica assignment algorithm better than I is reading, please do correct me!
>
> Cheers,
>
> Liam Clarke-Hutchinson
>
> On Thu, 11 Nov 2021, 5:16 am David Ballano Fernandez, <dfernan...@demonware.net> wrote:
> > Hi Liam,
> >
> > I tried setting all loggers to DEBUG on the controller.
> >
> > These are the only messages that I can see when I create a topic; I couldn't find the logs you mention, but got this:
> >
> > ==> controller.log <==
> > [2021-11-10 05:06:19,042] INFO [Controller id=103] New topics: [HashSet(davidballano20)], deleted topics: [HashSet()], new partition replica assignment [HashMap(davidballano20-3 -> ReplicaAssignment(replicas=112,111,121, addingReplicas=, removingReplicas=), davidballano20-1 -> ReplicaAssignment(replicas=107,101,116, addingReplicas=, removingReplicas=), davidballano20-2 -> ReplicaAssignment(replicas=113,116,111, addingReplicas=, removingReplicas=), davidballano20-4 -> ReplicaAssignment(replicas=120,121,122, addingReplicas=, removingReplicas=), davidballano20-0 -> ReplicaAssignment(replicas=100,106,101, addingReplicas=, removingReplicas=))] (kafka.controller.KafkaController)
> > [2021-11-10 05:06:19,042] INFO [Controller id=103] New partition creation callback for davidballano20-3,davidballano20-1,davidballano20-2,davidballano20-4,davidballano20-0 (kafka.controller.KafkaController)
> > ...
> > ...
> > ==> state-change.log <==
> > [2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending LeaderAndIsr request to broker 122 with 0 become-leader and 1 become-follower partitions (state.change.logger)
> > [2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending UpdateMetadata request to brokers HashSet(100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123) for 5 partitions (state.change.logger)
> > ...
> > ...
> >
> > thanks!
> >
> > On Tue, Nov 9, 2021 at 5:04 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > > Sorry, forgot to mention they'll usually be under $KAFKA_DIR/logs.
> > >
> > > On Wed, 10 Nov 2021, 1:53 pm Liam Clarke-Hutchinson, <lclar...@redhat.com> wrote:
> > > > Thanks :)
> > > >
> > > > If you grep for "broker epochs cache" in the controller.log.* files, are you seeing all of your brokers listed?
> > > > You should see log messages like "Initialized|Updated broker epochs cache: HashMap(<broker_id> -> epoch, <broker_id_2> -> epoch...)"
> > > >
> > > > This is to check if the controller knows that all of your brokers are live at the time of topic creation. If their id is in that hashmap, they're alive.
> > > >
> > > > Cheers,
> > > >
> > > > Liam
> > > >
> > > > On Wed, Nov 10, 2021 at 1:21 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > > > > We are using Kafka with zookeeper.
> > > > >
> > > > > On Tue, Nov 9, 2021 at 4:12 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > > > > > Yeah, it's broker side, just wanted to eliminate the obscure edge case.
> > > > > >
> > > > > > Oh, and are you using Zookeeper or KRaft?
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Liam
> > > > > >
> > > > > > On Wed, Nov 10, 2021 at 1:00 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > > > > > > I don't seem to have that config in any of our clusters. Is that a broker config?
> > > > > > >
> > > > > > > On Tue, Nov 9, 2021 at 3:50 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > > > > > > > Thanks David,
> > > > > > > >
> > > > > > > > Hmm, is the property create.topic.policy.class.name set in server.properties at all?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Liam
> > > > > > > >
> > > > > > > > On Wed, Nov 10, 2021 at 12:21 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > > > > > > > > Hi Liam,
> > > > > > > > >
> > > > > > > > > I did a test creating topics with kafka-topics.sh and the admin API from confluent kafka python.
> > > > > > > > > The same happened for both.
> > > > > > > > >
> > > > > > > > > thanks!
> > > > > > > > >
> > > > > > > > > On Tue, Nov 9, 2021 at 2:58 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > > > > > > > > > Hi David,
> > > > > > > > > >
> > > > > > > > > > What tool(s) are you using to create new topics? Is it the kafka-topics.sh that ships with Apache Kafka?
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >
> > > > > > > > > > Liam Clarke-Hutchinson
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 10, 2021 at 11:41 AM David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > > > > > > > > > > Hi All,
> > > > > > > > > > > Trying to figure out why my brokers have some disk imbalance, I have found that Kafka (maybe this is the way it is supposed to work?) is not spreading all replicas to all available brokers.
> > > > > > > > > > >
> > > > > > > > > > > I have been trying to figure out how a topic with 5 partitions with replication_factor=3 (15 replicas) could end up having all replicas spread over 9 brokers instead of 15, especially when there are more brokers than the total replicas for that specific topic.
> > > > > > > > > > >
> > > > > > > > > > > The cluster has 48 brokers.
> > > > > > > > > > >
> > > > > > > > > > > # topics.py describe -topic topic1
> > > > > > > > > > > {145: 1, 148: 2, *101: 3*, 146: 1, 102: 2, 147: 1, 103: 2, 104: 2, 105: 1}
> > > > > > > > > > > The keys are the broker ids and the values are how many replicas they have.
> > > > > > > > > > >
> > > > > > > > > > > As you can see, broker id 101 has 3 replicas, which makes its disk unbalanced compared to other brokers.
> > > > > > > > > > >
> > > > > > > > > > > I created a brand new topic in a test cluster with 24 brokers. The topic has 5 partitions with replication factor 3:
> > > > > > > > > > > topics.py describe -topic test
> > > > > > > > > > > {119: 1, 103: 1, 106: 2, 109: 1, 101: 2, 114: 1, 116: 2, 118: 1, 111: 2, 104: 1, 121: 1}
> > > > > > > > > > >
> > > > > > > > > > > This time Kafka decided to spread the replicas over 11 brokers instead of 15. Just for fun I ran a partition reassignment for topic test, spreading all replicas to all brokers; result:
> > > > > > > > > > >
> > > > > > > > > > > # topics.py describe -topic test
> > > > > > > > > > > {110: 1, 111: 1, 109: 1, 108: 1, 112: 1, 103: 1, 107: 1, 105: 1, 104: 1, 106: 1, 102: 1, 118: 1, 116: 1, 113: 1, 117: 1}
> > > > > > > > > > >
> > > > > > > > > > > Now all replicas are spread across 15 brokers.
> > > > > > > > > > >
> > > > > > > > > > > Is there something I am missing? Maybe the reason is to keep network chatter down? By the way, I don't have any rack awareness configured.
> > > > > > > > > > > Thanks!
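
As an aside, here is a minimal sketch of how the per-broker replica counts shown above could be reproduced with the confluent-kafka Python AdminClient. The topics.py script itself isn't shown in this thread, so this is only an assumed equivalent, and the bootstrap address is a placeholder:

from collections import Counter

from confluent_kafka.admin import AdminClient

# "localhost:9092" is a placeholder bootstrap address, not taken from the thread.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = "test"
# list_topics() returns cluster metadata; restrict it to the one topic of interest.
metadata = admin.list_topics(topic=topic, timeout=10)

# Tally how many replicas of this topic each broker id holds, which is
# roughly what the topics.py describe output above appears to show.
replicas_per_broker = Counter(
    broker_id
    for partition in metadata.topics[topic].partitions.values()
    for broker_id in partition.replicas
)
print(dict(replicas_per_broker))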
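
And a rough, simplified sketch of the kind of rack-unaware round-robin placement described further up the thread. This is not Kafka's actual assignment code (the function name, random start index and follower shift are purely illustrative), but it shows why a 5-partition, replication-factor-3 topic can easily end up on only a subset of the brokers the controller considers alive:

import random

def assign_replicas_round_robin(brokers, num_partitions, replication_factor):
    # Simplified, illustrative take on rack-unaware round-robin placement;
    # not Kafka's real assignment logic, just the general idea.
    brokers = sorted(brokers)
    n = len(brokers)
    start = random.randrange(n)   # random broker index for partition 0's first replica
    shift = random.randrange(n)   # random shift applied to the follower replicas
    assignment = {}
    for p in range(num_partitions):
        first = (start + p) % n                 # first replicas walk round-robin
        replicas = [brokers[first]]
        for r in range(replication_factor - 1):
            offset = 1 + (shift + r) % (n - 1)  # keeps followers off the first replica
            replicas.append(brokers[(first + offset) % n])
        assignment[p] = replicas
    return assignment

if __name__ == "__main__":
    brokers = list(range(100, 124))   # 24 "alive" brokers, as in the test cluster above
    assignment = assign_replicas_round_robin(brokers, num_partitions=5, replication_factor=3)
    used = {b for replicas in assignment.values() for b in replicas}
    print(assignment)
    print(f"{len(used)} distinct brokers used out of {len(brokers)}")

With this simplification, repeated runs typically touch somewhere around 7 to 11 distinct brokers out of 24, which is in the same ballpark as the 9- and 11-broker spreads reported above.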