Good luck :)

On Thu, Nov 11, 2021 at 12:50 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:
Thanks Liam, I will try my best, but due to some prod shenanigans I won't be able to test this until next week. I will reply once I have more info. Thanks for your help!

On Wed, Nov 10, 2021 at 3:22 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Hi David, those log messages are INFO level, logged to controller.log when the cluster starts up and selects a broker to act as a controller, or when a new controller is elected.

The reason I'm asking about those log messages is that they reflect the cached state of "alive" brokers that the controller knows about. When a topic is created, this cached state is used to assign replicas in a rather straightforward (when there's no rack awareness involved) round-robin fashion across all brokers the controller knows about.

But when you run a replica reassignment, it requires you to explicitly identify which broker id a replica should move to, and looking at the code, this forcibly updates the cache of broker metadata for each broker id you specify. So I'm wondering if the cached "alive" broker state when you initially created the topic didn't reflect all the actual brokers in your cluster.

So, if you are able to a) set the logging level for kafka.controller.KafkaController (at the very least) to INFO and b) stop then restart your entire cluster, those logging messages would confirm or eliminate the question of that cached broker state being a factor.
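To make that round-robin placement concrete, here is a rough Python sketch of the rack-unaware assignment as I read it (an approximation of the logic in kafka.admin.AdminUtils, not the actual broker code; the broker ids 100-123 just mirror your test cluster):

import random

def assign_replicas(broker_ids, n_partitions, replication_factor):
    # "brokers" stands in for the controller's cached list of alive brokers.
    brokers = sorted(broker_ids)
    n = len(brokers)
    start_index = random.randrange(n)         # random broker to start the round robin at
    next_replica_shift = random.randrange(n)  # random offset used for the follower replicas
    assignment = {}
    for p in range(n_partitions):
        if p > 0 and p % n == 0:
            next_replica_shift += 1
        first = (p + start_index) % n         # leader replica round-robins across the ring
        replicas = [brokers[first]]
        for j in range(replication_factor - 1):
            # followers sit a (mostly fixed) shift away from the leader
            shift = 1 + (next_replica_shift + j) % (n - 1)
            replicas.append(brokers[(first + shift) % n])
        assignment[p] = replicas
    return assignment

# e.g. a 24-broker cluster (ids 100-123), 5 partitions, replication factor 3
assignment = assign_replicas(range(100, 124), n_partitions=5, replication_factor=3)
counts = {}
for replicas in assignment.values():
    for b in replicas:
        counts[b] = counts.get(b, 0) + 1
print(assignment)
print(counts)  # typically ~9-11 distinct brokers end up holding the 15 replicas

Because the start broker and the follower shift are both random, the follower replicas of a small topic tend to overlap with each other and with the leaders, so a 5-partition, RF=3 topic usually lands on roughly 9-11 distinct brokers rather than 15, even when many more brokers are available. That would be consistent with what you're seeing, but again, this is only my reading of the code.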
Admittedly I could be barking up an entirely wrong tree, and if anyone who understands the replica assignment algorithm better than I do is reading, please do correct me!

Cheers,

Liam Clarke-Hutchinson

On Thu, 11 Nov 2021, 5:16 am David Ballano Fernandez <dfernan...@demonware.net> wrote:

Hi Liam,

I tried setting all loggers to DEBUG on the controller. These are the only messages that I can see when I create a topic; I couldn't find the logs you mention, but got this:

==> controller.log <==
[2021-11-10 05:06:19,042] INFO [Controller id=103] New topics: [HashSet(davidballano20)], deleted topics: [HashSet()], new partition replica assignment [HashMap(davidballano20-3 -> ReplicaAssignment(replicas=112,111,121, addingReplicas=, removingReplicas=), davidballano20-1 -> ReplicaAssignment(replicas=107,101,116, addingReplicas=, removingReplicas=), davidballano20-2 -> ReplicaAssignment(replicas=113,116,111, addingReplicas=, removingReplicas=), davidballano20-4 -> ReplicaAssignment(replicas=120,121,122, addingReplicas=, removingReplicas=), davidballano20-0 -> ReplicaAssignment(replicas=100,106,101, addingReplicas=, removingReplicas=))] (kafka.controller.KafkaController)
[2021-11-10 05:06:19,042] INFO [Controller id=103] New partition creation callback for davidballano20-3,davidballano20-1,davidballano20-2,davidballano20-4,davidballano20-0 (kafka.controller.KafkaController)
...
...

==> state-change.log <==
[2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending LeaderAndIsr request to broker 122 with 0 become-leader and 1 become-follower partitions (state.change.logger)
[2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending UpdateMetadata request to brokers HashSet(100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123) for 5 partitions (state.change.logger)
...
...

thanks!

On Tue, Nov 9, 2021 at 5:04 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Sorry, forgot to mention they'll usually be under $KAFKA_DIR/logs.

On Wed, 10 Nov 2021, 1:53 pm Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Thanks :)

If you grep for "broker epochs cache" in the controller.log.* files, are you seeing all of your brokers listed? You should see log messages like "Initialized|Updated broker epochs cache: HashMap(<broker_id> -> epoch, <broker_id_2> -> epoch...)"

This is to check if the controller knows that all of your brokers are live at the time of topic creation. If their id is in that hashmap, they're alive.

Cheers,

Liam

On Wed, Nov 10, 2021 at 1:21 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:

We are using Kafka with ZooKeeper.

On Tue, Nov 9, 2021 at 4:12 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Yeah, it's broker side, just wanted to eliminate the obscure edge case.

Oh, and are you using Zookeeper or KRaft?

Cheers,

Liam

On Wed, Nov 10, 2021 at 1:00 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:

I don't seem to have that config in any of our clusters. Is that a broker config?

On Tue, Nov 9, 2021 at 3:50 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Thanks David,

Hmm, is the property create.topic.policy.class.name set in server.properties at all?

Cheers,

Liam

On Wed, Nov 10, 2021 at 12:21 PM David Ballano Fernandez <dfernan...@demonware.net> wrote:

Hi Liam,

I did a test creating topics with kafka-topics.sh and the admin API from confluent kafka python. The same happened for both.

thanks!
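For reference, the admin-API path was something along these lines, using confluent-kafka-python's AdminClient (the bootstrap address below is a placeholder, not the real cluster):

from confluent_kafka.admin import AdminClient, NewTopic

# Placeholder bootstrap servers; the real cluster address differs.
admin = AdminClient({"bootstrap.servers": "broker-100:9092"})

# Same shape as the kafka-topics.sh test: 5 partitions, replication factor 3.
futures = admin.create_topics(
    [NewTopic("davidballano20", num_partitions=5, replication_factor=3)]
)

# create_topics() is asynchronous; wait for each result (raises on failure).
for topic, future in futures.items():
    future.result()
    print(f"created {topic}")

Both paths end up going through the same controller-side assignment, which matches the identical spread I saw with either tool.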
On Tue, Nov 9, 2021 at 2:58 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

Hi David,

What tool(s) are you using to create new topics? Is it the kafka-topics.sh that ships with Apache Kafka?

Cheers,

Liam Clarke-Hutchinson

On Wed, Nov 10, 2021 at 11:41 AM David Ballano Fernandez <dfernan...@demonware.net> wrote:

Hi All,

While trying to figure out why my brokers have some disk imbalance, I have found that Kafka (maybe this is the way it is supposed to work?) is not spreading all replicas across all available brokers.

I have been trying to figure out how a topic with 5 partitions and replication_factor=3 (15 replicas) could end up having all of its replicas spread over 9 brokers instead of 15, especially when there are more brokers than the total number of replicas for that specific topic.

The cluster has 48 brokers.

# topics.py describe -topic topic1
{145: 1, 148: 2, *101: 3*, 146: 1, 102: 2, 147: 1, 103: 2, 104: 2, 105: 1}

The keys are the broker ids and the values are how many replicas each one holds.

As you can see, broker id 101 has 3 replicas, which makes its disk unbalanced compared to the other brokers.

I created a brand new topic in a test cluster with 24 brokers. The topic has 5 partitions with replication factor 3:

topics.py describe -topic test
{119: 1, 103: 1, 106: 2, 109: 1, 101: 2, 114: 1, 116: 2, 118: 1, 111: 2, 104: 1, 121: 1}

This time Kafka decided to spread the replicas over 11 brokers instead of 15. Just for fun I ran a partition reassignment for topic test, spreading all replicas across all brokers. Result:

# topics.py describe -topic test
{110: 1, 111: 1, 109: 1, 108: 1, 112: 1, 103: 1, 107: 1, 105: 1, 104: 1, 106: 1, 102: 1, 118: 1, 116: 1, 113: 1, 117: 1}

Now all replicas are spread across 15 brokers.

Is there something I am missing? Maybe the reason is to keep network chatter down? By the way, I don't have any rack awareness configured.

Thanks!
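(For context, topics.py is just an in-house helper that counts replicas per broker from the cluster metadata. A rough sketch of the equivalent logic with the confluent-kafka AdminClient is below; the bootstrap address is a placeholder and this is not the actual script:)

from collections import Counter
from confluent_kafka.admin import AdminClient

def describe(topic, bootstrap="broker-100:9092"):  # placeholder bootstrap address
    admin = AdminClient({"bootstrap.servers": bootstrap})
    md = admin.list_topics(topic=topic, timeout=10)
    counts = Counter()
    for partition in md.topics[topic].partitions.values():
        for broker_id in partition.replicas:  # every replica: leader plus followers
            counts[broker_id] += 1
    return dict(counts)

print(describe("test"))  # e.g. {119: 1, 103: 1, 106: 2, ...}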