Thanks Liam, I will try my best, but due to some prod shenanigans I won't be
able to test this until next week. I will reply once I have more info.
Thanks for your help!

On Wed, Nov 10, 2021 at 3:22 PM Liam Clarke-Hutchinson <lclar...@redhat.com>
wrote:

> Hi David, those log messages are logged at INFO level to controller.log
> when the cluster starts up and selects a broker to act as the controller,
> or when a new controller is elected.
>
> The reason I'm asking about those log messages is that they reflect the
> cached state of "alive" brokers that the controller knows about. When a
> topic is created, this cached state is used to assign replicas in a fairly
> straightforward round-robin fashion (when there's no rack awareness
> involved) across all brokers the controller knows about.
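>
> (Very roughly, the idea is something like the toy sketch below; this is
> illustrative only, not Kafka's actual placement code:)
>
>   def assign_replicas(alive_brokers, num_partitions, replication_factor):
>       # Toy round-robin placement over the controller's cached "alive"
>       # broker list; replicas can only land on brokers in this list.
>       n = len(alive_brokers)
>       assignment = {}
>       for p in range(num_partitions):
>           # leader walks round robin; followers take the next brokers
>           # in the cached list, wrapping around
>           assignment[p] = [alive_brokers[(p + r) % n]
>                            for r in range(replication_factor)]
>       return assignment
>
> So if that cached list is missing brokers, those brokers simply never
> receive replicas at topic creation time.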
>
> But when you run a replica reassignment, it requires you to explicitly
> identify which broker id each replica should move to, and looking at the
> code, this forcibly updates the cache of broker metadata for each broker id
> you specify. So I'm wondering if the cached "alive" broker state when you
> initially created the topic didn't reflect all the actual brokers in your
> cluster.
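>
> (For reference, the JSON you feed to kafka-reassign-partitions.sh spells
> out the target broker ids per partition; topic name and ids below are just
> placeholders:
>
>   {"version": 1,
>    "partitions": [
>      {"topic": "test", "partition": 0, "replicas": [101, 115, 130]},
>      {"topic": "test", "partition": 1, "replicas": [102, 116, 131]}
>    ]}
>
> and is applied with --reassignment-json-file <file> --execute, plus
> --zookeeper or --bootstrap-server depending on your Kafka version.)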
>
> So, if you are able to a) set the logging level for
> kafka.controller.KafkaController (at the very least) to INFO and b) stop
> then restart your entire cluster, those logging messages would confirm or
> eliminate the question of that cached broker state being a factor.
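>
> For (a), that would be something along these lines in config/log4j.properties
> (the exact logger/appender names may differ in your setup):
>
>   # covers kafka.controller.KafkaController among others
>   log4j.logger.kafka.controller=INFO, controllerAppender
>   log4j.additivity.kafka.controller=false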
>
> Admittedly I could be barking up an entirely wrong tree, and if anyone who
> understands the replica assignment algorithm better than I is reading,
> please do correct me!
>
> Cheers,
>
> Liam Clarke-Hutchinson
>
> On Thu, 11 Nov 2021, 5:16 am David Ballano Fernandez, <
> dfernan...@demonware.net> wrote:
>
> > Hi Liam,
> >
> > I tried setting all loggers to DEBUG on the controller.
> >
> > These are the only messages that I can see when I create a topic; I
> > couldn't find the logs you mention, but I got this:
> >
> > ==> controller.log <==
> > [2021-11-10 05:06:19,042] INFO [Controller id=103] New topics:
> > [HashSet(davidballano20)], deleted topics: [HashSet()], new partition
> > replica assignment [HashMap(davidballano20-3 ->
> > ReplicaAssignment(replicas=112,111,121, addingReplicas=,
> > removingReplicas=), davidballano20-1 ->
> > ReplicaAssignment(replicas=107,101,116, addingReplicas=,
> > removingReplicas=), davidballano20-2 ->
> > ReplicaAssignment(replicas=113,116,111, addingReplicas=,
> > removingReplicas=), davidballano20-4 ->
> > ReplicaAssignment(replicas=120,121,122, addingReplicas=,
> > removingReplicas=), davidballano20-0 ->
> > ReplicaAssignment(replicas=100,106,101, addingReplicas=,
> > removingReplicas=))] (kafka.controller.KafkaController)
> > [2021-11-10 05:06:19,042] INFO [Controller id=103] New partition creation
> > callback for
> > davidballano20-3,davidballano20-1,davidballano20-2,davidballano20-4,davidballano20-0
> > (kafka.controller.KafkaController)
> > ...
> > ...
> > ==> state-change.log <==
> > [2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending
> > LeaderAndIsr request to broker 122 with 0 become-leader and 1
> > become-follower partitions (state.change.logger)
> > [2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending
> > UpdateMetadata request to brokers HashSet(100, 101, 102, 103, 104, 105,
> > 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
> > 121, 122, 123) for 5 partitions (state.change.logger)
> > ...
> > ...
> >
> > thanks!
> >
> > On Tue, Nov 9, 2021 at 5:04 PM Liam Clarke-Hutchinson <
> > lclar...@redhat.com> wrote:
> >
> > > Sorry, forgot to mention: they'll usually be under $KAFKA_DIR/logs.
> > >
> > > On Wed, 10 Nov 2021, 1:53 pm Liam Clarke-Hutchinson, <
> > > lclar...@redhat.com> wrote:
> > >
> > > > Thanks :)
> > > >
> > > > If you grep for "broker epochs cache" in the controller.log.* files, are
> > > > you seeing all of your brokers listed?
> > > > Should see log messages like "Initialized|Updated broker epochs cache:
> > > > HashMap(<broker_id> -> epoch, <broker_id_2> -> epoch...)"
> > > >
> > > > This is to check if the controller knows that all of your brokers are
> > > > live at the time of topic creation. If their id is in that hashmap,
> > > > they're alive.
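> > > >
> > > > E.g., something like this (the path will vary with your install):
> > > >
> > > >   grep -E "(Initialized|Updated) broker epochs cache" \
> > > >       $KAFKA_DIR/logs/controller.log*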
> > > >
> > > > Cheers,
> > > >
> > > > Liam
> > > >
> > > > On Wed, Nov 10, 2021 at 1:21 PM David Ballano Fernandez <
> > > > dfernan...@demonware.net> wrote:
> > > >
> > > >> We are using Kafka with zookeeper
> > > >>
> > > >> On Tue, Nov 9, 2021 at 4:12 PM Liam Clarke-Hutchinson <
> > > >> lclar...@redhat.com>
> > > >> wrote:
> > > >>
> > > >> > Yeah, it's broker side, just wanted to eliminate the obscure edge
> > > >> > case.
> > > >> >
> > > >> > Oh, and are you using Zookeeper or KRaft?
> > > >> >
> > > >> > Cheers,
> > > >> >
> > > >> > Liam
> > > >> >
> > > >> > On Wed, Nov 10, 2021 at 1:00 PM David Ballano Fernandez <
> > > >> > dfernan...@demonware.net> wrote:
> > > >> >
> > > >> > > I don't seem to have that config in any of our clusters. Is that
> > > >> > > a broker config?
> > > >> > >
> > > >> > >
> > > >> > > On Tue, Nov 9, 2021 at 3:50 PM Liam Clarke-Hutchinson <
> > > >> > > lclar...@redhat.com> wrote:
> > > >> > >
> > > >> > > > Thanks David,
> > > >> > > >
> > > >> > > > Hmm, is the property create.topic.policy.class.name set in
> > > >> > > > server.properties at all?
> > > >> > > >
> > > >> > > > Cheers,
> > > >> > > >
> > > >> > > > Liam
> > > >> > > >
> > > >> > > > On Wed, Nov 10, 2021 at 12:21 PM David Ballano Fernandez <
> > > >> > > > dfernan...@demonware.net> wrote:
> > > >> > > >
> > > >> > > > > Hi Liam,
> > > >> > > > >
> > > >> > > > > I did a test creating topics with kafka-topics.sh and the admin
> > > >> > > > > API from confluent-kafka-python. The same happened for both.
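> > > >> > > > >
> > > >> > > > > Roughly like this with the Python client (the bootstrap server
> > > >> > > > > below is a placeholder):
> > > >> > > > >
> > > >> > > > >   from confluent_kafka.admin import AdminClient, NewTopic
> > > >> > > > >
> > > >> > > > >   admin = AdminClient({"bootstrap.servers": "broker1:9092"})
> > > >> > > > >   futures = admin.create_topics(
> > > >> > > > >       [NewTopic("test", num_partitions=5, replication_factor=3)])
> > > >> > > > >   for topic, f in futures.items():
> > > >> > > > >       f.result()  # raises if topic creation failed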
> > > >> > > > >
> > > >> > > > > thanks!
> > > >> > > > >
> > > >> > > > > On Tue, Nov 9, 2021 at 2:58 PM Liam Clarke-Hutchinson <
> > > >> > > > > lclar...@redhat.com> wrote:
> > > >> > > > >
> > > >> > > > > > Hi David,
> > > >> > > > > >
> > > >> > > > > > What tool(s) are you using to create new topics? Is it the
> > > >> > > > > > kafka-topics.sh that ships with Apache Kafka?
> > > >> > > > > >
> > > >> > > > > > Cheers,
> > > >> > > > > >
> > > >> > > > > > Liam Clarke-Hutchinson
> > > >> > > > > >
> > > >> > > > > > On Wed, Nov 10, 2021 at 11:41 AM David Ballano Fernandez <
> > > >> > > > > > dfernan...@demonware.net> wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi All,
> > > >> > > > > > > While trying to figure out why my brokers have some disk imbalance,
> > > >> > > > > > > I have found that Kafka (maybe this is the way it is supposed to
> > > >> > > > > > > work?) is not spreading all replicas to all available brokers.
> > > >> > > > > > >
> > > >> > > > > > > I have been trying to figure out how a topic with 5 partitions and
> > > >> > > > > > > replication_factor=3 (15 replicas) could end up having all replicas
> > > >> > > > > > > spread over 9 brokers instead of 15, especially when there are more
> > > >> > > > > > > brokers than the total replicas for that specific topic.
> > > >> > > > > > >
> > > >> > > > > > > The cluster has 48 brokers.
> > > >> > > > > > >
> > > >> > > > > > > # topics.py describe -topic topic1
> > > >> > > > > > > {145: 1, 148: 2, *101: 3*, 146: 1, 102: 2, 147: 1, 103: 2, 104: 2, 105: 1}
> > > >> > > > > > > The keys are the broker ids and the values are how many replicas
> > > >> > > > > > > they have.
> > > >> > > > > > >
> > > >> > > > > > > As you can see, broker id 101 has 3 replicas, which makes the disk
> > > >> > > > > > > unbalanced compared to other brokers.
> > > >> > > > > > >
> > > >> > > > > > > I created a brand new topic in a test cluster with 24 brokers. The
> > > >> > > > > > > topic has 5 partitions with replication factor 3.
> > > >> > > > > > > topics.py describe -topic test
> > > >> > > > > > > {119: 1, 103: 1, 106: 2, 109: 1, 101: 2, 114: 1, 116: 2, 118: 1, 111: 2, 104: 1, 121: 1}
> > > >> > > > > > >
> > > >> > > > > > > This time Kafka decided to spread the replicas over 11 brokers
> > > >> > > > > > > instead of 15.
> > > >> > > > > > > Just for fun I ran a partition reassignment for topic test, spreading
> > > >> > > > > > > all replicas to all brokers. Result:
> > > >> > > > > > >
> > > >> > > > > > > # topics.py describe -topic test
> > > >> > > > > > > {110: 1, 111: 1, 109: 1, 108: 1, 112: 1, 103: 1, 107: 1, 105: 1, 104: 1, 106: 1, 102: 1, 118: 1, 116: 1, 113: 1, 117: 1}
> > > >> > > > > > >
> > > >> > > > > > > Now all replicas are spread across 15 brokers.
> > > >> > > > > > >
> > > >> > > > > > > Is there something I am missing? Maybe the reason is to keep network
> > > >> > > > > > > chatter down? By the way, I don't have any rack awareness configured.
> > > >> > > > > > > Thanks!
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
