Good luck :)

On Thu, Nov 11, 2021 at 12:50 PM David Ballano Fernandez <
dfernan...@demonware.net> wrote:

> Thanks Liam, I will try my best, but due to some prod shenanigans I won't be
> able to test this until next week. I will reply once I have more info.
> Thanks for your help!
>
> On Wed, Nov 10, 2021 at 3:22 PM Liam Clarke-Hutchinson <
> lclar...@redhat.com>
> wrote:
>
> > Hi David, those log messages are INFO level, logged to controller.log when
> > the cluster starts up and selects a broker to act as a controller, or when a
> > new controller is elected.
> >
> > The reason I'm asking about those log messages is that they reflect the
> > cached state of "alive" brokers that the controller knows about. When a
> > topic is created, this cached state is used to assign replicas in a rather
> > straightforward (when there's no rack awareness involved) round-robin
> > fashion across all brokers the controller knows about.
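> >
> > Very roughly, and paraphrasing the rack-unaware path in AdminUtils from
> > memory (the real Scala also randomises the starting index and the follower
> > shift, so treat this Python sketch as illustrative only):
> >
> >     def assign_rack_unaware(alive_broker_ids, n_partitions, rf,
> >                             start_index=0, replica_shift=0):
> >         # Candidate pool: only brokers the controller currently believes are alive.
> >         brokers = sorted(alive_broker_ids)
> >         n = len(brokers)
> >         assignment = {}
> >         for p in range(n_partitions):
> >             if p > 0 and p % n == 0:
> >                 replica_shift += 1              # stagger followers on each wrap
> >             first = (p + start_index) % n       # leaders walk round-robin
> >             replicas = [brokers[first]]
> >             for j in range(rf - 1):
> >                 shift = 1 + (replica_shift + j) % (n - 1)
> >                 replicas.append(brokers[(first + shift) % n])
> >             assignment[p] = replicas
> >         return assignment
> >
> > The key point being that the candidate pool is whatever set of brokers the
> > controller has cached as alive at that moment.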
> >
> > But when you run a replica reassignment, it requires you to explicitly
> > identify which broker id a replica should move to, and looking at the code,
> > this forcibly updates the cache of broker metadata for each broker id you
> > specify. So I'm wondering if the cached "alive" broker state when you
> > initially created the topic doesn't reflect all the actual brokers in your
> > cluster.
> >
> > So, if you are able to a) set the logging level for
> > kafka.controller.KafkaController (at the very least) to INFO and b) stop
> > then restart your entire cluster, those logging messages would confirm or
> > eliminate the question of that cached broker state being a factor.
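> >
> > For a), the stock config/log4j.properties that ships with Kafka already
> > routes kafka.controller to controller.log, so (assuming you haven't renamed
> > that appender) something like this should do:
> >
> >     log4j.logger.kafka.controller=INFO, controllerAppender
> >     log4j.additivity.kafka.controller=false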
> >
> > Admittedly I could be barking up an entirely wrong tree, and if anyone who
> > understands the replica assignment algorithm better than I do is reading,
> > please do correct me!
> >
> > Cheers,
> >
> > Liam Clarke-Hutchinson
> >
> > On Thu, 11 Nov 2021, 5:16 am David Ballano Fernandez, <
> > dfernan...@demonware.net> wrote:
> >
> > > Hi Liam,
> > >
> > > I tried setting all loggers to DEBUG on the controller.
> > >
> > > These are the only messages that I can see when I create a topic. I
> > > couldn't find the logs you mention, but I got this:
> > >
> > > ==> controller.log <==
> > > [2021-11-10 05:06:19,042] INFO [Controller id=103] New topics:
> > > [HashSet(davidballano20)], deleted topics: [HashSet()], new partition
> > > replica assignment [HashMap(davidballano20-3 ->
> > > ReplicaAssignment(replicas=112,111,121, addingReplicas=,
> > > removingReplicas=), davidballano20-1 ->
> > > ReplicaAssignment(replicas=107,101,116, addingReplicas=,
> > > removingReplicas=), davidballano20-2 ->
> > > ReplicaAssignment(replicas=113,116,111, addingReplicas=,
> > > removingReplicas=), davidballano20-4 ->
> > > ReplicaAssignment(replicas=120,121,122, addingReplicas=,
> > > removingReplicas=), davidballano20-0 ->
> > > ReplicaAssignment(replicas=100,106,101, addingReplicas=,
> > > removingReplicas=))] (kafka.controller.KafkaController)
> > > [2021-11-10 05:06:19,042] INFO [Controller id=103] New partition creation callback for
> > > davidballano20-3,davidballano20-1,davidballano20-2,davidballano20-4,davidballano20-0
> > > (kafka.controller.KafkaController)
> > > ...
> > > ...
> > > ==> state-change.log <==
> > > [2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending
> > > LeaderAndIsr request to broker 122 with 0 become-leader and 1
> > > become-follower partitions (state.change.logger)
> > > [2021-11-10 05:06:19,054] INFO [Controller id=103 epoch=11] Sending
> > > UpdateMetadata request to brokers HashSet(100, 101, 102, 103, 104, 105,
> > > 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
> > > 121, 122, 123) for 5 partitions (state.change.logger)
> > > ...
> > > ...
> > >
> > > thanks!
> > >
> > > On Tue, Nov 9, 2021 at 5:04 PM Liam Clarke-Hutchinson <lclar...@redhat.com>
> > > wrote:
> > >
> > > > Sorry, forgot to mention they'll usually be under $KAFKA_DIR/logs.
> > > >
> > > > On Wed, 10 Nov 2021, 1:53 pm Liam Clarke-Hutchinson, <lclar...@redhat.com>
> > > > wrote:
> > > >
> > > > > Thanks :)
> > > > >
> > > > > If you grep for "broker epochs cache" in the controller.log.* files, are
> > > > > you seeing all of your brokers listed? You should see log messages like
> > > > > "Initialized|Updated broker epochs cache: HashMap(<broker_id> -> epoch,
> > > > > <broker_id_2> -> epoch...)"
> > > > >
> > > > > This is to check whether the controller knows that all of your brokers are
> > > > > live at the time of topic creation. If a broker's id is in that hashmap,
> > > > > it's considered alive.
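> > > > >
> > > > > E.g. something like this, run from the log directory (adjust paths to
> > > > > your setup):
> > > > >
> > > > >     grep -E "(Initialized|Updated) broker epochs cache" controller.log*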
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Liam
> > > > >
> > > > > On Wed, Nov 10, 2021 at 1:21 PM David Ballano Fernandez <
> > > > > dfernan...@demonware.net> wrote:
> > > > >
> > > > >> We are using Kafka with ZooKeeper.
> > > > >>
> > > > >> On Tue, Nov 9, 2021 at 4:12 PM Liam Clarke-Hutchinson <
> > > > >> lclar...@redhat.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Yeah, it's broker side, just wanted to eliminate the obscure edge case.
> > > > >> >
> > > > >> > Oh, and are you using Zookeeper or KRaft?
> > > > >> >
> > > > >> > Cheers,
> > > > >> >
> > > > >> > Liam
> > > > >> >
> > > > >> > On Wed, Nov 10, 2021 at 1:00 PM David Ballano Fernandez <
> > > > >> > dfernan...@demonware.net> wrote:
> > > > >> >
> > > > >> > > I don't seem to have that config in any of our clusters. Is that a
> > > > >> > > broker config?
> > > > >> > >
> > > > >> > >
> > > > >> > > On Tue, Nov 9, 2021 at 3:50 PM Liam Clarke-Hutchinson <lclar...@redhat.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Thanks David,
> > > > >> > > >
> > > > >> > > > Hmm, is the property create.topic.policy.class.name set in
> > > > >> > > > server.properties at all?
> > > > >> > > >
> > > > >> > > > Cheers,
> > > > >> > > >
> > > > >> > > > Liam
> > > > >> > > >
> > > > >> > > > On Wed, Nov 10, 2021 at 12:21 PM David Ballano Fernandez <
> > > > >> > > > dfernan...@demonware.net> wrote:
> > > > >> > > >
> > > > >> > > > > Hi Liam,
> > > > >> > > > >
> > > > >> > > > > I did a test creating topics with kafka-topics.sh and with the admin
> > > > >> > > > > API from confluent-kafka-python. The same happened for both.
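> > > > >> > > > >
> > > > >> > > > > The Python side was along these lines (bootstrap servers are a
> > > > >> > > > > placeholder here):
> > > > >> > > > >
> > > > >> > > > >     from confluent_kafka.admin import AdminClient, NewTopic
> > > > >> > > > >
> > > > >> > > > >     admin = AdminClient({"bootstrap.servers": "broker:9092"})  # placeholder
> > > > >> > > > >     futures = admin.create_topics(
> > > > >> > > > >         [NewTopic("davidballano20", num_partitions=5, replication_factor=3)])
> > > > >> > > > >     for topic, fut in futures.items():
> > > > >> > > > >         fut.result()  # raises if the creation failed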
> > > > >> > > > >
> > > > >> > > > > thanks!
> > > > >> > > > >
> > > > >> > > > > On Tue, Nov 9, 2021 at 2:58 PM Liam Clarke-Hutchinson <lclar...@redhat.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi David,
> > > > >> > > > > >
> > > > >> > > > > > What tool(s) are you using to create new topics? Is it the
> > > > >> > > > > > kafka-topics.sh that ships with Apache Kafka?
> > > > >> > > > > >
> > > > >> > > > > > Cheers,
> > > > >> > > > > >
> > > > >> > > > > > Liam Clarke-Hutchinson
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Nov 10, 2021 at 11:41 AM David Ballano Fernandez <
> > > > >> > > > > > dfernan...@demonware.net> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi All,
> > > > >> > > > > > > While trying to figure out why my brokers have some disk imbalance, I
> > > > >> > > > > > > found that Kafka (maybe this is the way it is supposed to work?) is not
> > > > >> > > > > > > spreading all replicas across all available brokers.
> > > > >> > > > > > >
> > > > >> > > > > > > I have been trying to figure out how a topic with 5 partitions and
> > > > >> > > > > > > replication_factor=3 (15 replicas) could end up having all its replicas
> > > > >> > > > > > > spread over only 9 brokers instead of 15, especially when there are more
> > > > >> > > > > > > brokers than total replicas for that specific topic.
> > > > >> > > > > > >
> > > > >> > > > > > > The cluster has 48 brokers.
> > > > >> > > > > > >
> > > > >> > > > > > > # topics.py describe -topic topic1
> > > > >> > > > > > > {145: 1, 148: 2, *101: 3*, 146: 1, 102: 2, 147: 1, 103: 2, 104: 2, 105: 1}
> > > > >> > > > > > > The keys are the broker ids and the values are how many replicas they have.
> > > > >> > > > > > >
> > > > >> > > > > > > As you can see, broker id 101 has 3 replicas, which makes its disk
> > > > >> > > > > > > unbalanced compared to the other brokers.
> > > > >> > > > > > >
> > > > >> > > > > > > I created a brand new topic in a test cluster with 24 brokers. The topic
> > > > >> > > > > > > has 5 partitions with replication factor 3.
> > > > >> > > > > > >
> > > > >> > > > > > > topics.py describe -topic test
> > > > >> > > > > > > {119: 1, 103: 1, 106: 2, 109: 1, 101: 2, 114: 1, 116: 2, 118: 1, 111: 2, 104: 1, 121: 1}
> > > > >> > > > > > >
> > > > >> > > > > > > This time Kafka decided to spread the replicas over 11 brokers instead of
> > > > >> > > > > > > 15. Just for fun, I ran a partition reassignment for topic test, spreading
> > > > >> > > > > > > all replicas to all brokers. Result:
> > > > >> > > > > > >
> > > > >> > > > > > > # topics.py describe -topic test
> > > > >> > > > > > > {110: 1, 111: 1, 109: 1, 108: 1, 112: 1, 103: 1, 107: 1, 105: 1, 104: 1,
> > > > >> > > > > > > 106: 1, 102: 1, 118: 1, 116: 1, 113: 1, 117: 1}
> > > > >> > > > > > >
> > > > >> > > > > > > Now all replicas are spread across 15 brokers.
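> > > > >> > > > > > >
> > > > >> > > > > > > The reassignment plan was just an explicit JSON of the usual shape that
> > > > >> > > > > > > kafka-reassign-partitions.sh --execute takes, something like this (broker
> > > > >> > > > > > > ids below are only an example, remaining partitions omitted):
> > > > >> > > > > > >
> > > > >> > > > > > >     {"version": 1,
> > > > >> > > > > > >      "partitions": [
> > > > >> > > > > > >        {"topic": "test", "partition": 0, "replicas": [101, 102, 103]},
> > > > >> > > > > > >        {"topic": "test", "partition": 1, "replicas": [104, 105, 106]}
> > > > >> > > > > > >      ]}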
> > > > >> > > > > > >
> > > > >> > > > > > > Is there something I am missing? Maybe the reason is to keep network
> > > > >> > > > > > > chatter down? By the way, I don't have any rack awareness configured.
> > > > >> > > > > > > Thanks!
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
