Hi David, DA1: It's a point I considered, especially being able to cordon a whole broker. With the current proposal you just need to set cordoned.log.dirs to the same value as log.dirs. That does not seem excessively difficult.
DA2: I did consider using a new metadata record like we do for fence/unfence. Since I don't expect cordoned log dirs to change very frequently, and the size should be small, I opted to reuse the BrokerRegistrationRecord metadata record. At the moment I guess it was mostly for the convenience while prototyping. Semantically it probably makes sense to have separate records. Your further point suggest design changes in the mechanism as a whole, so let's discuss these first and we can return to the exact metadata records after. I find the idea of having dedicated RPCs interesting. One of the reasons I piggybacked on the heartbeating process is for the mapping between log directory names and their UUIDs. Currently the mapping only exists on each broker instance. So if we wanted a dedicated RPC, we would first need to change how we manage the log directory to UUID mappings. I guess this could be done via the BrokerRegistration API. I'm not sure about storing additional metadata (reason, timestamp). We currently don't do that for any operations (AlterPartitionReassignments, UnregisterBroker). Typically these are stored in the tools used by the operators to drive these operations. You bring another interesting point about the ability to cordon brokers/log directories while they are offline. It's not something the current proposal supports. I'm not sure how useful this would turn out it practice. In experience is that brokers are mostly online, so I'd expect the need to do so relatively rare. This also kind of loops back to KAFKA-17094 [0] and the discussion Gantigmaa started on the dev list [1] about being able to identify the ids of offline (but still registered) brokers. DA3: With the current proposal, I don't see a reason why you would want to disable the new behavior. If you don't want to use it, you have nothing to do. It's opt-in as you need to set cordoned.log.dirs on some brokers to get the new behavior. If you don't want it anymore, you should unset cordoned.log.dirs. Can you explain why this would not work? DA4: Yes 0: https://issues.apache.org/jira/browse/KAFKA-17094 1: https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo Thanks, Mickael On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote: > > Hi Mickael, > > In the BrokerHearbeatRequest.json, the flexibleVersions are bumped from > "0+" to "1+". Is it a typo? > > > On Fri, Jul 12, 2024 at 11:42 PM David Arthur <mum...@gmail.com> wrote: > > > Mickael, thanks for the KIP! I think this could be quite a useful feature. > > > > DA1: Having to know each of the log dirs for a broker seems a bit > > inconvenient for cases where we want to cordon off a whole broker. I do > > think having the ability to cordon off a specific log dir is useful for > > JBOD, but I imagine a common case might be to cordon off the whole broker. > > > > DA2: Looks like the new "cordoned.log.dirs" can be configured statically > > and updated dynamically per-broker. What do you think about a new metadata > > record and RPC instead of using a config? From my understanding, the > > BrokerRegistration/Heartbeat is more about the lifecycle of a broker > > whereas cordoning a broker is an operator driven action. It might make > > sense to have a separate record for this. We could include additional > > fields like a timestamp, a reason/comment field (e.g., "decommissioning", > > "disk failure", "new broker" etc), stuff like that. > > > > This would also allow cordoning to be done while a broker is offline or > > before it has been provisioned. Not sure how likely that is, but might be > > useful? > > > > DA3: Can we consider having a configuration to enable/disable the new > > replica placer behavior? This would be separate from the new > > MetadataVersion for the RPC/record changes. > > > > DA4: In the Motivation section, you mention the cluster expansion scenario. > > For this scenario, is the expectation that the operator will cordon off the > > existing full brokers so placements only happen on the new brokers? > > > > Cheers, > > David > > > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison <mickael.mai...@gmail.com> > > wrote: > > > > > Hi Kamal, > > > > > > Thanks for taking a look at the KIP! > > > > > > I briefly considered that option initially but I found it not very > > > practical once you have more than a few cordoned log directories. > > > I find your example is already not very easy to read, and it only has > > > 2 entries. Also if the configuration is at the cluster level it'sis > > > not easy to see if a broker has all its log directories cordoned, and > > > you still need to describe a specific broker's configuration to find > > > the "name" of a log directory you want to cordon. > > > > > > I think an easy way to get an overall view of the cordoned log > > > directories/brokers will be via the kafka-log-dirs.sh tool. I am also > > > considering adding metrics like we have today for LogDirectoryOffline > > > to ease monitoring. > > > > > > Thanks, > > > Mickael > > > > > > On Thu, Jul 11, 2024 at 8:41 PM Kamal Chandraprakash > > > <kamal.chandraprak...@gmail.com> wrote: > > > > > > > > Hi Mickael, > > > > > > > > Thanks for the KIP! > > > > > > > > This is a useful feature which helps to decommission the nodes by > > > > essentially > > > > creating a new replica exclude broker list. > > > > > > > > To cordon a list of brokers, we have to apply the config on each of the > > > > broker nodes > > > > and similarly to see the list of cordoned brokers, we have to either > > > query > > > > individual broker > > > > config in the cluster or query each of the broker log directory. > > > > > > > > Can we move the configuration from the broker to cluster level? (eg) > > > > > > > > bin/kafka-configs.sh --bootstrap-server localhost:9092 > > --broker-defaults > > > > --alter --add-config > > > > cordoned.log.dirs=[broker.id.0]:[log.dir.0],[broker.id.1]:[log.dir.1] > > > > > > > > This will provide a consistent view of the list of cordoned brokers in > > > the > > > > cluster. > > > > > > > > -- > > > > Kamal > > > > > > > > On Wed, Jul 10, 2024 at 7:53 PM Mickael Maison < > > mickael.mai...@gmail.com > > > > > > > > wrote: > > > > > > > > > Hi Luke, > > > > > > > > > > 4. You're right this scenario can happen. In this case I think the > > > > > broker should enforce its new state and not create the replica as all > > > > > its log directories are now cordoned. The replica will be offline and > > > > > an administrator would need to reassign it to another broker. I > > expect > > > > > most users will rely on kafka-configs.sh to manage the cordoned log > > > > > directories instead of updating the broker properties, so it should > > > > > not be a common issue. > > > > > > > > > > 6. To resolve this issue, the user can uncordon a log directory and > > > > > retry an operation that triggers the creation of the internal topics. > > > > > For example start a consumer using a group should make the broker > > > > > retry creating the __consumer_offsets topic. > > > > > > > > > > Thanks, > > > > > Mickael > > > > > > > > > > On Wed, Jul 10, 2024 at 4:14 PM Mickael Maison < > > > mickael.mai...@gmail.com> > > > > > wrote: > > > > > > > > > > > > Hi Chia-Ping, > > > > > > > > > > > > Question 1) Yes that's a good summary. I'd also add that managing > > > > > > cordoned log directories is intended to be done by cluster > > > > > > administrators who also know about operations in-progress or > > planned > > > > > > such as scaling or adding/removing log directories. In practice you > > > > > > can't expect users to provide "good" explicit partition assignments > > > as > > > > > > they are not aware of the cluster operations. > > > > > > > > > > > > Question 2) I don't see an actual question :) > > > > > > > > > > > > Question 3) My current proposal is that in that case, brokers will > > > > > > ignore the preferred log directory and place the new replica on a > > log > > > > > > directory that is not cordoned. To use AlterReplicaLogDirs you need > > > > > > ALTER permission on the CLUSTER resource. In most environments > > users > > > > > > don't have permissions to use that API. Or if they do, I'd expect > > > them > > > > > > to also have ALTER_CONFIGS on the BROKER resource and be able to > > > > > > change the cordoned log directories configuration. > > > > > > > > > > > > Question 4) If we don't include the CordonedLogDirs field in > > > > > > BrokerRegistrationRequest as you said there's a small window when > > the > > > > > > controller would not know if a broker can be used for assignments. > > If > > > > > > all log directories are cordoned the controller should not use that > > > > > > broker for assignments. > > > > > > > > > > > > Question 5) In KRaft mode offline brokers are still registered, and > > > > > > can be used for partition assignment. So we need to persist the > > > > > > cordoned log directories too to exclude brokers that don't have > > > > > > uncordoned log directories. > > > > > > > > > > > > Thanks, > > > > > > Mickael > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:25 PM Luke Chen <show...@gmail.com> > > > wrote: > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > Thanks for the response. > > > > > > > > > > > > > > > 4. Cordoned log directories are persisted to the metadata log > > > via the > > > > > > > RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If > > a > > > > > > > broker is offline, the controller will use the latest known state > > > of > > > > > > > the broker to determine the broker's cordoned log directories. > > I've > > > > > > > added a sentence clarifying this point. > > > > > > > > > > > > > > OK, so if the broker A goes offline, and the controller is in > > > "fenced" > > > > > > > state, without any cordoned log dirs, then some topic created and > > > > > assigned > > > > > > > to broker A. Later, broker A starts up with all its log dirs > > > cordoned > > > > > > > configured. At this situation, will the broker A create the > > > partitions? > > > > > > > > > > > > > > > 6. I'm leaning towards considering that scenario a > > configuration > > > > > > > error. If all log directories are cordoned before the internal > > > topics > > > > > > > are created, then the broker will not be able to create them. > > This > > > > > > > seems like a pretty strange scenario, where it's the first time > > you > > > > > > > start a broker, you've cordoned all its log directory, and the > > > > > > > internal topics (offsets and transactions) have not yet been > > > created > > > > > > > in the rest of the cluster. > > > > > > > > > > > > > > Yes, I agree that this should be a configuration error. > > > > > > > So the follow-up question is: Suppose users encounter this issue, > > > and > > > > > how > > > > > > > could they resolve it? > > > > > > > Uncordon the log dir dynamically using kafka-configs.sh? Will the > > > > > > > uncordoning config change recreate the partitions we didn't > > create > > > > > earlier > > > > > > > because of log dir cordoned? > > > > > > > > > > > > > > > The metadata log is different (not managed by LogManager), so I > > > think > > > > > > > it should always be created regardless if its log directory is > > > > > > > cordoned or not. > > > > > > > > > > > > > > I agree we should treat "__cluster_metadata" differently. > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison < > > > > > mickael.mai...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > 2. isCordoned() is a new method on LogDirDescription. It does > > not > > > > > take > > > > > > > > any arguments. It just returns true if this log directory the > > > > > > > > LogDirDescription represents is cordoned. > > > > > > > > > > > > > > > > 3. Sorry that was a typo. This method will only return a log > > > > > directory > > > > > > > > that is not cordoned. Fixed > > > > > > > > > > > > > > > > 4. Cordoned log directories are persisted to the metadata log > > > via the > > > > > > > > RegisterBrokerRecord, BrokerRegistrationChangeRecord records. > > If > > > a > > > > > > > > broker is offline, the controller will use the latest known > > > state of > > > > > > > > the broker to determine the broker's cordoned log directories. > > > I've > > > > > > > > added a sentence clarifying this point. > > > > > > > > > > > > > > > > 5. Yes a log directory can be uncordoned. You can either update > > > the > > > > > > > > properties file and restart the broker or dynamically change > > the > > > > > value > > > > > > > > at runtime using kafka-configs. I've added a paragraph about it > > > in > > > > > the > > > > > > > > KIP. > > > > > > > > > > > > > > > > 6. I'm leaning towards considering that scenario a > > configuration > > > > > > > > error. If all log directories are cordoned before the internal > > > topics > > > > > > > > are created, then the broker will not be able to create them. > > > This > > > > > > > > seems like a pretty strange scenario, where it's the first time > > > you > > > > > > > > start a broker, you've cordoned all its log directory, and the > > > > > > > > internal topics (offsets and transactions) have not yet been > > > created > > > > > > > > in the rest of the cluster. > > > > > > > > The metadata log is different (not managed by LogManager), so I > > > think > > > > > > > > it should always be created regardless if its log directory is > > > > > > > > cordoned or not. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Mickael > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping Tsai < > > > chia7...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > > > hi Mickael > > > > > > > > > > > > > > > > > > That is totally a good idea, but I have a question about the > > > > > > > > implementation > > > > > > > > > > > > > > > > > > Do we consider making pluggable ReplicaPlacer (KIP-660) first > > > and > > > > > then > > > > > > > > add > > > > > > > > > another impl of ReplicaPlacer to offer cordon mechanism? > > Noted > > > that > > > > > > > > > `ReplicaPlacer` can implement Reconfigurable to get updated > > at > > > > > runtime. > > > > > > > > > That is similar to KIP-1066 - change cordoned.log.dirs > > through > > > > > configs > > > > > > > > > update. > > > > > > > > > > > > > > > > > > The benefit is to let users have their optimized policy for > > > > > specific > > > > > > > > > scenario. Also, it can avoid that we add more and more > > > mechanism > > > > > to our > > > > > > > > > code base. Of course we can merge the mechanism which can be > > > used > > > > > by 99% > > > > > > > > > users :smile > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > > > > > > > > > Luke Chen <show...@gmail.com> 於 2024年7月9日 週二 下午9:07寫道: > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > This is a long waiting feature for many users! > > > > > > > > > > > > > > > > > > > > Questions: > > > > > > > > > > 1. I think piggyback the "BrokerHeartbeatRequest" to > > forward > > > the > > > > > > > > corden log > > > > > > > > > > dir to controller makes sense to me. > > > > > > > > > > We already did similar things for fence, controller > > shutdown, > > > > > failed > > > > > > > > log > > > > > > > > > > dir...etc. > > > > > > > > > > > > > > > > > > > > 2. In the admin API, what parameters will the new added > > > > > isCordoned() > > > > > > > > method > > > > > > > > > > take? > > > > > > > > > > > > > > > > > > > > 3. In the KIP, we said: > > > > > > > > > > "defaultDir(): This method will not return the Uuid of a > > log > > > > > directory > > > > > > > > that > > > > > > > > > > is not cordoned." > > > > > > > > > > --> It's hard to understand. Does that mean we will only > > > return > > > > > > > > cordoned > > > > > > > > > > log dir? > > > > > > > > > > From the current java doc of the interface, it doesn't look > > > > > right: > > > > > > > > > > "Get the default directory for new partitions placed in a > > > given > > > > > > > > broker." > > > > > > > > > > > > > > > > > > > > 4. Currently, if a broker is registered and then go > > offline. > > > In > > > > > this > > > > > > > > state, > > > > > > > > > > the controller will still distribute partitions to this > > > broker. > > > > > > > > > > So, if now, the broker get startup with "cordoned.log.dirs" > > > set, > > > > > what > > > > > > > > will > > > > > > > > > > happen? > > > > > > > > > > Will the newly assigned partitions be created successfully > > or > > > > > not? > > > > > > > > > > > > > > > > > > > > 5. I think after a log dir get cordoned, we can always > > > uncordon > > > > > it, > > > > > > > > right? > > > > > > > > > > I think we should mention it in the KIP. > > > > > > > > > > > > > > > > > > > > 6. If a broker is startup with "cordoned.log.dirs" set, and > > > does > > > > > that > > > > > > > > mean > > > > > > > > > > the internal topics partitions (ex: __consumer_offsets) > > > cannot be > > > > > > > > created, > > > > > > > > > > either? > > > > > > > > > > Also, if this log dir is happen to be the metadata log dir, > > > what > > > > > will > > > > > > > > > > happen to the metadata topic creation? > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 12:12 AM Mickael Maison < > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > Thanks for taking a look. > > > > > > > > > > > > > > > > > > > > > > - Yes you're right, I meant AlterPartitionReassignments. > > > Fixed. > > > > > > > > > > > - That's a good idea. I was expecting users to discover > > > > > cordoned log > > > > > > > > > > > directories by describing broker configurations. But > > being > > > > > able to > > > > > > > > > > > also get this information when describing log directories > > > makes > > > > > > > > sense. > > > > > > > > > > > I've added that to the KIP. > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 5, 2024 at 8:05 AM Haruki Okada < > > > > > ocadar...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > Thank you for the KIP. > > > > > > > > > > > > The motivation sounds make sense to me. > > > > > > > > > > > > > > > > > > > > > > > > I have a few questions: > > > > > > > > > > > > > > > > > > > > > > > > - [nits] "AlterPartitions request" in Error handling > > > section > > > > > is > > > > > > > > > > > > "AlterPartitionReassignments request" actually, right? > > > > > > > > > > > > - Don't we need to include cordoned information in > > > > > DescribeLogDirs > > > > > > > > > > > response > > > > > > > > > > > > too? Some tools (e.g. CruiseControl) need to have a way > > > to > > > > > know > > > > > > > > which > > > > > > > > > > > > broker/log-dirs are cordoned to generate partition > > > > > reassignment > > > > > > > > > > proposal. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > 2024年7月4日(木) 22:57 Mickael Maison < > > > mickael.mai...@gmail.com > > > > > >: > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > I'd like to start a discussion on KIP-1066 that > > > introduces > > > > > a > > > > > > > > > > mechanism > > > > > > > > > > > > > to cordon log directories and brokers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > ======================== > > > > > > > > > > > > Okada Haruki > > > > > > > > > > > > ocadar...@gmail.com > > > > > > > > > > > > ======================== > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > David Arthur > >