Thank you Luke for taking a look again. If there are no further comments, I would like to start the vote for this KIP tomorrow.
Regards, Gantigmaa On Thu, Sep 5, 2024 at 8:30 AM Luke Chen <show...@gmail.com> wrote: > Hi Tina, > > Thanks for updating the KIP. > This change makes sense to me. > > Thanks. > Luke > > On Wed, Aug 28, 2024 at 6:34 PM Gantigmaa Selenge <gsele...@redhat.com> > wrote: > > > Thank you both for reviewing the KIP again. > > > > Addressing Colin's comments: > > > > > The admin client changes don't seem to include a way to find out if a > > given broker is fenced, by examining the result. Seems like we need that > as > > well? > > > > I think admins usually send describeCluster requests to find out about > > cluster's node ids and their descriptions. So I'm not sure if we really > > need the ability to describe a specific node as they would still need to > > send a request to find out the node id first, which could already include > > the fenced state. However, if you think it's useful to only describe a > > particular node, we can add that to the admin client changes. I guess we > > would need to add another field to the DescribeClusterOptions to specify > a > > node id so that the response returns the description for that node only > > (ignoring rest of the nodes in the result returned from broker)? > > > > > You need to describe what happens when the client requests to see > fenced > > brokers, but the protocol doesn't support that. I would suggest > > UnsupportedVersionException. > > > > With Jose's suggestion to remove the includeFencedBroker option, we might > > not need this exception anymore. Admin clients connecting to older > brokers > > would simply not receive the fenced state of the nodes. > > > > Addressing Jose's comments: > > > > > Did you consider making "includeFencedBroker" field value implicitly > > based on the DescribeClusterRequest version and the EndpointType? For > > example, if the request version is 2 and the endpoint type is broker, > > the response will include all of the brokers and if they are fenced or > > unfenced. I am suggesting this because a request of endpointType > > controller and includeFencedBroker true doesn't make sense. In > > general, users of the AdminClient cannot easily discover the API > > versions supported by the destination node since the destination node > > may not be known when the admin call is made. > > > > This is a good point! I agree that endpointType controller and > > includeFencedBroker true doesn't make sense. I thought the > > includeFencedBroker option would be ignored since there is no fenced > state > > in the ControllerRegistration. So I think what you are suggesting is that > > instead of having the includeFencedBroker option in the request, we would > > return all broker nodes and their description including the fenced state, > > if the request version is 2. I guess if endpointType is controller, > > controller nodes will be returned with a fenced state set to false (it > is a > > boolean field in Node class). I'm happy to go with this approach because > I > > think it is even simpler. > > > > > > > It is unfortunate that this RPC is handled using the metadata cache. > > That means that the user cannot find out when was the last time the > > broker contacted the controller cluster. Knowing that a broker is > > fenced is not enough information to determine to unregister a broker. > > If the user knows that the broker has been fenced for hours or day, it > > may be enough information to decide to unregister the cluster. > > > > I understand that we can't determine how long a broker node has been in a > > fenced state with the API. The motivation of this proposal is to help > > admins to find out which broker is fenced, when they are performing scale > > down operation. Nodes wouldn't usually get unregistered unless the > > cluster is being scaled down, however admin could also find out the id > of a > > fenced broker using the DescribeCluster API and use it to check the > metrics > > such as lastFetchTimeStamp to determine how longs it's been fenced. > > > > > For this command to work, the user must pass --bootstrap-server. If > > the user passes --bootstrap-controller, the controller nodes will get > > listed instead. Are you planning to also implement "kafka-cluster > > --bootstrap-controller <endpoints> list-controllers"? The other option > > is to make the command "list-nodes" and it will list the controllers > > or brokers based on the bootstrap server used. Or are you planning to > > fix the controller side and broker side implementation so that > > controller nodes can return registered brokers and broker nodes can > > return registered controllers? > > > > I was planning to limit the "list-brokers" command only to be used with > > --bootstrap-server. If the command was used with --bootstrap-controller, > > the user would get an error back. I should have made it clear in the KIP. > > However, thinking about this now, if we go ahead with the suggestion from > > your first point, I think we can change the command to "list-nodes" and > > output either controller nodes or broker nodes depending on the bootstrap > > argument. The output will have a fenced state printed only if > > --bootstrap-server is used. What do you think? > > > > > Is the tool going to print the string "→ (Broker 2 has shutdown but > > still registered)" to stdout? > > > > Sorry, that was meant to be just a comment clarifying the new row, which > > may not be necessary. I will remove it. > > > > Regards, > > Gantigmaa > > > > On Mon, Aug 26, 2024 at 3:19 PM José Armando García Sancio > > <jsan...@confluent.io.invalid> wrote: > > > > > Thanks for the KIP Gantigmaa, > > > > > > > { "name": "IncludeFencedBrokers", "type": "bool", "versions": "2+", > > > "about": "Whether to include fenced brokers." } > > > > > > Did you consider making "includeFencedBroker" field value implicitly > > > based on the DescribeClusterRequest version and the EndpointType? For > > > example, if the request version is 2 and the endpoint type is broker, > > > the response will include all of the brokers and if they are fenced or > > > unfenced. I am suggesting this because a request of endpointType > > > controller and includeFencedBroker true doesn't make sense. In > > > general, users of the AdminClient cannot easily discover the API > > > versions supported by the destination node since the destination node > > > may not be known when the admin call is made. > > > > > > > { "name": "Fenced", "type": "bool", "versions": "2+", "about": > "Whether > > > the broker is fenced." } > > > > > > It is unfortunate that this RPC is handled using the metadata cache. > > > That means that the user cannot find out when was the last time the > > > broker contacted the controller cluster. Knowing that a broker is > > > fenced is not enough information to determine to unregister a broker. > > > If the user knows that the broker has been fenced for hours or day, it > > > may be enough information to decide to unregister the cluster. > > > > > > > kafka-cluster.sh --bootstrap-server localhost:9092 list-brokers > > > > > > For this command to work, the user must pass --bootstrap-server. If > > > the user passes --bootstrap-controller, the controller nodes will get > > > listed instead. Are you planning to also implement "kafka-cluster > > > --bootstrap-controller <endpoints> list-controllers"? The other option > > > is to make the command "list-nodes" and it will list the controllers > > > or brokers based on the bootstrap server used. Or are you planning to > > > fix the controller side and broker side implementation so that > > > controller nodes can return registered brokers and broker nodes can > > > return registered controllers? > > > > > > I guess the other option is to limit the kafka-cluster tool for > > > brokers and they must use the kafka-metadata-quorum tool for > > > controllers. What do you think? > > > > > > > 2 broker-2 9092 fenced → (Broker 2 has shutdown but > > > still registered) > > > > > > Is the tool going to print the string "→ (Broker 2 has shutdown but > > > still registered)" to stdout? > > > > > > Thanks, > > > -José > > > > > > > > >