Thank you Luke for taking a look again.

If there are no further comments, I would like to start the vote for this
KIP tomorrow.

Regards,
Gantigmaa

On Thu, Sep 5, 2024 at 8:30 AM Luke Chen <show...@gmail.com> wrote:

> Hi Tina,
>
> Thanks for updating the KIP.
> This change makes sense to me.
>
> Thanks.
> Luke
>
> On Wed, Aug 28, 2024 at 6:34 PM Gantigmaa Selenge <gsele...@redhat.com>
> wrote:
>
> > Thank you both for reviewing the KIP again.
> >
> > Addressing Colin's comments:
> >
> >  > The admin client changes don't seem to include a way to find out if a
> > given broker is fenced, by examining the result. Seems like we need that
> as
> > well?
> >
> > I think admins usually send describeCluster requests to find out about
> > cluster's node ids and their descriptions. So I'm not sure if we really
> > need the ability to describe a specific node as they would still need to
> > send a request to find out the node id first, which could already include
> > the fenced state. However, if you think it's useful to only describe a
> > particular node, we can add that to the admin client changes. I guess we
> > would need to add another field to the DescribeClusterOptions to specify
> a
> > node id so that the response returns the description for that node only
> > (ignoring rest of the nodes in the result returned from broker)?
> >
> > > You need to describe what happens when the client requests to see
> fenced
> > brokers, but the protocol doesn't support that. I would suggest
> > UnsupportedVersionException.
> >
> > With Jose's suggestion to remove the includeFencedBroker option, we might
> > not need this exception anymore. Admin clients connecting to older
> brokers
> > would simply not receive the fenced state of the nodes.
> >
> > Addressing Jose's comments:
> >
> > > Did you consider making "includeFencedBroker" field value implicitly
> > based on the DescribeClusterRequest version and the EndpointType? For
> > example, if the request version is 2 and the endpoint type is broker,
> > the response will include all of the brokers and if they are fenced or
> > unfenced. I am suggesting this because a request of endpointType
> > controller and includeFencedBroker true doesn't make sense. In
> > general, users of the AdminClient cannot easily discover the API
> > versions supported by the destination node since the destination node
> > may not be known when the admin call is made.
> >
> > This is a good point! I agree that endpointType controller and
> > includeFencedBroker true doesn't make sense. I thought the
> > includeFencedBroker option would be ignored since there is no fenced
> state
> > in the ControllerRegistration. So I think what you are suggesting is that
> > instead of having the includeFencedBroker option in the request, we would
> > return all broker nodes and their description including the fenced state,
> > if the request version is 2. I guess if endpointType is controller,
> > controller nodes will be returned with a fenced state set to false (it
> is a
> > boolean field in Node class). I'm happy to go with this approach because
> I
> > think it is even simpler.
> >
> >
> > > It is unfortunate that this RPC is handled using the metadata cache.
> > That means that the user cannot find out when was the last time the
> > broker contacted the controller cluster. Knowing that a broker is
> > fenced is not enough information to determine to unregister a broker.
> > If the user knows that the broker has been fenced for hours or day, it
> > may be enough information to decide to unregister the cluster.
> >
> > I understand that we can't determine how long a broker node has been in a
> > fenced state with the API. The motivation of this proposal is to help
> > admins to find out which broker is fenced, when they are performing scale
> > down operation. Nodes wouldn't usually get unregistered unless the
> > cluster is being scaled down, however admin could also find out the id
> of a
> > fenced broker using the DescribeCluster API and use it to check the
> metrics
> > such as lastFetchTimeStamp to determine how longs it's been fenced.
> >
> > > For this command to work, the user must pass --bootstrap-server. If
> > the user passes --bootstrap-controller, the controller nodes will get
> > listed instead. Are you planning to also implement "kafka-cluster
> > --bootstrap-controller <endpoints> list-controllers"? The other option
> > is to make the command "list-nodes" and it will list the controllers
> > or brokers based on the bootstrap server used. Or are you planning to
> > fix the controller side and broker side implementation so that
> > controller nodes can return registered brokers and broker nodes can
> > return registered controllers?
> >
> > I was planning to limit the "list-brokers" command only to be used with
> > --bootstrap-server. If the command was used with --bootstrap-controller,
> > the user would get an error back. I should have made it clear in the KIP.
> > However, thinking about this now, if we go ahead with the suggestion from
> > your first point, I think we can change the command to "list-nodes" and
> > output either controller nodes or broker nodes depending on the bootstrap
> > argument. The output will have a fenced state printed only if
> > --bootstrap-server is used. What do you think?
> >
> > > Is the tool going to print the string "→ (Broker 2 has shutdown but
> > still registered)" to stdout?
> >
> > Sorry, that was meant to be just a comment clarifying the new row, which
> > may not be necessary. I will remove it.
> >
> > Regards,
> > Gantigmaa
> >
> > On Mon, Aug 26, 2024 at 3:19 PM José Armando García Sancio
> > <jsan...@confluent.io.invalid> wrote:
> >
> > > Thanks for the KIP Gantigmaa,
> > >
> > > > { "name": "IncludeFencedBrokers", "type": "bool", "versions": "2+",
> > > "about": "Whether to include fenced brokers." }
> > >
> > > Did you consider making "includeFencedBroker" field value implicitly
> > > based on the DescribeClusterRequest version and the EndpointType? For
> > > example, if the request version is 2 and the endpoint type is broker,
> > > the response will include all of the brokers and if they are fenced or
> > > unfenced. I am suggesting this because a request of endpointType
> > > controller and includeFencedBroker true doesn't make sense. In
> > > general, users of the AdminClient cannot easily discover the API
> > > versions supported by the destination node since the destination node
> > > may not be known when the admin call is made.
> > >
> > > > { "name": "Fenced", "type": "bool", "versions": "2+", "about":
> "Whether
> > > the broker is fenced." }
> > >
> > > It is unfortunate that this RPC is handled using the metadata cache.
> > > That means that the user cannot find out when was the last time the
> > > broker contacted the controller cluster. Knowing that a broker is
> > > fenced is not enough information to determine to unregister a broker.
> > > If the user knows that the broker has been fenced for hours or day, it
> > > may be enough information to decide to unregister the cluster.
> > >
> > > > kafka-cluster.sh --bootstrap-server localhost:9092 list-brokers
> > >
> > > For this command to work, the user must pass --bootstrap-server. If
> > > the user passes --bootstrap-controller, the controller nodes will get
> > > listed instead. Are you planning to also implement "kafka-cluster
> > > --bootstrap-controller <endpoints> list-controllers"? The other option
> > > is to make the command "list-nodes" and it will list the controllers
> > > or brokers based on the bootstrap server used. Or are you planning to
> > > fix the controller side and broker side implementation so that
> > > controller nodes can return registered brokers and broker nodes can
> > > return registered controllers?
> > >
> > > I guess the other option is to limit the kafka-cluster tool for
> > > brokers and they must use the kafka-metadata-quorum tool for
> > > controllers. What do you think?
> > >
> > > > 2          broker-2  9092       fenced   → (Broker 2 has shutdown but
> > > still registered)
> > >
> > > Is the tool going to print the string "→ (Broker 2 has shutdown but
> > > still registered)" to stdout?
> > >
> > > Thanks,
> > > -José
> > >
> > >
> >
>

Reply via email to