Re: [DISCUSS] KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag

Ron Dagostino Thu, 12 May 2022 10:57:49 -0700

Hi Niket.  Thanks for the KIP.  Are all the fields you specified
always known?  For example, might a new controller not have a last
fetch time for other voters, and then what would it send in the
response?  If this is possible then we should be explicit about what
is to be sent in this case.
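
To make the question concrete, here is a minimal sketch of one way an
unknown value could be represented. The `-1` sentinel and all of the names
(`VoterState`, `last_fetch_timestamp_ms`, `to_response_field`) are
hypothetical and used only for illustration; the KIP does not specify them
at this point.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sentinel meaning "the leader has not observed a fetch yet".
UNKNOWN_TIMESTAMP = -1


@dataclass
class VoterState:
    """Illustrative per-voter state that a leader might track."""
    voter_id: int
    log_end_offset: int
    # None until the leader has seen at least one fetch from this voter,
    # e.g. right after a new controller is elected.
    last_fetch_timestamp_ms: Optional[int] = None


def to_response_field(state: VoterState) -> int:
    """Map the possibly-unknown timestamp to the value sent on the wire."""
    if state.last_fetch_timestamp_ms is None:
        return UNKNOWN_TIMESTAMP
    return state.last_fetch_timestamp_ms
```

Whatever value is chosen, the point is that clients can tell "unknown"
apart from a real timestamp.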


Ron

On Thu, May 12, 2022 at 12:54 PM Niket Goel <ng...@confluent.io.invalid> wrote:
>
> Thanks for the suggestion Colin.
>
> > One minor point: I suspect that whatever we end up naming the additional
> fields here, should also be the name of the metrics in KIP-835. So if we go
> with a metric named "last-applied-offset" we'd want a lastAppliedOffset
> field here, and so on.
>
> This is a good point. Will respond to the discussion thread on KIP-835
> about the dependency here.
>
> > I also wonder if it makes sense for us to report the timestamp of the
> latest batch that has been fetched (and not necessarily applied) rather
> than the wall clock time at which the leader made the latest fetch.
>
> In theory I am on board with your suggestion, and honestly I too wanted to
> add something similar. However, from what I understand (and please correct
> me if my understanding is off), the `DescribeQuorum` API as it is
> implemented lives in the Raft layer and utilizes the data available within
> that layer to fill out the response. To get more accurate information about
> what was applied, as you recommend, we would need to look into the
> log.
> This leaves us with two high-level options --
> 1. Peek into the log in the raft layer:
>   I think this is definitely not the way to go as it breaks the isolation
> the raft layer has from the contents of the log and also introduces more
> computational work which would hurt performance.
> 2. Have the layer above the Raft Client (so the controller) provide the
> required information:
>   We can consider this approach, however it will break the separation
> between the layers. IIUC, the `DescribeQuorum` API is intended to be a Raft
> API, but doing this will result in it being dependent on the controller (or
> some layer driving the raft client). I am not sure if that is the direction
> we want to go in the long term.
>
> I think my meta point is that there might be a way to get more accurate
> "lag" information into the response, but the question is whether that
> additional fidelity is worth the cost we would end up paying to add
> it.
>
> Let me know your thoughts on this.
>
> On Wed, May 11, 2022 at 12:56 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Thanks, Niket. I also agree with Jason that this is a public API despite
> > the lack of command-line tool, so we do indeed need a KIP. :)
> >
> > One minor point: I suspect that whatever we end up naming the additional
> > fields here, should also be the name of the metrics in KIP-835. So if we go
> > with a metric named "last-applied-offset" we'd want a lastAppliedOffset
> > field here, and so on.
> >
> > I also wonder if it makes sense for us to report the timestamp of the
> > latest batch that has been fetched (and not necessarily applied) rather
> > than the wall clock time at which the leader made the latest fetch. If we
> > take both timestamps directly from the metadata log, we know they'll be
> > comparable even in the presence of clock skew. And we know because of
> > KIP-835 that the metadata log won't go quiet for prolonged periods.
> >
> > best,
> > Colin
> >
> >
> > On Tue, May 10, 2022, at 13:30, Niket Goel wrote:
> > >> @Niket does it make sense to add the Admin API to this KIP?
> > >
> > > Thanks Deng for pointing this out. I agree with Jason's suggestion. I
> > will
> > > go ahead and add the admin API to this KIP.
> > >
> > > - Niket
> > >
> > > On Tue, May 10, 2022 at 11:44 AM Jason Gustafson
> > <ja...@confluent.io.invalid>
> > > wrote:
> > >
> > >> > Hello Niket, currently DescribeQuorumResponse is not a public API, we
> > >> don’t have an Admin API or shell script to get DescribeQuorumResponse, so
> > >> it’s unnecessary to submit a KIP to change it, you can just submit a PR
> > to
> > >> accomplish this.
> > >>
> > >> Hey Ziming, I think it is public. It was documented in KIP-595 and we
> > have
> > >> implemented the API on the server. However, it looks like I never added
> > >> the Admin API (even though it is assumed by the
> > `kafka-metadata-quorum.sh`
> > >> tool). @Niket does it make sense to add the Admin API to this KIP?
> > >>
> > >> Best,
> > >> Jason
> > >>
> > >> On Mon, May 9, 2022 at 8:09 PM deng ziming <dengziming1...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hello Niket, currently DescribeQuorumResponse is not a public API, we
> > >> > don’t have an Admin API or shell script to get DescribeQuorumResponse,
> > so
> > >> > it’s unnecessary to submit a KIP to change it, you can just submit a
> > PR
> > >> to
> > >> > accomplish this.
> > >> >
> > >> > --
> > >> > Thanks
> > >> > Ziming
> > >> >
> > >> > > On May 10, 2022, at 1:33 AM, Niket Goel <ng...@confluent.io.INVALID
> > >
> > >> > wrote:
> > >> > >
> > >> > > Hi all,
> > >> > >
> > >> > > I created a KIP to add some more information to
> > >> > `DescribeQuorumResponse` to enable ascertaining voter lag in the
> > quorum
> > >> a
> > >> > little better.
> > >> > > Please see KIP --
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-836%3A+Additional+Information+in+DescribeQuorumResponse+about+Voter+Lag
> > >> > >
> > >> > > Thanks for your feedback,
> > >> > > Niket Goel
> > >> >
> > >> >
> > >>
> > >
> > >
> > > --
> > > - Niket
> >
>
>
> --
> - Niket
