Great, I’ll ask again for a vote then on this KIP as it seems like all the concerns have been addressed.
> On 30 Jul 2024, at 16:00, David Arthur <mum...@gmail.com> wrote: > > Omnia, thanks for the updates! > >> Am happy to add section for throttling in this KIP if it is high concern > or open a followup KIP for this once we already have the pagination in > place. Which one do you suggest? > > I'm okay leaving throttling for a future KIP. It might be useful to see the > feature in action for a while before deciding if its necessary or the best > way to approach it. > > On Mon, Jul 22, 2024 at 9:23 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com> > wrote: > >> >> Hi David, thanks for the feedback and sorry for taking long to respond as >> I was off for a week. >>> DA1: In "Public Interfaces" you say "max.request.pagination.size.limit" >>> controls the max items to return by default. It's not clear to me if this >>> is just a default, or if it is a hard limit. In KIP-966, this config >> serves >>> as a hard limit to prevent misconfigured or malicious clients from >>> requesting too many resources. Can you clarify this bit? >> >> `max.request.partition.size.limit` will be used in same way as KIP-966 I >> just meant `max.request.partition.size.limit` will equal >> `max.request.pagination.size.limit` by default unless it is specified >> otherwise. I clarified this in the KIP now >> >>> DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could >>> be considered a minor info leak. >> >> This is a good point. Any of the requests still will count to what ACLs >> and resources the authorised user is used by the client, the pagination >> will not effect this. >> In cases where the client is using user with wild ACLs I am assuming this >> is okay and they have the right to see this info. >> However am rethinking this now as it might not be that useful and we can >> just relay on if the there is a next cursor or not to simplify the approach >> similar to KIP-966. I have updated the KIP to reflect this. >> >>> DA3: By splitting up results into pages, we are introducing the >> possibility >>> of inconsistency into these RPCs. For example, today MetadataRequest >>> returns results from the same MetadataImage, so the response is >> consistent. >>> With the paging approach, it is possible (likely even) that different >>> requests will be served from different MetadataImage-s, leading to >>> inconsistencies. This can be even worse if paged requests go to different >>> brokers that may be lagging behind in metadata propagation. BTW this >> issue >>> exists for KIP-966 as well. We don't necessarily need to solve this right >>> away, but I think it's worth mentioning in the KIP. >> >> I added a limitation section to the KIP to mention this. I also mentioned >> it in the top section of public interfaces. >> >>> DA4: Have we considered some generic throttling for paged requests? I >>> expect it might be an issue if clients want to get everything and just >> page >>> through all of the results as quickly as possible. >> I didn’t consider throttling for pagination requests as >> Right now the only throttling AdminClient knows is throttling >> TopicCreate/Delete which is different than pagination and might need it is >> own conversation and KIP. >> For example in the case of throttling and retries > timeouts, should >> consider send back what we fetched so far and allow the operator to set the >> cursor next time. If this is the case then we need to include cursor to all >> the Option classes to these requests. Also Admin API for >> DescribeTopicPartitionRequest in KIP-966 don’t provide Cursor as part of >> DescribeTopicsOptions. >> Also extending `controllerMutation` or should we separate the paging >> throttling to its own quota >> The only requests I think might actively scraped are `OffsetFetchRequest`, >> `ListGroupsRequest`, `DescribeGroupsRequest` and >> `ConsumerGroupDescribeRequest` to actively provide lag metrics/dashboards >> to consumers. So there might be too many pages. >> The rest of the requests mostly used during maintenance of the cluster or >> incidents (specially the producer/txn requests) and operator of the cluster >> need them to take a decision. The pagination just provides them with a way >> to escape the timeout problem with large clusters. So am not sure adding >> throttling during such time would be wise. >> Am happy to add section for throttling in this KIP if it is high concern >> or open a followup KIP for this once we already have the pagination in >> place. Which one do you suggest? >> >> Thanks >> Omnia >> >>> On 12 Jul 2024, at 14:56, David Arthur <mum...@gmail.com> wrote: >>> >>> Hey Omnia, thanks for the KIP! I think this will be a really nice >>> improvement for operators. >>> >>> DA1: In "Public Interfaces" you say "max.request.pagination.size.limit" >>> controls the max items to return by default. It's not clear to me if this >>> is just a default, or if it is a hard limit. In KIP-966, this config >> serves >>> as a hard limit to prevent misconfigured or malicious clients from >>> requesting too many resources. Can you clarify this bit? >>> >>> DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could >>> be considered a minor info leak. >>> >>> DA3: By splitting up results into pages, we are introducing the >> possibility >>> of inconsistency into these RPCs. For example, today MetadataRequest >>> returns results from the same MetadataImage, so the response is >> consistent. >>> With the paging approach, it is possible (likely even) that different >>> requests will be served from different MetadataImage-s, leading to >>> inconsistencies. This can be even worse if paged requests go to different >>> brokers that may be lagging behind in metadata propagation. BTW this >> issue >>> exists for KIP-966 as well. We don't necessarily need to solve this right >>> away, but I think it's worth mentioning in the KIP. >>> >>> DA4: Have we considered some generic throttling for paged requests? I >>> expect it might be an issue if clients want to get everything and just >> page >>> through all of the results as quickly as possible. >>> >>> Cheers, >>> David >>> >>> >>> On Tue, Jul 9, 2024 at 8:20 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com> >>> wrote: >>> >>>> Hey, >>>> I will open a voting thread by end of this week if there is no more >>>> feedbacks. >>>> >>>> Thanks >>>> Omnia >>>> >>>>> On 2 Jul 2024, at 17:25, Omnia Ibrahim <o.g.h.ibra...@gmail.com> >> wrote: >>>>> >>>>> Hi David, Thanks for having a look into this! >>>>> >>>>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating >>>> all >>>>>> of them? For instance, there are DescribeGroups, >> ConsumerGroupDescribe, >>>>>> etc. Do we have reasons not to migrate them too? I think that it would >>>> be >>>>>> great to have a KIP that establishes the pattern for all the admin >> APIs. >>>>> >>>>> No particular reason I only was trying to focus on the requests that >>>> cause the most pain ( as far as I know with Kafka). I don’t mind expand >> it >>>> to other requests in AdminClient as well. >>>>> The other thing I was wondering about if >>>> max.request.partition.size.limit still make sense or should it be >> renamed >>>> (or have another one called max.request.pagingation.size.limit) if we >> going >>>> to expand to cover all Admin requests. I am leaning toward >>>> max.request.pagingation.size.limit and max.request.partition.size.limit >> can >>>> be set for now to max.request.partition.size.limit by default. >>>>> >>>>> I have updated the KIP now to cover more requests please have a second >>>> look and let me know if I missed any? >>>>> >>>>>> DJ2: I am not a fan of all the new parameters passed to the tools >>>>>> (e.g. --partition-size-limit-per-response). I wonder if we could >> rather >>>>>> have a config in the admin client to set the default page size for all >>>>>> requests. >>>>> >>>>> If it going to be pattern to all requests I agree it should be an admin >>>> config then. >>>>> Updated the KIP with this now. >>>>> >>>>>> DJ3: I assume that the Admin client will transparently handle the >>>> change. >>>>>> It would be great to call it out in the compatibility section. >>>>> >>>>> Updated, Old clients will still have the old behaviour of no >> pagingation >>>> while all new versions will support this. >>>>> >>>>> Thanks >>>>> Omnia >>>>> >>>>>> On 2 Jul 2024, at 10:43, David Jacot <dja...@confluent.io.INVALID> >>>> wrote: >>>>>> >>>>>> Hi Omnia, >>>>>> >>>>>> Thanks for the KIP. I agree that we should migrate admin APIs to the >> new >>>>>> pattern. >>>>>> >>>>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating >>>> all >>>>>> of them? For instance, there are DescribeGroups, >> ConsumerGroupDescribe, >>>>>> etc. Do we have reasons not to migrate them too? I think that it would >>>> be >>>>>> great to have a KIP that establishes the pattern for all the admin >> APIs. >>>>>> >>>>>> DJ2: I am not a fan of all the new parameters passed to the tools >>>>>> (e.g. --partition-size-limit-per-response). I wonder if we could >> rather >>>>>> have a config in the admin client to set the default page size for all >>>>>> requests. >>>>>> >>>>>> DJ3: I assume that the Admin client will transparently handle the >>>> change. >>>>>> It would be great to call it out in the compatibility section. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On Tue, Jul 2, 2024 at 11:17 AM Andrew Schofield < >>>> andrew_schofi...@live.com> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> Thanks for the response. Makes sense to me. Just one additional >>>> comment: >>>>>>> >>>>>>> AS5: The cursor for ListGroupsResponse is called >> `TransactionalCursor` >>>>>>> which >>>>>>> seems like a copy-paste mistake. >>>>>>> >>>>>>> Thanks, >>>>>>> Andrew >>>>>>> >>>>>>>> On 30 Jun 2024, at 22:28, Omnia Ibrahim <o.g.h.ibra...@gmail.com> >>>> wrote: >>>>>>>> >>>>>>>> Hi Andrew thanks for having a look into the KIP >>>>>>>> >>>>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters >> in >>>>>>> my experience >>>>>>>>> are consumer groups. Would it be possible to extend the KIP to >> cover >>>>>>> ListGroups while >>>>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of >>>>>>> groups. This is also >>>>>>>>> potentially a sign of a misbehaving or poorly written clients. >>>> Getting >>>>>>> a page of groups >>>>>>>>> with a massive ItemsLeftToFetch would be nice. >>>>>>>> Yes, I also had few experiences with large cluster where to list >>>>>>> consumer groups can take up to 5min. I update the KIP to include this >>>> as >>>>>>> well. >>>>>>>> >>>>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in >>>>>>> some cases. >>>>>>>> I believe I fixed all of them now >>>>>>>> >>>>>>>>> AS3: I don’t quite understand the cursor for >>>>>>> OffsetFetchRequest/Response. >>>>>>>>> It looks like the cursor is (topic, partition), but not group ID. >>>> Does >>>>>>> the cursor >>>>>>>>> apply to all groups in the request, or is group ID missing? >>>>>>>> >>>>>>>> I was thinking that the last one in the response will be the one >> that >>>>>>> has the cursor while the rest will have null. But if we are moving >>>>>>> NextCursour to the top level of the response then the cursor will >> need >>>>>>> groupID. >>>>>>>>> AS4: For the remaining request/response pairs, the cursor makes >> sense >>>>>>> to me, >>>>>>>>> but I do wonder whether `NextCursor` should be at the top level of >>>> the >>>>>>> responses >>>>>>>>> instead, like DescribeTopicPartitionsResponse. >>>>>>>> >>>>>>>> Updates the KIP to reflect this now. >>>>>>>> >>>>>>>> Let me know if you have any more feedback on this. >>>>>>>> >>>>>>>> Best >>>>>>>> Omnia >>>>>>>> >>>>>>>>> On 27 Jun 2024, at 17:53, Andrew Schofield < >>>> andrew_schofi...@live.com> >>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Omnia, >>>>>>>>> Thanks for the KIP. This is a really nice improvement for >>>> administering >>>>>>> large clusters. >>>>>>>>> >>>>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters >> in >>>>>>> my experience >>>>>>>>> are consumer groups. Would it be possible to extend the KIP to >> cover >>>>>>> ListGroups while >>>>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of >>>>>>> groups. This is also >>>>>>>>> potentially a sign of a misbehaving or poorly written clients. >>>> Getting >>>>>>> a page of groups >>>>>>>>> with a massive ItemsLeftToFetch would be nice. >>>>>>>>> >>>>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in >>>>>>> some cases. >>>>>>>>> >>>>>>>>> AS3: I don’t quite understand the cursor for >>>>>>> OffsetFetchRequest/Response. >>>>>>>>> It looks like the cursor is (topic, partition), but not group ID. >>>> Does >>>>>>> the cursor >>>>>>>>> apply to all groups in the request, or is group ID missing? >>>>>>>>> >>>>>>>>> AS4: For the remaining request/response pairs, the cursor makes >> sense >>>>>>> to me, >>>>>>>>> but I do wonder whether `NextCursor` should be at the top level of >>>> the >>>>>>> responses >>>>>>>>> instead, like DescribeTopicPartitionsResponse. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Andrew >>>>>>>>> >>>>>>>>>> On 27 Jun 2024, at 14:05, Omnia Ibrahim <o.g.h.ibra...@gmail.com> >>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi everyone, I would like to start a discussion thread for >> KIP-1062 >>>>>>>>>> >>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Omnia >>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>>> >>> >>> -- >>> David Arthur >> >> > > -- > David Arthur