Omnia, thanks for the updates! > Am happy to add section for throttling in this KIP if it is high concern or open a followup KIP for this once we already have the pagination in place. Which one do you suggest?
I'm okay leaving throttling for a future KIP. It might be useful to see the feature in action for a while before deciding if its necessary or the best way to approach it. On Mon, Jul 22, 2024 at 9:23 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote: > > Hi David, thanks for the feedback and sorry for taking long to respond as > I was off for a week. > > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit" > > controls the max items to return by default. It's not clear to me if this > > is just a default, or if it is a hard limit. In KIP-966, this config > serves > > as a hard limit to prevent misconfigured or malicious clients from > > requesting too many resources. Can you clarify this bit? > > `max.request.partition.size.limit` will be used in same way as KIP-966 I > just meant `max.request.partition.size.limit` will equal > `max.request.pagination.size.limit` by default unless it is specified > otherwise. I clarified this in the KIP now > > > DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could > > be considered a minor info leak. > > This is a good point. Any of the requests still will count to what ACLs > and resources the authorised user is used by the client, the pagination > will not effect this. > In cases where the client is using user with wild ACLs I am assuming this > is okay and they have the right to see this info. > However am rethinking this now as it might not be that useful and we can > just relay on if the there is a next cursor or not to simplify the approach > similar to KIP-966. I have updated the KIP to reflect this. > > > DA3: By splitting up results into pages, we are introducing the > possibility > > of inconsistency into these RPCs. For example, today MetadataRequest > > returns results from the same MetadataImage, so the response is > consistent. > > With the paging approach, it is possible (likely even) that different > > requests will be served from different MetadataImage-s, leading to > > inconsistencies. This can be even worse if paged requests go to different > > brokers that may be lagging behind in metadata propagation. BTW this > issue > > exists for KIP-966 as well. We don't necessarily need to solve this right > > away, but I think it's worth mentioning in the KIP. > > I added a limitation section to the KIP to mention this. I also mentioned > it in the top section of public interfaces. > > > DA4: Have we considered some generic throttling for paged requests? I > > expect it might be an issue if clients want to get everything and just > page > > through all of the results as quickly as possible. > I didn’t consider throttling for pagination requests as > Right now the only throttling AdminClient knows is throttling > TopicCreate/Delete which is different than pagination and might need it is > own conversation and KIP. > For example in the case of throttling and retries > timeouts, should > consider send back what we fetched so far and allow the operator to set the > cursor next time. If this is the case then we need to include cursor to all > the Option classes to these requests. Also Admin API for > DescribeTopicPartitionRequest in KIP-966 don’t provide Cursor as part of > DescribeTopicsOptions. > Also extending `controllerMutation` or should we separate the paging > throttling to its own quota > The only requests I think might actively scraped are `OffsetFetchRequest`, > `ListGroupsRequest`, `DescribeGroupsRequest` and > `ConsumerGroupDescribeRequest` to actively provide lag metrics/dashboards > to consumers. So there might be too many pages. > The rest of the requests mostly used during maintenance of the cluster or > incidents (specially the producer/txn requests) and operator of the cluster > need them to take a decision. The pagination just provides them with a way > to escape the timeout problem with large clusters. So am not sure adding > throttling during such time would be wise. > Am happy to add section for throttling in this KIP if it is high concern > or open a followup KIP for this once we already have the pagination in > place. Which one do you suggest? > > Thanks > Omnia > > > On 12 Jul 2024, at 14:56, David Arthur <mum...@gmail.com> wrote: > > > > Hey Omnia, thanks for the KIP! I think this will be a really nice > > improvement for operators. > > > > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit" > > controls the max items to return by default. It's not clear to me if this > > is just a default, or if it is a hard limit. In KIP-966, this config > serves > > as a hard limit to prevent misconfigured or malicious clients from > > requesting too many resources. Can you clarify this bit? > > > > DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could > > be considered a minor info leak. > > > > DA3: By splitting up results into pages, we are introducing the > possibility > > of inconsistency into these RPCs. For example, today MetadataRequest > > returns results from the same MetadataImage, so the response is > consistent. > > With the paging approach, it is possible (likely even) that different > > requests will be served from different MetadataImage-s, leading to > > inconsistencies. This can be even worse if paged requests go to different > > brokers that may be lagging behind in metadata propagation. BTW this > issue > > exists for KIP-966 as well. We don't necessarily need to solve this right > > away, but I think it's worth mentioning in the KIP. > > > > DA4: Have we considered some generic throttling for paged requests? I > > expect it might be an issue if clients want to get everything and just > page > > through all of the results as quickly as possible. > > > > Cheers, > > David > > > > > > On Tue, Jul 9, 2024 at 8:20 AM Omnia Ibrahim <o.g.h.ibra...@gmail.com> > > wrote: > > > >> Hey, > >> I will open a voting thread by end of this week if there is no more > >> feedbacks. > >> > >> Thanks > >> Omnia > >> > >>> On 2 Jul 2024, at 17:25, Omnia Ibrahim <o.g.h.ibra...@gmail.com> > wrote: > >>> > >>> Hi David, Thanks for having a look into this! > >>> > >>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating > >> all > >>>> of them? For instance, there are DescribeGroups, > ConsumerGroupDescribe, > >>>> etc. Do we have reasons not to migrate them too? I think that it would > >> be > >>>> great to have a KIP that establishes the pattern for all the admin > APIs. > >>> > >>> No particular reason I only was trying to focus on the requests that > >> cause the most pain ( as far as I know with Kafka). I don’t mind expand > it > >> to other requests in AdminClient as well. > >>> The other thing I was wondering about if > >> max.request.partition.size.limit still make sense or should it be > renamed > >> (or have another one called max.request.pagingation.size.limit) if we > going > >> to expand to cover all Admin requests. I am leaning toward > >> max.request.pagingation.size.limit and max.request.partition.size.limit > can > >> be set for now to max.request.partition.size.limit by default. > >>> > >>> I have updated the KIP now to cover more requests please have a second > >> look and let me know if I missed any? > >>> > >>>> DJ2: I am not a fan of all the new parameters passed to the tools > >>>> (e.g. --partition-size-limit-per-response). I wonder if we could > rather > >>>> have a config in the admin client to set the default page size for all > >>>> requests. > >>> > >>> If it going to be pattern to all requests I agree it should be an admin > >> config then. > >>> Updated the KIP with this now. > >>> > >>>> DJ3: I assume that the Admin client will transparently handle the > >> change. > >>>> It would be great to call it out in the compatibility section. > >>> > >>> Updated, Old clients will still have the old behaviour of no > pagingation > >> while all new versions will support this. > >>> > >>> Thanks > >>> Omnia > >>> > >>>> On 2 Jul 2024, at 10:43, David Jacot <dja...@confluent.io.INVALID> > >> wrote: > >>>> > >>>> Hi Omnia, > >>>> > >>>> Thanks for the KIP. I agree that we should migrate admin APIs to the > new > >>>> pattern. > >>>> > >>>> DJ1: Why do we want to migrate only a subset of the APIs vs migrating > >> all > >>>> of them? For instance, there are DescribeGroups, > ConsumerGroupDescribe, > >>>> etc. Do we have reasons not to migrate them too? I think that it would > >> be > >>>> great to have a KIP that establishes the pattern for all the admin > APIs. > >>>> > >>>> DJ2: I am not a fan of all the new parameters passed to the tools > >>>> (e.g. --partition-size-limit-per-response). I wonder if we could > rather > >>>> have a config in the admin client to set the default page size for all > >>>> requests. > >>>> > >>>> DJ3: I assume that the Admin client will transparently handle the > >> change. > >>>> It would be great to call it out in the compatibility section. > >>>> > >>>> Thanks, > >>>> David > >>>> > >>>> On Tue, Jul 2, 2024 at 11:17 AM Andrew Schofield < > >> andrew_schofi...@live.com> > >>>> wrote: > >>>> > >>>>> Hi, > >>>>> Thanks for the response. Makes sense to me. Just one additional > >> comment: > >>>>> > >>>>> AS5: The cursor for ListGroupsResponse is called > `TransactionalCursor` > >>>>> which > >>>>> seems like a copy-paste mistake. > >>>>> > >>>>> Thanks, > >>>>> Andrew > >>>>> > >>>>>> On 30 Jun 2024, at 22:28, Omnia Ibrahim <o.g.h.ibra...@gmail.com> > >> wrote: > >>>>>> > >>>>>> Hi Andrew thanks for having a look into the KIP > >>>>>> > >>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters > in > >>>>> my experience > >>>>>>> are consumer groups. Would it be possible to extend the KIP to > cover > >>>>> ListGroups while > >>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of > >>>>> groups. This is also > >>>>>>> potentially a sign of a misbehaving or poorly written clients. > >> Getting > >>>>> a page of groups > >>>>>>> with a massive ItemsLeftToFetch would be nice. > >>>>>> Yes, I also had few experiences with large cluster where to list > >>>>> consumer groups can take up to 5min. I update the KIP to include this > >> as > >>>>> well. > >>>>>> > >>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in > >>>>> some cases. > >>>>>> I believe I fixed all of them now > >>>>>> > >>>>>>> AS3: I don’t quite understand the cursor for > >>>>> OffsetFetchRequest/Response. > >>>>>>> It looks like the cursor is (topic, partition), but not group ID. > >> Does > >>>>> the cursor > >>>>>>> apply to all groups in the request, or is group ID missing? > >>>>>> > >>>>>> I was thinking that the last one in the response will be the one > that > >>>>> has the cursor while the rest will have null. But if we are moving > >>>>> NextCursour to the top level of the response then the cursor will > need > >>>>> groupID. > >>>>>>> AS4: For the remaining request/response pairs, the cursor makes > sense > >>>>> to me, > >>>>>>> but I do wonder whether `NextCursor` should be at the top level of > >> the > >>>>> responses > >>>>>>> instead, like DescribeTopicPartitionsResponse. > >>>>>> > >>>>>> Updates the KIP to reflect this now. > >>>>>> > >>>>>> Let me know if you have any more feedback on this. > >>>>>> > >>>>>> Best > >>>>>> Omnia > >>>>>> > >>>>>>> On 27 Jun 2024, at 17:53, Andrew Schofield < > >> andrew_schofi...@live.com> > >>>>> wrote: > >>>>>>> > >>>>>>> Hi Omnia, > >>>>>>> Thanks for the KIP. This is a really nice improvement for > >> administering > >>>>> large clusters. > >>>>>>> > >>>>>>> AS1: Besides topics, the most numerous resources in Kafka clusters > in > >>>>> my experience > >>>>>>> are consumer groups. Would it be possible to extend the KIP to > cover > >>>>> ListGroups while > >>>>>>> you’re in here? I’ve heard of clusters with truly vast numbers of > >>>>> groups. This is also > >>>>>>> potentially a sign of a misbehaving or poorly written clients. > >> Getting > >>>>> a page of groups > >>>>>>> with a massive ItemsLeftToFetch would be nice. > >>>>>>> > >>>>>>> AS2: A tiny nit: The versions for the added fields are incorrect in > >>>>> some cases. > >>>>>>> > >>>>>>> AS3: I don’t quite understand the cursor for > >>>>> OffsetFetchRequest/Response. > >>>>>>> It looks like the cursor is (topic, partition), but not group ID. > >> Does > >>>>> the cursor > >>>>>>> apply to all groups in the request, or is group ID missing? > >>>>>>> > >>>>>>> AS4: For the remaining request/response pairs, the cursor makes > sense > >>>>> to me, > >>>>>>> but I do wonder whether `NextCursor` should be at the top level of > >> the > >>>>> responses > >>>>>>> instead, like DescribeTopicPartitionsResponse. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Andrew > >>>>>>> > >>>>>>>> On 27 Jun 2024, at 14:05, Omnia Ibrahim <o.g.h.ibra...@gmail.com> > >>>>> wrote: > >>>>>>>> > >>>>>>>> Hi everyone, I would like to start a discussion thread for > KIP-1062 > >>>>>>>> > >>>>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks > >>>>>>>> Omnia > >>>>> > >>>>> > >>>>> > >>> > >> > >> > > > > -- > > David Arthur > > -- David Arthur