I tried to cover these in more detail at: https://docs.google.com/document/d/1bbfoLssY1szCO_Hm3_93ZcN0UAMpf7kjmpwHQngqQJ0/edit

On Sun, Dec 17, 2023 at 6:07 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

> +1 for this approach. I agree that the streaming approach requires that
> http client and servers have http 2 streaming support, which is not
> compatible with old clients.
>
> I share the same concern with Micah that only start/limit may not be
> enough in a distributed environment where modification happens during
> iterations. For compatibility, we need to consider several cases:
>
> 1. Old client <-> New Server
> 2. New client <-> Old server
>
>
> On Sat, Dec 16, 2023 at 6:51 AM Daniel Weeks <dwe...@apache.org> wrote:
>
>> I agree that we want to include this feature and I raised similar
>> concerns to what Micah already presented in talking with Ryan.
>>
>> For backward compatibility, just adding a start and limit implies a
>> deterministic order, which is not a current requirement of the REST spec.
>>
>> Also, we need to consider whether the start/limit would need to be
>> respected by the server. If existing implementations simply return all the
>> results, will that be sufficient? There are a few edge cases that need to
>> be considered here.
>>
>> For the opaque key approach, I think adding a query param to
>> trigger/continue and introducing a continuation token in
>> the ListNamespacesResponse might allow for more backward compatibility. In
>> that scenario, pagination would only take place for clients who know how to
>> paginate and the ordering would not need to be deterministic.
>>
>> -Dan
>>
>> On Fri, Dec 15, 2023, 10:33 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Just to clarify and add a small suggestion:
>>>
>>> The behavior with no additional parameters requires the operations to
>>> happen as they do today for backwards compatibility (i.e either all
>>> responses are returned or a failure occurs).
>>>
>>> For new parameters, I'd suggest an opaque start token (instead of
>>> specific numeric offset) that can be returned by the service and a limit
>>> (as proposed above). If a start token is provided without a limit a
>>> default limit can be chosen by the server. Servers might return less than
>>> limit (i.e. clients are required to check for a next token to determine if
>>> iteration is complete). This enables server side state if it is desired
>>> but also makes deterministic listing much more feasible (deterministic
>>> responses are essentially impossible in the face of changing data if only a
>>> start offset is provided).
>>>
>>> In an ideal world, specifying a limit would result in streaming
>>> responses being returned with the last part either containing a token if
>>> continuation is necessary. Given conversation on the other thread of
>>> streaming, I'd imagine this is quite hard to model in an Open API REST
>>> service.
>>>
>>> Therefore it seems like using pagination with token and offset would be
>>> preferred. If skipping someplace in the middle of the namespaces is
>>> required then I would suggest modelling those as first class query
>>> parameters (e.g. "startAfterNamespace")
>>>
>>> Cheers,
>>> Micah
>>>
>>>
>>> On Fri, Dec 15, 2023 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> +1 for this approach
>>>>
>>>> I think it's good to use query params because it can be
>>>> backward-compatible with the current behavior. If you get more than the
>>>> limit back, then the service probably doesn't support pagination. And if a
>>>> client doesn't support pagination they get the same results that they would
>>>> today. A streaming approach with a continuation link like in the scan API
>>>> discussion wouldn't work because old clients don't know to make a second
>>>> request.
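
Inline, on Micah's token suggestion and Ryan's compatibility point above: here is a rough sketch of how I read the token-plus-limit flow, not a proposal for the spec. Everything named here is hypothetical: `list_namespaces`, `list_all`, `page_token`, `DEFAULT_LIMIT`, and the choice to encode the last returned name inside the token are my assumptions for illustration, and an in-memory set stands in for the catalog.

```python
import base64

DEFAULT_LIMIT = 100  # assumed server-side default; not defined anywhere in this thread


def _encode(name):
    # The token is opaque to clients; here it simply wraps the last name returned.
    return base64.urlsafe_b64encode(name.encode()).decode()


def _decode(token):
    return base64.urlsafe_b64decode(token.encode()).decode()


def list_namespaces(store, page_token=None, limit=None):
    """Hypothetical server handler returning (page, next_token)."""
    names = sorted(store)
    if page_token is None and limit is None:
        # No new parameters: behave as today and return everything.
        return names, None
    if limit is None:
        limit = DEFAULT_LIMIT  # token without a limit: server picks a default
    if limit < 1:
        raise ValueError("limit must be positive")
    if page_token is not None:
        last_seen = _decode(page_token)
        names = [n for n in names if n > last_seen]
    page = names[:limit]
    # Only hand back a token when there is more to read.
    next_token = _encode(page[-1]) if len(names) > limit else None
    return page, next_token


def list_all(fetch, limit):
    """Hypothetical client loop: keep requesting until no token comes back."""
    results, token = [], None
    while True:
        page, token = fetch(page_token=token, limit=limit)
        if len(page) > limit:
            # More than the limit came back, so the service probably
            # ignored pagination and this is already the full listing.
            return page
        results.extend(page)
        if token is None:
            return results
```

Because the token in this sketch resumes after the last name seen rather than at a numeric position, a namespace dropped between requests does not shift the next page.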
>>>>
>>>> On Thu, Dec 14, 2023 at 10:07 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> During the conversation of the Scan API for REST spec, we touched on
>>>>> the topic of pagination when REST response is large or takes time to be
>>>>> produced.
>>>>>
>>>>> I just want to discuss this separately, since we also see the issue
>>>>> for ListNamespaces and ListTables/Views, when integrating with a large
>>>>> organization that has over 100k namespaces, and also a lot of tables in
>>>>> some namespaces.
>>>>>
>>>>> Pagination requires either keeping state, or the response to be
>>>>> deterministic such that the client can request a range of the full
>>>>> response. If we want to avoid keeping state, I think we need to allow some
>>>>> query parameters like:
>>>>> - *start*: the start index of the item in the response
>>>>> - *limit*: the number of items to be returned in the response
>>>>>
>>>>> So we can send a request like:
>>>>>
>>>>> *GET /namespaces?start=300&limit=100*
>>>>>
>>>>> *GET /namespaces/ns/tables?start=300&limit=100*
>>>>>
>>>>> And the REST spec should enforce that the response returned for the
>>>>> paginated GET should be deterministic.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
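
On the original start/limit proposal quoted above, a small sketch of the concern raised in the replies: even with a deterministic (sorted) order, a numeric offset can skip entries when the listing changes between requests. The `list_page` helper and the sample namespace names are hypothetical, and a plain set stands in for the catalog.

```python
def list_page(names, start, limit):
    """Hypothetical stateless handler: slice a deterministically ordered listing."""
    return sorted(names)[start:start + limit]


namespaces = {"a", "b", "c", "d", "e"}

# First request: start=0, limit=2
first = list_page(namespaces, 0, 2)

# Another writer drops a namespace between the two requests.
namespaces.discard("a")

# Second request: start=2, limit=2. The remaining items have shifted left by one.
second = list_page(namespaces, 2, 2)

# Namespaces that still exist but were never returned to the client.
missed = sorted(namespaces - set(first) - set(second))
```

Here `missed` ends up containing "c": it still exists, but the second request starts one position too far to the right to see it.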