+1 for this approach. I agree that the streaming approach requires both HTTP clients and servers to support HTTP/2 streaming, which is not compatible with old clients.
I share Micah's concern that start/limit alone may not be enough in a distributed environment where modifications happen during iteration.

For compatibility, we need to consider several cases:
1. Old client <-> New server
2. New client <-> Old server

On Sat, Dec 16, 2023 at 6:51 AM Daniel Weeks <dwe...@apache.org> wrote:

> I agree that we want to include this feature and I raised similar concerns
> to what Micah already presented in talking with Ryan.
>
> For backward compatibility, just adding a start and limit implies a
> deterministic order, which is not a current requirement of the REST spec.
>
> Also, we need to consider whether the start/limit would need to be
> respected by the server. If existing implementations simply return all the
> results, will that be sufficient? There are a few edge cases that need to
> be considered here.
>
> For the opaque key approach, I think adding a query param to
> trigger/continue and introducing a continuation token in
> the ListNamespacesResponse might allow for more backward compatibility. In
> that scenario, pagination would only take place for clients who know how to
> paginate and the ordering would not need to be deterministic.
>
> -Dan
>
> On Fri, Dec 15, 2023, 10:33 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Just to clarify and add a small suggestion:
>>
>> The behavior with no additional parameters requires the operations to
>> happen as they do today for backwards compatibility (i.e either all
>> responses are returned or a failure occurs).
>>
>> For new parameters, I'd suggest an opaque start token (instead of
>> specific numeric offset) that can be returned by the service and a limit
>> (as proposed above). If a start token is provided without a limit a
>> default limit can be chosen by the server. Servers might return less than
>> limit (i.e. clients are required to check for a next token to determine if
>> iteration is complete).
>> This enables server side state if it is desired
>> but also makes deterministic listing much more feasible (deterministic
>> responses are essentially impossible in the face of changing data if only a
>> start offset is provided).
>>
>> In an ideal world, specifying a limit would result in streaming responses
>> being returned with the last part either containing a token if continuation
>> is necessary. Given conversation on the other thread of streaming, I'd
>> imagine this is quite hard to model in an Open API REST service.
>>
>> Therefore it seems like using pagination with token and offset would be
>> preferred. If skipping someplace in the middle of the namespaces is
>> required then I would suggest modelling those as first class query
>> parameters (e.g. "startAfterNamespace")
>>
>> Cheers,
>> Micah
>>
>>
>> On Fri, Dec 15, 2023 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> +1 for this approach
>>>
>>> I think it's good to use query params because it can be
>>> backward-compatible with the current behavior. If you get more than the
>>> limit back, then the service probably doesn't support pagination. And if a
>>> client doesn't support pagination they get the same results that they would
>>> today. A streaming approach with a continuation link like in the scan API
>>> discussion wouldn't work because old clients don't know to make a second
>>> request.
>>>
>>> On Thu, Dec 14, 2023 at 10:07 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> During the conversation of the Scan API for REST spec, we touched on
>>>> the topic of pagination when REST response is large or takes time to be
>>>> produced.
>>>>
>>>> I just want to discuss this separately, since we also see the issue for
>>>> ListNamespaces and ListTables/Views, when integrating with a large
>>>> organization that has over 100k namespaces, and also a lot of tables in
>>>> some namespaces.
>>>>
>>>> Pagination requires either keeping state, or the response to be
>>>> deterministic such that the client can request a range of the full
>>>> response. If we want to avoid keeping state, I think we need to allow some
>>>> query parameters like:
>>>> - *start*: the start index of the item in the response
>>>> - *limit*: the number of items to be returned in the response
>>>>
>>>> So we can send a request like:
>>>>
>>>> *GET /namespaces?start=300&limit=100*
>>>>
>>>> *GET /namespaces/ns/tables?start=300&limit=100*
>>>>
>>>> And the REST spec should enforce that the response returned for the
>>>> paginated GET should be deterministic.
>>>>
>>>> Any thoughts on this?
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
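
To make the opaque-token idea and the two compatibility cases a bit more concrete, here is a rough in-memory sketch in Python. It is only an illustration under my own assumptions, not a proposal for the spec: the parameter names `page_token` and `page_size`, the response field `next-page-token`, the default page size, and the choice to encode the last returned namespace name into the token are all hypothetical.

```python
import base64
from bisect import bisect_right

DEFAULT_PAGE_SIZE = 2  # hypothetical server-chosen default when no limit is sent


def encode_token(last_name):
    # The token is opaque to clients; this sketch hides a "start after" key in it.
    return base64.urlsafe_b64encode(last_name.encode("utf-8")).decode("ascii")


def decode_token(token):
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


def new_server_list(namespaces, page_token=None, page_size=None):
    """Hypothetical pagination-aware server handler."""
    names = sorted(namespaces)
    # Old client <-> new server: no pagination params, so behave as today.
    if page_token is None and page_size is None:
        return {"namespaces": names}
    start = 0
    if page_token:
        # Resume after the last name already returned, not at a numeric offset,
        # so deletions or inserts before that point do not shift the page.
        start = bisect_right(names, decode_token(page_token))
    size = page_size or DEFAULT_PAGE_SIZE
    page = names[start:start + size]
    response = {"namespaces": page}
    if start + size < len(names):
        response["next-page-token"] = encode_token(page[-1])
    return response


def old_server_list(namespaces, **_ignored):
    """Old server: ignores unknown query params and returns everything."""
    return {"namespaces": sorted(namespaces)}


def new_client_list_all(server, namespaces, page_size=2):
    """New client: keeps requesting pages until no token comes back."""
    result = []
    token = None
    while True:
        response = server(namespaces, page_token=token, page_size=page_size)
        result.extend(response["namespaces"])
        token = response.get("next-page-token")
        if not token:
            # New client <-> old server also ends here after one request.
            return result
```

In this sketch an old client calling `new_server_list(store)` gets the full list as it does today, and a new client calling `new_client_list_all(old_server_list, store)` stops after one request because no token is returned. A real server could just as well keep state behind the token; the name-based token here is only one possible encoding.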