+1 for this approach. I agree that the streaming approach requires both
HTTP clients and servers to support HTTP/2 streaming, which is not
compatible with old clients.

I share Micah's concern that start/limit alone may not be enough in a
distributed environment where modifications happen during iteration.
For compatibility, we need to consider several cases:

1. Old client <-> New server
2. New client <-> Old server
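
To make the two cases concrete, here is a rough sketch of how an opt-in
token scheme could behave in each of them. It is only an illustration
under assumptions, not the REST spec: the parameter names startToken,
limit, and nextToken are hypothetical, the servers are in-memory
stand-ins, and the token is a plain index only to keep the example short
(a real token would be opaque to clients).

```python
# Hypothetical names throughout: startToken, limit, nextToken are
# illustrative only and are not taken from the REST spec.
NAMESPACES = [f"ns{i}" for i in range(7)]


def old_server(params: dict) -> dict:
    """Stand-in for an existing server: ignores unknown query params."""
    return {"namespaces": list(NAMESPACES)}


def new_server(params: dict) -> dict:
    """Stand-in for a server that paginates only when the client opts in."""
    if "startToken" not in params and "limit" not in params:
        # Old client: behave exactly as today and return everything.
        return {"namespaces": list(NAMESPACES)}
    # Assumption: the token is just an index here; clients must not parse it.
    start = int(params.get("startToken") or 0)
    limit = int(params.get("limit", 3))  # server-chosen default limit
    end = start + limit
    response = {"namespaces": NAMESPACES[start:end]}
    if end < len(NAMESPACES):
        response["nextToken"] = str(end)
    return response


def list_all(server, limit: int = 3) -> list:
    """New client: keep requesting pages until no next token is returned."""
    results = []
    token = ""
    while True:
        response = server({"startToken": token, "limit": limit})
        results.extend(response["namespaces"])
        token = response.get("nextToken")
        if not token:
            # Covers an old server too: it returns everything in one
            # response and never sends a next token.
            return results
```

In this sketch, list_all(old_server) finishes after one request, and
new_server({}) returns the full list just as an old client would expect.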



On Sat, Dec 16, 2023 at 6:51 AM Daniel Weeks <dwe...@apache.org> wrote:

> I agree that we want to include this feature and I raised similar concerns
> to what Micah already presented in talking with Ryan.
>
> For backward compatibility, just adding a start and limit implies a
> deterministic order, which is not a current requirement of the REST spec.
>
> Also, we need to consider whether the start/limit would need to be
> respected by the server.  If existing implementations simply return all the
> results, will that be sufficient?  There are a few edge cases that need to
> be considered here.
>
> For the opaque key approach, I think adding a query param to
> trigger/continue and introducing a continuation token in
> the ListNamespacesResponse might allow for more backward compatibility.  In
> that scenario, pagination would only take place for clients who know how to
> paginate and the ordering would not need to be deterministic.
>
> -Dan
>
> On Fri, Dec 15, 2023, 10:33 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Just to clarify and add a small suggestion:
>>
>> The behavior with no additional parameters requires the operations to
>> happen as they do today for backwards compatibility (i.e., either all
>> responses are returned or a failure occurs).
>>
>> For new parameters, I'd suggest an opaque start token (instead of
>> specific numeric offset) that can be returned by the service and a limit
>> (as proposed above). If a start token is provided without a limit a
>> default limit can be chosen by the server.  Servers might return fewer
>> items than the limit (i.e., clients are required to check for a next token
>> to determine if iteration is complete).  This enables server-side state if
>> it is desired but also makes deterministic listing much more feasible
>> (deterministic responses are essentially impossible in the face of changing
>> data if only a start offset is provided).
>>
>> In an ideal world, specifying a limit would result in streaming responses
>> being returned, with the last part containing a token if continuation is
>> necessary.  Given the conversation on the other thread about streaming,
>> I'd imagine this is quite hard to model in an OpenAPI REST service.
>>
>> Therefore it seems like using pagination with token and offset would be
>> preferred.  If skipping someplace in the middle of the namespaces is
>> required then I would suggest modelling those as first class query
>> parameters (e.g., "startAfterNamespace").
>>
>> Cheers,
>> Micah
>>
>>
>> On Fri, Dec 15, 2023 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> +1 for this approach
>>>
>>> I think it's good to use query params because it can be
>>> backward-compatible with the current behavior. If you get more than the
>>> limit back, then the service probably doesn't support pagination. And if a
>>> client doesn't support pagination they get the same results that they would
>>> today. A streaming approach with a continuation link like in the scan API
>>> discussion wouldn't work because old clients don't know to make a second
>>> request.
>>>
>>> On Thu, Dec 14, 2023 at 10:07 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> During the conversation about the Scan API for the REST spec, we
>>>> touched on the topic of pagination when a REST response is large or
>>>> takes time to be produced.
>>>>
>>>> I just want to discuss this separately, since we also see the issue for
>>>> ListNamespaces and ListTables/Views, when integrating with a large
>>>> organization that has over 100k namespaces, and also a lot of tables in
>>>> some namespaces.
>>>>
>>>> Pagination requires either keeping state, or the response to be
>>>> deterministic such that the client can request a range of the full
>>>> response. If we want to avoid keeping state, I think we need to allow some
>>>> query parameters like:
>>>> - *start*: the start index of the item in the response
>>>> - *limit*: the number of items to be returned in the response
>>>>
>>>> So we can send a request like:
>>>>
>>>> *GET /namespaces?start=300&limit=100*
>>>>
>>>> *GET /namespaces/ns/tables?start=300&limit=100*
>>>>
>>>> And the REST spec should enforce that the response returned for the
>>>> paginated GET should be deterministic.
>>>>
>>>> Any thoughts on this?
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
