Re: Pagination for List APIs in the REST spec

Micah Kornfield Tue, 19 Dec 2023 07:51:24 -0800

I tried to cover these in more detail at:
https://docs.google.com/document/d/1bbfoLssY1szCO_Hm3_93ZcN0UAMpf7kjmpwHQngqQJ0/edit


On Sun, Dec 17, 2023 at 6:07 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

> +1 for this approach. I agree that the streaming approach requires that
> HTTP clients and servers have HTTP/2 streaming support, which is not
> compatible with old clients.
>
> I share Micah's concern that start/limit alone may not be enough in a
> distributed environment where modifications happen during iteration.
> For compatibility, we need to consider several cases:
>
> 1. Old client <-> New Server
> 2. New client <-> Old server
>
>
>
> On Sat, Dec 16, 2023 at 6:51 AM Daniel Weeks <dwe...@apache.org> wrote:
>
>> I agree that we want to include this feature and I raised similar
>> concerns to what Micah already presented in talking with Ryan.
>>
>> For backward compatibility, just adding a start and limit implies a
>> deterministic order, which is not a current requirement of the REST spec.
>>
>> Also, we need to consider whether the start/limit would need to be
>> respected by the server.  If existing implementations simply return all the
>> results, will that be sufficient?  There are a few edge cases that need to
>> be considered here.
>>
>> For the opaque key approach, I think adding a query param to
>> trigger/continue and introducing a continuation token in
>> the ListNamespacesResponse might allow for more backward compatibility.  In
>> that scenario, pagination would only take place for clients who know how to
>> paginate and the ordering would not need to be deterministic.
>>
>> -Dan
>>
>> On Fri, Dec 15, 2023, 10:33 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Just to clarify and add a small suggestion:
>>>
>>> The behavior with no additional parameters requires the operations to
>>> happen as they do today for backwards compatibility (i.e., either all
>>> results are returned or a failure occurs).
>>>
>>> For new parameters, I'd suggest an opaque start token (instead of a
>>> specific numeric offset) that can be returned by the service and a limit
>>> (as proposed above). If a start token is provided without a limit, a
>>> default limit can be chosen by the server.  Servers might return fewer
>>> items than the limit (i.e., clients are required to check for a next token
>>> to determine if iteration is complete).  This enables server-side state if
>>> it is desired but also makes deterministic listing much more feasible
>>> (deterministic responses are essentially impossible in the face of
>>> changing data if only a start offset is provided).
>>>
>>> In an ideal world, specifying a limit would result in streaming
>>> responses being returned, with the last part containing a token if
>>> continuation is necessary.  Given the conversation on the other thread
>>> about streaming, I'd imagine this is quite hard to model in an OpenAPI
>>> REST service.
>>>
>>> Therefore it seems like using pagination with a token and offset would be
>>> preferred.  If skipping to someplace in the middle of the namespaces is
>>> required, then I would suggest modelling those as first-class query
>>> parameters (e.g. "startAfterNamespace").
>>>
>>> Cheers,
>>> Micah
>>>
>>>
>>> On Fri, Dec 15, 2023 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> +1 for this approach
>>>>
>>>> I think it's good to use query params because it can be
>>>> backward-compatible with the current behavior. If you get more than the
>>>> limit back, then the service probably doesn't support pagination. And if a
>>>> client doesn't support pagination they get the same results that they would
>>>> today. A streaming approach with a continuation link like in the scan API
>>>> discussion wouldn't work because old clients don't know to make a second
>>>> request.
>>>>
>>>> On Thu, Dec 14, 2023 at 10:07 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> During the conversation about the Scan API for the REST spec, we
>>>>> touched on the topic of pagination when a REST response is large or
>>>>> takes time to produce.
>>>>>
>>>>> I just want to discuss this separately, since we also see the issue
>>>>> for ListNamespaces and ListTables/Views, when integrating with a large
>>>>> organization that has over 100k namespaces, and also a lot of tables in
>>>>> some namespaces.
>>>>>
>>>>> Pagination requires either keeping state or a deterministic response,
>>>>> so that the client can request a range of the full response. If we want
>>>>> to avoid keeping state, I think we need to allow some query parameters
>>>>> like:
>>>>> - *start*: the start index of the item in the response
>>>>> - *limit*: the number of items to be returned in the response
>>>>>
>>>>> So we can send a request like:
>>>>>
>>>>> *GET /namespaces?start=300&limit=100*
>>>>>
>>>>> *GET /namespaces/ns/tables?start=300&limit=100*
>>>>>
>>>>> And the REST spec should enforce that the response returned for the
>>>>> paginated GET should be deterministic.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
