You are right, thanks Jack.

On Mon, May 20, 2024 at 8:06 AM Jack Ye <yezhao...@gmail.com> wrote:
> I believe this is already merged?
> https://github.com/apache/iceberg/pull/9782
>
> Best,
> Jack Ye
>
> On Sat, May 18, 2024 at 4:06 PM Pucheng Yang <py...@pinterest.com.invalid>
> wrote:
>
>> Hi all, is there an ETA for this? thanks
>>
>> On Wed, Dec 20, 2023 at 6:03 PM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>
>>>> I think if servers provide a meaningful error message on expiration
>>>> hopefully, this would be a good first step in debugging. I think saying
>>>> tokens should generally support O(Minutes) at least should cover most
>>>> use-cases?
>>>
>>> Sounds reasonable to me. Clients just need to be aware that the token is
>>> for transient usage and should not store it for too long.
>>>
>>> On Thu, Dec 21, 2023 at 8:43 AM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>>>> Overall, I don't think it's a good idea to add parallel listing for
>>>>> things like tables and namespaces as it just adds complexity for an
>>>>> incredibly narrow (and possibly poorly designed) use case.
>>>>
>>>> +1 I think that there are likely a few ways parallelization of table
>>>> and namespace listing can be incorporated in the future into the API if
>>>> necessary.
>>>>
>>>> I think the one place where parallelization is important immediately is
>>>> for Planning, but that is already a separate thread. Apologies if I
>>>> forked the conversation too far from that.
>>>>
>>>> On Wed, Dec 20, 2023 at 4:06 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>
>>>>> Overall, I don't think it's a good idea to add parallel listing for
>>>>> things like tables and namespaces as it just adds complexity for an
>>>>> incredibly narrow (and possibly poorly designed) use case.
>>>>>
>>>>> I feel we should leave it up to the server to define whether it will
>>>>> provide consistency across paginated listing and avoid bleeding
>>>>> time-travel like concepts (like 'asOf') into the API. I really just
>>>>> don't see what practical value it provides as there are no explicit or
>>>>> consistently held guarantees around these operations.
>>>>>
>>>>> I'd agree with Micah's argument that if the server does provide
>>>>> stronger guarantees, it should manage those via the opaque token and
>>>>> respond with meaningful errors if it cannot satisfy the internal
>>>>> constraints it imposes (like timeouts).
>>>>>
>>>>> It would help to have articulable use cases to really invest in more
>>>>> complexity in this area and I feel like we're drifting a little into
>>>>> the speculative at this point.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Wed, Dec 20, 2023 at 3:27 PM Micah Kornfield <emkornfi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>> I agree that this is not quite useful for clients at this moment. But
>>>>>>> I'm thinking that maybe exposing this will help debugging or
>>>>>>> diagnosing, user just need to be aware of this potential expiration.
>>>>>>
>>>>>> I think if servers provide a meaningful error message on expiration
>>>>>> hopefully, this would be a good first step in debugging. I think saying
>>>>>> tokens should generally support O(Minutes) at least should cover most
>>>>>> use-cases?
>>>>>>
>>>>>> On Tue, Dec 19, 2023 at 9:18 PM Renjie Liu <liurenjie2...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>> If we choose to manage state on the server side, I recommend not
>>>>>>>> revealing the expiration time to the client, at least not for now.
>>>>>>>> We can introduce it when there's a practical need. It wouldn't
>>>>>>>> constitute a breaking change, would it?
>>>>>>>
>>>>>>> I agree that this is not quite useful for clients at this moment.
>>>>>>> But I'm thinking that maybe exposing this will help debugging or
>>>>>>> diagnosing, user just need to be aware of this potential expiration.
>>>>>>>
>>>>>>> On Wed, Dec 20, 2023 at 11:09 AM Xuanwo <xua...@apache.org> wrote:
>>>>>>>
>>>>>>>> > For the continuation token, I think one missing part is about the
>>>>>>>> expiration time of this token, since this may affect the state
>>>>>>>> cleaning process of the server.
>>>>>>>>
>>>>>>>> Some storage services use a continuation token as a binary
>>>>>>>> representation of internal states. For example, they serialize a
>>>>>>>> structure into binary and then perform base64 encoding. Services
>>>>>>>> don't need to maintain state, eliminating the need for state
>>>>>>>> cleaning.
>>>>>>>>
>>>>>>>> > Do servers need to expose the expiration time to clients?
>>>>>>>>
>>>>>>>> If we choose to manage state on the server side, I recommend not
>>>>>>>> revealing the expiration time to the client, at least not for now.
>>>>>>>> We can introduce it when there's a practical need. It wouldn't
>>>>>>>> constitute a breaking change, would it?
>>>>>>>>
>>>>>>>> On Wed, Dec 20, 2023, at 10:57, Renjie Liu wrote:
>>>>>>>>
>>>>>>>> For the continuation token, I think one missing part is about the
>>>>>>>> expiration time of this token, since this may affect the state
>>>>>>>> cleaning process of the server. There are several things to discuss:
>>>>>>>>
>>>>>>>> 1. Should we leave it to the server to decide it or allow the
>>>>>>>> client to config in api?
>>>>>>>>
>>>>>>>> Personally I think it would be enough for the server to determine
>>>>>>>> it for now, since I don't see any usage to allow clients to set the
>>>>>>>> expiration time in api.
>>>>>>>>
>>>>>>>> 2. Do servers need to expose the expiration time to clients?
>>>>>>>>
>>>>>>>> Personally I think it would be enough to expose this through the
>>>>>>>> getConfig api to let users know this. For now there is no
>>>>>>>> requirement for per request expiration time.
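Xuanwo's stateless-token idea above (serialize a cursor structure, then base64-encode it) can be sketched in a few lines. The field names here are illustrative, not from any spec; embedding a timestamp also gives the server a cheap way to reject expired tokens without keeping any state:

```python
import base64
import json


def encode_token(last_item: str, issued_at_ms: int) -> str:
    """Serialize cursor state and base64-encode it, so the server stays
    stateless and there is no server-side state to clean up."""
    state = {"last": last_item, "ts": issued_at_ms}
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def decode_token(token: str) -> dict:
    """Recover the cursor state from an opaque continuation token."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))
```

The token round-trips, e.g. `decode_token(encode_token("ns1", 1703003028000))` recovers `{"last": "ns1", "ts": 1703003028000}`, while the client only ever sees an opaque string.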
>>>>>>>>
>>>>>>>> On Wed, Dec 20, 2023 at 2:49 AM Micah Kornfield <
>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> IMO, parallelization needs to be a first class entity in the
>>>>>>>> endpoint/service design to allow for flexibility (I scanned through
>>>>>>>> the original proposal for the scan planning and it looked like it
>>>>>>>> was on the right track). Using offsets for parallelization is
>>>>>>>> problematic from both a consistency and scalability perspective if
>>>>>>>> you want to allow for flexibility in implementation.
>>>>>>>>
>>>>>>>> In particular, I think the server needs APIs like:
>>>>>>>>
>>>>>>>> DoScan - returns a list of partitions (represented by an opaque
>>>>>>>> entity). The list of partitions should support pagination (in an
>>>>>>>> ideal world, it would be streaming).
>>>>>>>> GetTasksForPartition - returns scan tasks for a partition (should
>>>>>>>> also be paginated/streaming, but this is up for debate). I think it
>>>>>>>> is an important consideration to allow for empty partitions.
>>>>>>>>
>>>>>>>> With this implementation you don't necessarily require separate
>>>>>>>> server side state (objects in GCS should be sufficient). I think, as
>>>>>>>> Ryan suggested, one implementation could be to have each partition
>>>>>>>> correspond to a byte-range in a manifest file for returning the
>>>>>>>> tasks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:55 AM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Not necessarily. That is more of a general statement. The
>>>>>>>> pagination discussion forked from server side scan planning.
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:52 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>> > With start/limit each client can query for own's chunk without
>>>>>>>> coordination.
>>>>>>>>
>>>>>>>> Okay, I understand now. Would you need to parallelize the client
>>>>>>>> for listing namespaces or tables? That seems odd to me.
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:48 AM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> > You can parallelize with opaque tokens by sending a starting
>>>>>>>> point for the next request.
>>>>>>>>
>>>>>>>> I meant we would have to wait for the server to return this
>>>>>>>> starting point from the past request? With start/limit each client
>>>>>>>> can query for own's chunk without coordination.
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:44 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>> > I think start and offset has the advantage of being
>>>>>>>> parallelizable (as compared to continuation tokens).
>>>>>>>>
>>>>>>>> You can parallelize with opaque tokens by sending a starting point
>>>>>>>> for the next request.
>>>>>>>>
>>>>>>>> > On the other hand, using "asOf" can be complex to implement and
>>>>>>>> may be too powerful for the pagination use case
>>>>>>>>
>>>>>>>> I don't think that we want to add `asOf`. If the service chooses to
>>>>>>>> do this, it would send a continuation token that has the
>>>>>>>> information embedded.
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:42 AM Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Can we assume it is the responsibility of the server to ensure
>>>>>>>> determinism (e.g., by caching the results along with query ID)? I
>>>>>>>> think start and offset has the advantage of being parallelizable
>>>>>>>> (as compared to continuation tokens). On the other hand, using
>>>>>>>> "asOf" can be complex to implement and may be too powerful for the
>>>>>>>> pagination use case (because it allows to query the warehouse as of
>>>>>>>> any point of time, not just now).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
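The two-endpoint shape Micah describes (DoScan returning opaque partition handles with pagination, GetTasksForPartition returning possibly-empty task lists) might look roughly like the following server-side sketch. All names are illustrative, and the in-memory `PLAN` dict stands in for a real planning service:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanPage:
    partitions: List[str]      # opaque partition handles
    next_token: Optional[str]  # None when the listing is complete


# Hypothetical planning state; a real service might back this with
# byte-ranges into manifest files, as suggested in the thread.
PLAN = {"p0": ["task-a", "task-b"], "p1": [], "p2": ["task-c"]}


def do_scan(token: Optional[str] = None, limit: int = 2) -> ScanPage:
    """DoScan: return one page of opaque partition handles."""
    keys = sorted(PLAN)
    start = keys.index(token) + 1 if token else 0
    page = keys[start:start + limit]
    next_token = page[-1] if start + limit < len(keys) else None
    return ScanPage(page, next_token)


def get_tasks_for_partition(partition: str) -> List[str]:
    """GetTasksForPartition: tasks for one partition; may be empty."""
    return PLAN[partition]
```

Clients drain `do_scan` until `next_token` is `None`, then fan out `get_tasks_for_partition` calls across workers; empty partitions (like `p1` here) are allowed by design.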
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:40 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>> I think you can solve the atomicity problem with a continuation
>>>>>>>> token and server-side state. In general, I don't think this is a
>>>>>>>> problem we should worry about a lot since pagination commonly has
>>>>>>>> this problem. But since we can build a system that allows you to
>>>>>>>> solve it if you choose to, we should go with that design.
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 9:13 AM Micah Kornfield <
>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Jack,
>>>>>>>> Some answers inline.
>>>>>>>>
>>>>>>>> In addition to the start index approach, another potential simple
>>>>>>>> way to implement the continuation token is to use the last item
>>>>>>>> name, when the listing is guaranteed to be in lexicographic order.
>>>>>>>>
>>>>>>>> I think this is one viable implementation, but the reason that the
>>>>>>>> token should be opaque is that it allows several different
>>>>>>>> implementations without client side changes.
>>>>>>>>
>>>>>>>> For example, if an element is added before the continuation token,
>>>>>>>> then all future listing calls with the token would always skip that
>>>>>>>> element.
>>>>>>>>
>>>>>>>> IMO, I think this is fine; for some of the REST APIs it is likely
>>>>>>>> important to put constraints on atomicity requirements, for others
>>>>>>>> (e.g. list namespaces) I think it is OK to have looser requirements.
>>>>>>>>
>>>>>>>> If we want to enforce that level of atomicity, we probably want to
>>>>>>>> introduce another time travel query parameter (e.g.
>>>>>>>> asOf=1703003028000) to ensure that we are listing results at a
>>>>>>>> specific point of time of the warehouse, so the complete result
>>>>>>>> list is fixed.
>>>>>>>>
>>>>>>>> Time travel might be useful in some cases but I think it is
>>>>>>>> orthogonal to services wishing to have guarantees around
>>>>>>>> atomicity/consistency of results. If a server wants to ensure that
>>>>>>>> results are atomic/consistent as of the start of the listing, it
>>>>>>>> can embed the necessary timestamp in the token it returns and parse
>>>>>>>> it out when fetching the next result.
>>>>>>>>
>>>>>>>> I think this does raise a more general point around service
>>>>>>>> definition evolution. I think there likely need to be metadata
>>>>>>>> endpoints that expose either:
>>>>>>>> 1. A version of the REST API supported.
>>>>>>>> 2. Features the API supports (e.g. which query parameters are
>>>>>>>> honored for a specific endpoint).
>>>>>>>>
>>>>>>>> There are pros and cons to both approaches (apologies if I missed
>>>>>>>> this in the spec or if it has already been discussed).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023 at 8:25 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Yes I agree that it is better to not enforce the implementation to
>>>>>>>> favor any direction, and a continuation token is probably better
>>>>>>>> than enforcing a numeric start index.
>>>>>>>>
>>>>>>>> In addition to the start index approach, another potential simple
>>>>>>>> way to implement the continuation token is to use the last item
>>>>>>>> name, when the listing is guaranteed to be in lexicographic order.
>>>>>>>> Compared to the start index approach, it does not need to worry
>>>>>>>> about the change of start index when something in the list is added
>>>>>>>> or removed.
>>>>>>>>
>>>>>>>> However, the issue of concurrent modification could still exist
>>>>>>>> even with a continuation token. For example, if an element is added
>>>>>>>> before the continuation token, then all future listing calls with
>>>>>>>> the token would always skip that element. If we want to enforce
>>>>>>>> that level of atomicity, we probably want to introduce another time
>>>>>>>> travel query parameter (e.g. asOf=1703003028000) to ensure that we
>>>>>>>> are listing results at a specific point of time of the warehouse,
>>>>>>>> so the complete result list is fixed. (This is also the missing
>>>>>>>> piece I forgot to mention in the start index approach to ensure it
>>>>>>>> works in distributed settings.)
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>> On Tue, Dec 19, 2023, 9:51 AM Micah Kornfield <
>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I tried to cover these in more detail at:
>>>>>>>> https://docs.google.com/document/d/1bbfoLssY1szCO_Hm3_93ZcN0UAMpf7kjmpwHQngqQJ0/edit
>>>>>>>>
>>>>>>>> On Sun, Dec 17, 2023 at 6:07 PM Renjie Liu <liurenjie2...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 for this approach. I agree that the streaming approach requires
>>>>>>>> that http clients and servers have http 2 streaming support, which
>>>>>>>> is not compatible with old clients.
>>>>>>>>
>>>>>>>> I share the same concern with Micah that only start/limit may not
>>>>>>>> be enough in a distributed environment where modification happens
>>>>>>>> during iterations. For compatibility, we need to consider several
>>>>>>>> cases:
>>>>>>>>
>>>>>>>> 1. Old client <-> New server
>>>>>>>> 2. New client <-> Old server
>>>>>>>>
>>>>>>>> On Sat, Dec 16, 2023 at 6:51 AM Daniel Weeks <dwe...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I agree that we want to include this feature and I raised similar
>>>>>>>> concerns to what Micah already presented in talking with Ryan.
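The last-item-name token Jack describes, and the concurrent-modification hazard he raises, can both be shown in a few lines. The helper below is illustrative, not catalog code: the token is simply the last item the client saw, and resumption lists items strictly after it in lexicographic order:

```python
def list_page(items, last_seen=None, limit=3):
    """Last-item-name continuation token: list items in lexicographic
    order, resuming strictly after the item the client last saw."""
    ordered = sorted(items)
    page = [x for x in ordered if last_seen is None or x > last_seen][:limit]
    # Return a token only if there is more to list after this page.
    token = page[-1] if page and page[-1] != ordered[-1] else None
    return page, token


# Resumption is stable when items are added *after* the cursor...
names = ["a", "c", "e", "g"]
page1, tok = list_page(names, limit=2)      # ['a', 'c'], token 'c'

# ...but an element added *before* the token is skipped forever:
names.append("b")
page2, _ = list_page(names, last_seen=tok)  # ['e', 'g'] - 'b' is missed
```

This is exactly the tradeoff in the thread: unlike a numeric start index, the token is unaffected by insertions and deletions behind the cursor, but it still cannot surface items that land before it, which is why stricter atomicity needs a snapshot embedded in the token (or `asOf`-style semantics).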
>>>>>>>>
>>>>>>>> For backward compatibility, just adding a start and limit implies a
>>>>>>>> deterministic order, which is not a current requirement of the REST
>>>>>>>> spec.
>>>>>>>>
>>>>>>>> Also, we need to consider whether the start/limit would need to be
>>>>>>>> respected by the server. If existing implementations simply return
>>>>>>>> all the results, will that be sufficient? There are a few edge
>>>>>>>> cases that need to be considered here.
>>>>>>>>
>>>>>>>> For the opaque key approach, I think adding a query param to
>>>>>>>> trigger/continue and introducing a continuation token in the
>>>>>>>> ListNamespacesResponse might allow for more backward compatibility.
>>>>>>>> In that scenario, pagination would only take place for clients who
>>>>>>>> know how to paginate and the ordering would not need to be
>>>>>>>> deterministic.
>>>>>>>>
>>>>>>>> -Dan
>>>>>>>>
>>>>>>>> On Fri, Dec 15, 2023, 10:33 AM Micah Kornfield <
>>>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Just to clarify and add a small suggestion:
>>>>>>>>
>>>>>>>> The behavior with no additional parameters requires the operations
>>>>>>>> to happen as they do today for backwards compatibility (i.e. either
>>>>>>>> all responses are returned or a failure occurs).
>>>>>>>>
>>>>>>>> For new parameters, I'd suggest an opaque start token (instead of a
>>>>>>>> specific numeric offset) that can be returned by the service and a
>>>>>>>> limit (as proposed above). If a start token is provided without a
>>>>>>>> limit, a default limit can be chosen by the server. Servers might
>>>>>>>> return fewer than limit (i.e. clients are required to check for a
>>>>>>>> next token to determine if iteration is complete). This enables
>>>>>>>> server side state if it is desired but also makes deterministic
>>>>>>>> listing much more feasible (deterministic responses are essentially
>>>>>>>> impossible in the face of changing data if only a start offset is
>>>>>>>> provided).
>>>>>>>>
>>>>>>>> In an ideal world, specifying a limit would result in streaming
>>>>>>>> responses being returned, with the last part containing a token if
>>>>>>>> continuation is necessary. Given the conversation on the other
>>>>>>>> thread about streaming, I'd imagine this is quite hard to model in
>>>>>>>> an OpenAPI REST service.
>>>>>>>>
>>>>>>>> Therefore it seems like using pagination with token and offset
>>>>>>>> would be preferred. If skipping someplace in the middle of the
>>>>>>>> namespaces is required, then I would suggest modelling those as
>>>>>>>> first class query parameters (e.g. "startAfterNamespace").
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> On Fri, Dec 15, 2023 at 10:08 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>> +1 for this approach
>>>>>>>>
>>>>>>>> I think it's good to use query params because it can be
>>>>>>>> backward-compatible with the current behavior. If you get more than
>>>>>>>> the limit back, then the service probably doesn't support
>>>>>>>> pagination. And if a client doesn't support pagination they get the
>>>>>>>> same results that they would today. A streaming approach with a
>>>>>>>> continuation link like in the scan API discussion wouldn't work
>>>>>>>> because old clients don't know to make a second request.
>>>>>>>>
>>>>>>>> On Thu, Dec 14, 2023 at 10:07 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> During the conversation about the Scan API for the REST spec, we
>>>>>>>> touched on the topic of pagination when a REST response is large or
>>>>>>>> takes time to be produced.
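The client-side loop implied by Micah's opaque-token-plus-limit proposal is simple: pass the token back, accept short pages, and stop when no next token is returned. The `request_page` function below is a stand-in for the actual HTTP call (names are illustrative):

```python
def fetch_all(request_page, limit=100):
    """Drain a paginated listing: send the opaque token from the previous
    response, tolerate pages shorter than `limit`, and stop only when the
    server returns no next token."""
    items, token = [], None
    while True:
        page, token = request_page(token=token, limit=limit)
        items.extend(page)   # servers may return fewer than `limit` items
        if token is None:    # no next token => iteration is complete
            return items


# Fake server for illustration; a real one would hide the cursor format.
DATA = [f"ns_{i:03d}" for i in range(10)]


def request_page(token, limit):
    start = 0 if token is None else int(token)
    page = DATA[start:start + limit]
    nxt = str(start + limit) if start + limit < len(DATA) else None
    return page, nxt
```

Note that the client never interprets the token; the fake server here happens to use a numeric offset internally, but it could switch to a last-item-name or snapshot-embedding token without any client change, which is the core argument for opacity.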
>>>>>>>>
>>>>>>>> I just want to discuss this separately, since we also see the issue
>>>>>>>> for ListNamespaces and ListTables/Views, when integrating with a
>>>>>>>> large organization that has over 100k namespaces, and also a lot of
>>>>>>>> tables in some namespaces.
>>>>>>>>
>>>>>>>> Pagination requires either keeping state, or the response to be
>>>>>>>> deterministic such that the client can request a range of the full
>>>>>>>> response. If we want to avoid keeping state, I think we need to
>>>>>>>> allow some query parameters like:
>>>>>>>> - *start*: the start index of the item in the response
>>>>>>>> - *limit*: the number of items to be returned in the response
>>>>>>>>
>>>>>>>> So we can send a request like:
>>>>>>>>
>>>>>>>> *GET /namespaces?start=300&limit=100*
>>>>>>>>
>>>>>>>> *GET /namespaces/ns/tables?start=300&limit=100*
>>>>>>>>
>>>>>>>> And the REST spec should enforce that the response returned for the
>>>>>>>> paginated GET should be deterministic.
>>>>>>>>
>>>>>>>> Any thoughts on this?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jack Ye
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>>> Xuanwo
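For reference, the start/limit semantics of the original proposal amount to slicing a deterministic full listing: `GET /namespaces?start=300&limit=100` returns items 300 through 399 of a stable ordering. A minimal sketch (not actual catalog code; sorting stands in for whatever deterministic order the spec would require):

```python
def paginate(items, start=0, limit=100):
    """start/limit pagination over a deterministic listing: the response
    is the slice [start, start + limit) of the stably ordered results."""
    return sorted(items)[start:start + limit]


# Zero-padded names sort lexicographically in numeric order.
namespaces = [f"ns_{i:05d}" for i in range(1000)]

# GET /namespaces?start=300&limit=100 would return items 300..399:
page = paginate(namespaces, start=300, limit=100)
```

This is also why the thread converged on opaque tokens instead: the slice is only meaningful if the full listing is deterministic, which concurrent creates and drops make hard to guarantee without server-side state.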