Re: Pagination for List APIs in the REST spec

2024-05-20 Thread Pucheng Yang
You are right, thanks Jack.
On Mon, May 20, 2024 at 8:06 AM Jack Ye wrote:
> I believe this is already merged?
> https://github.com/apache/iceberg/pull/9782
>
> Best,
> Jack Ye
>
> On Sat, May 18, 2024 at 4:06 PM Pucheng Yang wrote:
>> Hi all, is there an ETA for this? thanks
>>
>> On Wed,

Re: Pagination for List APIs in the REST spec

2024-05-20 Thread Jack Ye
I believe this is already merged? https://github.com/apache/iceberg/pull/9782
Best,
Jack Ye
On Sat, May 18, 2024 at 4:06 PM Pucheng Yang wrote:
> Hi all, is there an ETA for this? thanks
>
> On Wed, Dec 20, 2023 at 6:03 PM Renjie Liu wrote:
>> I think if servers provide a meaningful error

Re: Pagination for List APIs in the REST spec

2024-05-18 Thread Pucheng Yang
Hi all, is there an ETA for this? thanks
On Wed, Dec 20, 2023 at 6:03 PM Renjie Liu wrote:
>> I think if servers provide a meaningful error message on expiration
>> hopefully, this would be a good first step in debugging. I think saying
>> tokens should generally support O(Minutes) at least shou

Re: Pagination for List APIs in the REST spec

2023-12-20 Thread Renjie Liu
> I think if servers provide a meaningful error message on expiration
> hopefully, this would be a good first step in debugging. I think saying
> tokens should generally support O(Minutes) at least should cover most
> use-cases?
Sounds reasonable to me. Clients just need to be aware that the

Re: Pagination for List APIs in the REST spec

2023-12-20 Thread Micah Kornfield
> Overall, I don't think it's a good idea to add parallel listing for things
> like tables and namespaces as it just adds complexity for an incredibly
> narrow (and possibly poorly designed) use case.
+1 I think that there are likely a few ways parallelization of table and namespace listing can

Re: Pagination for List APIs in the REST spec

2023-12-20 Thread Daniel Weeks
Overall, I don't think it's a good idea to add parallel listing for things like tables and namespaces as it just adds complexity for an incredibly narrow (and possibly poorly designed) use case. I feel we should leave it up to the server to define whether it will provide consistency across paginat

Re: Pagination for List APIs in the REST spec

2023-12-20 Thread Micah Kornfield
> I agree that this is not quite useful for clients at this moment. But I'm
> thinking that maybe exposing this will help debugging or diagnosing, user
> just need to be aware of this potential expiration.
I think if servers provide a meaningful error message on expiration hopefully, this would

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Renjie Liu
> If we choose to manage state on the server side, I recommend not revealing
> the expiration time to the client, at least not for now. We can introduce
> it when there's a practical need. It wouldn't constitute a breaking change,
> would it?
I agree that this is not quite useful for clients at

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Xuanwo
> For the continuation token, I think one missing part is about the expiration
> time of this token, since this may affect the state cleaning process of the
> server.
Some storage services use a continuation token as a binary representation of internal states. For example, they serialize a str

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Renjie Liu
For the continuation token, I think one missing part is the expiration time of the token, since this may affect the state-cleanup process on the server. There are several things to discuss: 1. Should we leave it to the server to decide, or allow the client to configure it in the API? Personally I
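A minimal sketch of how token expiration could look on the server, assuming the token carries its own state and expiry time rather than pointing at server-side state. The names make_token, read_token and TokenExpired are hypothetical and do not come from the REST spec. A real server would also sign or encrypt the payload so clients cannot tamper with it. The error message follows the suggestion elsewhere in this thread that servers report expiration clearly.

```python
import base64
import json
import time
from typing import Optional


class TokenExpired(Exception):
    """Raised when a client presents a page token past its expiry time."""


def make_token(state: dict, ttl_seconds: float, now: Optional[float] = None) -> str:
    # Hypothetical helper: pack listing state plus an expiry timestamp
    # into a string the client treats as opaque.
    issued = time.time() if now is None else now
    payload = {"state": state, "expires_at": issued + ttl_seconds}
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def read_token(token: str, now: Optional[float] = None) -> dict:
    # Decode the token and reject it with a clear message if it has expired.
    payload = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    current = time.time() if now is None else now
    if current > payload["expires_at"]:
        raise TokenExpired("page token expired; restart the listing from the first page")
    return payload["state"]
```

The `now` parameter exists only so the expiry check can be exercised without waiting; the lifetime itself stays a server-side decision in this sketch.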

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Micah Kornfield
IMO, parallelization needs to be a first class entity in the end point/service design to allow for flexibility (I scanned through the original proposal for the scan planning and it looked like it was on the right track). Using offsets for parallelization is problematic from both a consistency and

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Jack Ye
Yes I think the continuation token should in general be opaque. I was trying to give an example of an easy implementation, since there were some general concerns that the features proposed should not be too complicated to implement, to some extent. I also agree the asOf feature can be embedded in

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Walaa Eldin Moustafa
Not necessarily. That is more of a general statement. The pagination discussion forked from server-side scan planning.
On Tue, Dec 19, 2023 at 9:52 AM Ryan Blue wrote:
>> With start/limit each client can query for its own chunk without
>> coordination.
>
> Okay, I understand now. Would you need to

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Ryan Blue
> With start/limit each client can query for its own chunk without coordination.
Okay, I understand now. Would you need to parallelize the client for listing namespaces or tables? That seems odd to me.
On Tue, Dec 19, 2023 at 9:48 AM Walaa Eldin Moustafa wrote:
>> You can parallelize with opaque

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Walaa Eldin Moustafa
> You can parallelize with opaque tokens by sending a starting point for the next request.
I meant we would have to wait for the server to return this starting point from the previous request? With start/limit each client can query for its own chunk without coordination.
On Tue, Dec 19, 2023 at 9:44 AM

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Ryan Blue
> I think start and offset has the advantage of being parallelizable (as compared to continuation tokens).
You can parallelize with opaque tokens by sending a starting point for the next request.
> On the other hand, using "asOf" can be complex to implement and may be too powerful for the pagina

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Walaa Eldin Moustafa
Can we assume it is the responsibility of the server to ensure determinism (e.g., by caching the results along with query ID)? I think start and offset has the advantage of being parallelizable (as compared to continuation tokens). On the other hand, using "asOf" can be complex to implement and ma

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Ryan Blue
I think you can solve the atomicity problem with a continuation token and server-side state. In general, I don't think this is a problem we should worry about a lot since pagination commonly has this problem. But since we can build a system that allows you to solve it if you choose to, we should go

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Micah Kornfield
Hi Jack, Some answers inline.
> In addition to the start index approach, another potential simple way to
> implement the continuation token is to use the last item name, when the
> listing is guaranteed to be in lexicographic order.
I think this is one viable implementation, but the reason that

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Jack Ye
Yes I agree that it is better to not enforce the implementation to favor any direction, and continuation token is probably better than enforcing a numeric start index. In addition to the start index approach, another potential simple way to implement the continuation token is to use the last item
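As a rough illustration of the last-item-name idea, here is a minimal sketch that assumes the server can produce names in lexicographic order. The function names and the base64 wrapping are assumptions made for the example, not part of the REST spec. The wrapping only keeps clients from depending on the token contents.

```python
import base64
from bisect import bisect_right
from typing import List, Optional, Tuple


def encode_token(last_name: str) -> str:
    # Wrap the last returned name so clients treat the token as opaque.
    return base64.urlsafe_b64encode(last_name.encode("utf-8")).decode("ascii")


def decode_token(token: str) -> str:
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


def list_page(sorted_names: List[str], page_size: int,
              page_token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    # Resume strictly after the last name returned by the previous page.
    start = 0
    if page_token is not None:
        start = bisect_right(sorted_names, decode_token(page_token))
    page = sorted_names[start:start + page_size]
    # Hand out a token only when more items remain after this page.
    has_more = start + len(page) < len(sorted_names)
    next_token = encode_token(page[-1]) if page and has_more else None
    return page, next_token
```

Because the token records a name rather than a numeric position, a listing resumes at the right place even if an earlier item was dropped between requests. A numeric start index would instead skip an entry in that case.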

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Micah Kornfield
I tried to cover these in more detail at: https://docs.google.com/document/d/1bbfoLssY1szCO_Hm3_93ZcN0UAMpf7kjmpwHQngqQJ0/edit
On Sun, Dec 17, 2023 at 6:07 PM Renjie Liu wrote:
> +1 for this approach. I agree that the streaming approach requires that
> http client and servers have http 2 stream

Re: Pagination for List APIs in the REST spec

2023-12-17 Thread Renjie Liu
+1 for this approach. I agree that the streaming approach requires that HTTP clients and servers have HTTP/2 streaming support, which is not compatible with old clients. I share the same concern with Micah that only start/limit may not be enough in a distributed environment where modification happe

Re: Pagination for List APIs in the REST spec

2023-12-15 Thread Daniel Weeks
I agree that we want to include this feature and I raised similar concerns to what Micah already presented in talking with Ryan. For backward compatibility, just adding a start and limit implies a deterministic order, which is not a current requirement of the REST spec. Also, we need to consider

Re: Pagination for List APIs in the REST spec

2023-12-15 Thread Micah Kornfield
Just to clarify and add a small suggestion: The behavior with no additional parameters requires the operations to happen as they do today for backwards compatibility (i.e., either all responses are returned or a failure occurs). For new parameters, I'd suggest an opaque start token (instead of spec

Re: Pagination for List APIs in the REST spec

2023-12-15 Thread Ryan Blue
+1 for this approach I think it's good to use query params because it can be backward-compatible with the current behavior. If you get more than the limit back, then the service probably doesn't support pagination. And if a client doesn't support pagination they get the same results that they woul
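The backward-compatibility argument can be sketched from the client side. This is an illustrative loop only: fetch stands in for the HTTP call, and the "items" and "next_token" keys are hypothetical names rather than the fields defined by the spec. The rule "more than the limit came back" is the heuristic described above for spotting a service that ignores the pagination parameters.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical response shape: {"items": [...], "next_token": str or None}.
Page = Dict[str, Any]


def list_all(fetch: Callable[[Optional[str], int], Page], limit: int) -> List[str]:
    results: List[str] = []
    token: Optional[str] = None
    while True:
        page = fetch(token, limit)
        items = list(page.get("items", []))
        results.extend(items)
        token = page.get("next_token")
        # More items than requested suggests the service ignored the limit
        # and returned the full listing; no token means there is nothing left.
        if len(items) > limit or not token:
            return results
```

A service that does not paginate returns everything in the first response, so the loop exits after one call and the caller sees the same result it gets today.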

Pagination for List APIs in the REST spec

2023-12-14 Thread Jack Ye
Hi everyone, During the conversation about the Scan API for the REST spec, we touched on the topic of pagination when a REST response is large or takes time to produce. I just want to discuss this separately, since we also see the issue for ListNamespaces and ListTables/Views, when integrating with a