You are right, thanks Jack.
On Mon, May 20, 2024 at 8:06 AM Jack Ye wrote:
> I believe this is already merged?
> https://github.com/apache/iceberg/pull/9782
>
> Best,
> Jack Ye
>
> On Sat, May 18, 2024 at 4:06 PM Pucheng Yang
> wrote:
>
>> Hi all, is there an ETA for this? thanks
>>
I believe this is already merged?
https://github.com/apache/iceberg/pull/9782
Best,
Jack Ye
On Sat, May 18, 2024 at 4:06 PM Pucheng Yang
wrote:
> Hi all, is there an ETA for this? thanks
Hi all, is there an ETA for this? thanks
On Wed, Dec 20, 2023 at 6:03 PM Renjie Liu wrote:
> I think if servers provide a meaningful error message on expiration, that
> would hopefully be a good first step in debugging. I think saying that
> tokens should generally support O(minutes) lifetimes at least should cover
> most use cases?
>
Sounds reasonable to me. Clients just need to be aware that the
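For illustration, a minimal sketch of what such a meaningful expiration error could look like on the server side. Everything here is hypothetical: the handler name, the error payload shape, and the exception type are assumptions for the sketch, not anything the REST spec defines.

```python
import time

# Hypothetical server-side token registry: token -> expiration timestamp.
_tokens = {"abc123": time.time() - 60}  # already expired, for illustration

def list_tables(page_token=None):
    """Return (status, body); expired or unknown tokens get a descriptive error."""
    if page_token is not None:
        expires_at = _tokens.get(page_token)
        if expires_at is None or expires_at < time.time():
            # A clear, actionable message beats a generic "Bad Request".
            return 400, {
                "error": {
                    "message": f"Pagination token '{page_token}' has expired; "
                               "restart the listing from the first page",
                    "type": "ExpiredPaginationTokenException",
                    "code": 400,
                }
            }
    return 200, {"identifiers": ["db.table_a", "db.table_b"],
                 "next-page-token": None}

status, body = list_tables(page_token="abc123")
# status == 400, and the message tells the client what to do next
```

The point is only that the error names the token and the recovery path, which is what makes expiration debuggable from the client side.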
>
> Overall, I don't think it's a good idea to add parallel listing for things
> like tables and namespaces as it just adds complexity for an incredibly
> narrow (and possibly poorly designed) use case.
+1 I think that there are likely a few ways parallelization of table and
namespace listing can
Overall, I don't think it's a good idea to add parallel listing for things
like tables and namespaces as it just adds complexity for an incredibly
narrow (and possibly poorly designed) use case.
I feel we should leave it up to the server to define whether it will
provide consistency across paginat
>
> I agree that this is not quite useful for clients at this moment. But I'm
> thinking that maybe exposing this will help with debugging or diagnosis;
> users just need to be aware of this potential expiration.
I think if servers provide a meaningful error message on expiration
hopefully, this would
>
> If we choose to manage state on the server side, I recommend not revealing
> the expiration time to the client, at least not for now. We can introduce
> it when there's a practical need. It wouldn't constitute a breaking change,
> would it?
I agree that this is not quite useful for clients at
> For the continuation token, I think one missing part is about the expiration
> time of this token, since this may affect the state cleaning process of the
> server.
Some storage services use a continuation token as a binary representation of
internal state. For example, they serialize a str
For the continuation token, I think one missing part is about the
expiration time of this token, since this may affect the state
cleaning process of the server. There are several things to discuss:
1. Should we leave it to the server to decide, or allow the client to
configure it in the API?
Personally I
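As one possible answer to (1), here is a sketch of a server-decided TTL with lazy cleanup of expired token state. The TTL value, helper names, and storage shape are all assumptions for illustration, not anything from the spec.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 300  # server-chosen TTL; O(minutes), per the discussion

_state = {}  # token -> (expires_at, saved listing position)

def issue_token(position):
    """Create an opaque continuation token; the server owns its lifecycle."""
    token = secrets.token_urlsafe(16)
    _state[token] = (time.time() + TOKEN_TTL_SECONDS, position)
    return token

def resolve_token(token):
    """Return the saved position, or None if the token is unknown or expired."""
    entry = _state.get(token)
    if entry is None or entry[0] < time.time():
        _state.pop(token, None)  # lazy cleanup of expired server-side state
        return None
    return entry[1]

tok = issue_token(position=100)
assert resolve_token(tok) == 100
```

Keeping the TTL purely server-side means it can later be exposed (or made client-configurable) without breaking existing clients, which matches the "introduce it when there's a practical need" position above.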
IMO, parallelization needs to be a first class entity in the end
point/service design to allow for flexibility (I scanned through the
original proposal for the scan planning and it looked like it was on the
right track). Using offsets for parallelization is problematic from both a
consistency and
Yes, I think the continuation token should in general be opaque. I was
trying to give an example of an easy implementation, since there were some
general concerns that the proposed features should not be too complicated
to implement.
I also agree the asOf feature can be embedded in
Not necessarily. That is more of a general statement. The pagination
discussion forked from server side scan planning.
On Tue, Dec 19, 2023 at 9:52 AM Ryan Blue wrote:
> With start/limit each client can query for its own chunk without
coordination.
Okay, I understand now. Would you need to parallelize the client for
listing namespaces or tables? That seems odd to me.
On Tue, Dec 19, 2023 at 9:48 AM Walaa Eldin Moustafa
wrote:
> You can parallelize with opaque tokens by sending a starting point for
the next request.
I meant we would have to wait for the server to return this starting point
from the previous request? With start/limit, each client can query for its
own chunk without coordination.
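To illustrate the coordination-free chunking being described, here is a toy sketch. The endpoint shape and table names are hypothetical, and note that this presupposes exactly what the other side of the thread objects to: a deterministic, stable listing order on the server.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "server": a deterministic, ordered listing of table names (assumption).
TABLES = [f"table_{i:03d}" for i in range(10)]

def list_tables(start, limit):
    """Hypothetical start/limit endpoint: returns a slice of the ordered listing."""
    return TABLES[start:start + limit]

# Because each chunk is addressed by (start, limit), clients can fetch
# chunks concurrently, with no token handed back from a previous page.
chunks = [(0, 4), (4, 4), (8, 4)]
with ThreadPoolExecutor() as pool:
    pages = list(pool.map(lambda c: list_tables(*c), chunks))

merged = [t for page in pages for t in page]
assert merged == TABLES
```

With an opaque token, by contrast, page N+1 cannot be requested until the token from page N arrives, which is the waiting being pointed out above.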
On Tue, Dec 19, 2023 at 9:44 AM
> I think start and offset have the advantage of being parallelizable (as
compared to continuation tokens).
You can parallelize with opaque tokens by sending a starting point for the
next request.
> On the other hand, using "asOf" can be complex to implement and may be
too powerful for the pagina
Can we assume it is the responsibility of the server to ensure determinism
(e.g., by caching the results along with a query ID)? I think start and
offset have the advantage of being parallelizable (as compared to
continuation tokens). On the other hand, using "asOf" can be complex to
implement and ma
I think you can solve the atomicity problem with a continuation token and
server-side state. In general, I don't think this is a problem we should
worry about a lot since pagination commonly has this problem. But since we
can build a system that allows you to solve it if you choose to, we should
go
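A minimal sketch of that idea, entirely illustrative (the REST spec defines no such mechanism): the server freezes a snapshot of the listing on the first request and serves every subsequent page from it, so the paged listing stays atomic even while the live catalog changes.

```python
import secrets

_snapshots = {}  # token -> (frozen listing, next index); server-side state

def list_page(live_listing, page_size, token=None):
    """Serve pages from a snapshot taken at the first request, so concurrent
    modifications cannot cause a client to see duplicates or gaps."""
    if token is None:
        snapshot = tuple(live_listing)  # freeze the state server-side
        token = secrets.token_urlsafe(8)
        _snapshots[token] = (snapshot, 0)
    snapshot, index = _snapshots[token]
    page = list(snapshot[index:index + page_size])
    _snapshots[token] = (snapshot, index + page_size)
    next_token = token if index + page_size < len(snapshot) else None
    return page, next_token

tables = ["a", "b", "c", "d"]
page1, tok = list_page(tables, 2)
tables.append("e")                     # concurrent modification between pages
page2, tok = list_page(tables, 2, tok)
assert page1 + page2 == ["a", "b", "c", "d"]  # snapshot is unaffected
```

The cost is exactly the server-side state and its cleanup that the expiration discussion elsewhere in this thread is about.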
Hi Jack,
Some answers inline.
> In addition to the start index approach, another potential simple way to
> implement the continuation token is to use the last item name, when the
> listing is guaranteed to be in lexicographic order.
I think this is one viable implementation, but the reason that
Yes I agree that it is better to not enforce the implementation to favor
any direction, and continuation token is probably better than enforcing a
numeric start index.
In addition to the start index approach, another potential simple way to
implement the continuation token is to use the last item
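A sketch of that last-item approach under the stated lexicographic-order assumption (the names and helper are made up for illustration):

```python
import bisect

# Assumed: the catalog can list names in guaranteed lexicographic order.
NAMES = sorted(["accounts", "events", "logs", "metrics", "orders", "users"])

def list_after(last_seen, limit):
    """Continuation-by-last-item: the token is simply the last name returned,
    and the next page is everything strictly after it in sort order."""
    start = 0 if last_seen is None else bisect.bisect_right(NAMES, last_seen)
    page = NAMES[start:start + limit]
    token = page[-1] if start + limit < len(NAMES) else None
    return page, token

page, token = list_after(None, 4)   # first four names; token is "metrics"
rest, token = list_after(token, 4)  # remaining names; token is None
assert page + rest == NAMES
```

One nice property: the token is stateless on the server, and items inserted or deleted before the last-seen name between requests cannot cause duplicates in later pages.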
I tried to cover these in more detail at:
https://docs.google.com/document/d/1bbfoLssY1szCO_Hm3_93ZcN0UAMpf7kjmpwHQngqQJ0/edit
On Sun, Dec 17, 2023 at 6:07 PM Renjie Liu wrote:
+1 for this approach. I agree that the streaming approach requires that
HTTP clients and servers have HTTP/2 streaming support, which is not
compatible with old clients.
I share the same concern with Micah that only start/limit may not be enough
in a distributed environment where modification happe
I agree that we want to include this feature and I raised similar concerns
to what Micah already presented in talking with Ryan.
For backward compatibility, just adding a start and limit implies a
deterministic order, which is not a current requirement of the REST spec.
Also, we need to consider
Just to clarify and add a small suggestion:
The behavior with no additional parameters requires the operations to
behave as they do today for backwards compatibility (i.e., either all
responses are returned or a failure occurs).
For new parameters, I'd suggest an opaque start token (instead of spec
+1 for this approach
I think it's good to use query params because it can be backward-compatible
with the current behavior. If you get more than the limit back, then the
service probably doesn't support pagination. And if a client doesn't
support pagination they get the same results that they woul
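A toy client-side loop showing that graceful degradation. The parameter names (`page_size`, `page_token`) are illustrative, not the spec's; the point is only that a legacy server which ignores the params still produces a correct, terminating loop.

```python
def fetch_all(request_page, page_size=100):
    """Client-side pagination loop that degrades gracefully: if the server
    ignores the pagination params and returns everything at once, the loop
    still terminates with the full result set after one round trip."""
    results, token = [], None
    while True:
        page, token = request_page(page_size=page_size, page_token=token)
        results.extend(page)
        if token is None:  # legacy servers never hand back a token
            return results

# Hypothetical legacy server: ignores both params, returns the whole listing.
def legacy_server(page_size=None, page_token=None):
    return ["t1", "t2", "t3"], None

assert fetch_all(legacy_server, page_size=2) == ["t1", "t2", "t3"]
```

Receiving more results than the requested limit (three items for a limit of two, here) is exactly the signal described above that the service does not support pagination.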
Hi everyone,
During the conversation of the Scan API for REST spec, we touched on the
topic of pagination when REST response is large or takes time to be
produced.
I just want to discuss this separately, since we also see the issue for
ListNamespaces and ListTables/Views, when integrating with a