Re: [Discuss] Spark 3.2 support?

2023-12-13 Thread Eduard Tudenhoefner
+1 on removing Spark 3.2 On Wed, Dec 13, 2023 at 8:01 PM Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > On Wed, Dec 13, 2023 at 7:10 PM Ajantha Bhat > wrote: > > > > Hi All, > > > > In our recent discussion, we deprecated Spark 3.2 support in Iceberg > 1.4.0 release > > as outlined in th

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Renjie Liu
About the pagination part, I did some investigation and found that OpenAPI doesn't have a spec for streaming responses, but that is really an implementation detail. There are several ways to implement JSON streaming, and there is also an RFC

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
Seems like that track has expired (This Internet-Draft will expire on 13 May 2022), not sure how these RFCs are managed, but it does not seem hopeful to have this verb in. I think people are mostly using POST for this use case already. But overall I think we are in agreement with the general direc

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Ryan Blue
I just changed it to POST after looking into support for the QUERY method. It's a new HTTP method for cases like this where you don't want to pass everything through query params. Here's the QUERY method RFC, but I gues

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
Thanks, the Gist explains a lot of things. This is actually very close to how we implement the shard ID: we defined the shard ID as a string whose content is similar to the information in the JSON payload you showed, so we can persist minimum information in

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Ryan Blue
Jack, It sounds like what I’m proposing isn’t quite clear because your initial response was arguing for a sharding capability. I agree that sharding is a good idea. I’m less confident about two points: 1. Requiring that the service is stateful. As Renjie pointed out, that makes it harder to

Re: Proposal for RESTful Data Operations

2023-12-13 Thread Jack Ye
Thanks Drew for the quick turnaround, I will take a deeper look into the PR. I think if we all agree that it is beneficial to have the AppendFiles(DataFile[]) API (maybe we should call it AppendRows instead), I would like to know if it also makes sense to have: 1. DeleteRows(DeleteFile[]), which c

Re: Proposal for RESTful Data Operations

2023-12-13 Thread Drew
Hi Ryan, Thanks for the feedback, I'll start going through the comments left in the doc! You're right in pointing out that the logic here can be simplified to roll back a commit. For now I've introduced a smaller PR that focuses on the append files operation. GitHub PR: https://github.com/apache/ic

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
After looking around, it seems that the AsyncAPI protocol (https://www.asyncapi.com/) could be a better option than OpenAPI for describing streaming APIs. That is one potential option; just putting it out here. -Jack On Wed, Dec 13, 2023 at 11:52 AM Jack Ye wrote: > The current proposal

Re: Proposal for REST APIs for Iceberg table scans

2023-12-13 Thread Jack Ye
The current proposal definitely makes the server stateful. In our prototype we used other components like DynamoDB to keep track of state. If keeping it stateless is a tenet, we can definitely move the proposal closer to that direction. Maybe one thing to make sure of is: is this a core tenet of the

Re: [Discuss] Spark 3.2 support?

2023-12-13 Thread Jean-Baptiste Onofré
+1 Regards JB On Wed, Dec 13, 2023 at 7:10 PM Ajantha Bhat wrote: > > Hi All, > > In our recent discussion, we deprecated Spark 3.2 support in Iceberg 1.4.0 > release > as outlined in this thread: > https://lists.apache.org/thread/zw1blng2d1bbrlcftxwmmhb2l7jxbxqx > > As we're gearing up for th

[Discuss] Spark 3.2 support?

2023-12-13 Thread Ajantha Bhat
Hi All, In our recent discussion, we deprecated Spark 3.2 support in Iceberg 1.4.0 release as outlined in this thread: https://lists.apache.org/thread/zw1blng2d1bbrlcftxwmmhb2l7jxbxqx As we're gearing up for the 1.5.0 release, I suggest that we go ahead and completely remove the Spark 3.2 support

Re: [DISCUSS] JUnit5 and parameterized testing

2023-12-13 Thread Eduard Tudenhoefner
I'm also +1 on option 2. If there are no other objections, then I would go ahead and merge https://github.com/apache/iceberg/pull/9161 at the end of the week. That should give people some time to add their feedback to the PR. Thanks Eduard On Thu, Dec 7, 2023 at 6:06 PM Jack Ye wrote: > Looking