Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-08 Thread Renjie Liu
Hi: It's also possible to create a user mailing list if it helps. I'm neutral on this option. It seems we are actually missing a user mailing list. On Tue, Jul 9, 2024 at 1:50 PM Xuanwo wrote: > Hi, > > > Regarding the discussion tab, it sounds good to me. It's pretty > straightforward to do by

Re: Spark: Copy Table Action

2024-07-08 Thread Péter Váry
I think in most cases the copy table action doesn't require a query engine to read and generate the new metadata files. This means that it would be nice to provide a pure Java implementation in the core, which could be extended/reused by different engines, like Spark, to execute it in a distribut

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-08 Thread Xuanwo
Hi, > Regarding the discussion tab, it sounds good to me. It's pretty straightforward to do by editing .asf.yaml. I tried this before, but .asf.yaml doesn't support controlling discussions yet. We need help from the infra team. https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-08 Thread Jean-Baptiste Onofré
Hi, It's also possible to create a user mailing list if it helps. Regarding the discussion tab, it sounds good to me. It's pretty straightforward to do by editing .asf.yaml. Regards JB On Tue, Jul 9, 2024 at 5:18 AM Renjie Liu wrote: > > Hi: > > Recently we have observed more and more user int

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-08 Thread Péter Váry
We need to handle expired snapshots differently in several places in Iceberg core as well. - We need to add checks to prevent scans from reading these snapshots, and throw a meaningful error instead. - We need to add checks to prevent tagging/branching these snapshots - We need to update DeleteOrphanFiles in Spark/

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-08 Thread Renjie Liu
I think the dev list is mostly about developer discussion, such as designs and proposals. GitHub issues are mostly about issue/bug tracking. GitHub discussions are about user questions, announcements, etc. But I agree that we could start with the iceberg-rust repo first, as it's relatively new and we have more

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-08 Thread Xuanwo
Thanks to Renjie for starting this thread. I considered suggesting enabling GitHub discussions for iceberg-rust, but I hesitated because Iceberg itself doesn't use this feature and almost all discussions take place on the mailing list. Although I enjoy using GitHub discussions, I'm also satisfied with

[DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-08 Thread Renjie Liu
Hi: Recently we have observed more and more users interested in iceberg-rust, and they have many questions about it, for example its status and its relationship with other projects such as pyiceberg. Slack is a great place for discussion, but it is not friendly to long discussions and is not easy to comment on. We can also en

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-08 Thread Jack Ye
+1 (binding) There are some wording aspects that others still have comments on in the PR, but in general, +1 for deprecating the endpoint. Best, Jack Ye On Mon, Jul 8, 2024 at 9:22 AM Robert Stupp wrote: > +1 > > On 08.07.24 18:15, Robert Stupp wrote: > > Hi Everyone, > > > > I propose that we me

Re: Spark: Copy Table Action

2024-07-08 Thread Anurag Mantripragada
Hi Yufei. Thanks for the proposal. While the actions are great, they still need to do a lot of work which can be reduced if we have the relative path changes. I still support adding these actions as moving data was out of scope for the relative path design and we can use these actions as helpe

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-08 Thread Szehon Ho
Thanks for the comments so far. I also thought previously that this functionality would be in an external system, like LakeChime, or a custom catalog extension. But after doing an initial analysis (please double check), I thought it's a small enough change that it would be worth putting in the Ic

Re: Spark: Copy Table Action

2024-07-08 Thread Pucheng Yang
Thanks for picking this up, I think this is a very valuable addition. On Mon, Jul 8, 2024 at 10:48 AM Yufei Gu wrote: > Hi folks, > > I'd like to share recent progress on adding actions to copy tables > across different places. > > There is a constant need to copy tables across different place

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-08 Thread John Greene
I do agree with the need that this proposal addresses: decoupling the snapshot history from the data deletion. I do wonder, though, will keeping expired snapshots as-is slow down manifest/scan planning (REST catalog approaches could probably mitigate this)? On Mon, Jul 8, 2024, 5:34 AM Piotr Findeise

Spark: Copy Table Action

2024-07-08 Thread Yufei Gu
Hi folks, I'd like to share recent progress on adding actions to copy tables across different places. There is a constant need to copy tables across different places for purposes such as disaster recovery and testing. Due to the absolute file paths in Iceberg metadata, it doesn't work automatic

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-08 Thread Robert Stupp
+1 On 08.07.24 18:15, Robert Stupp wrote: Hi Everyone, I propose that we merge PR [1] to "Deprecate oauth/tokens endpoint". The background and overall plan are discussed on this mailing list [2] and in this google doc [3]. Please vote in the next 72 hours. Robert [1] https://github.com/apache/i

[Vote] Deprecate oauth tokens endpoint

2024-07-08 Thread Robert Stupp
Hi Everyone, I propose that we merge PR [1] to "Deprecate oauth/tokens endpoint". The background and overall plan are discussed on this mailing list [2] and in this google doc [3]. Please vote in the next 72 hours. Robert [1] https://github.com/apache/iceberg/pull/10603 [2] https://lists.apache.

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-08 Thread Piotr Findeisen
Hi Szehon, Walaa, Thank you Szehon for bringing this up, and thank you Walaa for providing more context from a similar existing solution to the problem. The choices that LakeChime seems to have made -- to keep information in a separate RDBMS and which particular metadata information to retain -- they indee