Re: cleanExpiredMetadata in RemoveSnapshots

2025-05-12 Thread Pucheng Yang
Thanks all for the discussion. I also agree that this behavior should be turned off by default. I would also love to see this flag added to the Spark/Flink procedures. I think having this feature available on the client side seems more achievable in the short run and designing a server

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-26 Thread Gabor Kaszab
Thanks for the responses! My concern is the same, Manu, Peter: many stakeholders in this community don't have a catalog that is capable of executing table maintenance (e.g. HiveCatalog) and rely on the Spark procedures and actions for this purpose. I feel that we should still give them the new fun

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-26 Thread Péter Váry
I know of several companies that are using either scheduled stored procedures or the existing actions to maintain production tables. I don't think we should deprecate them until there is a viable open solution for them. Manu Zhang wrote (on Wed, Mar 19, 2025, 17:52): > I think a catal

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-19 Thread Manu Zhang
I think a catalog service can also use Spark/Flink procedures for table maintenance, to utilize existing systems and cluster resources. If we no longer support new functionality in Spark/Flink procedures, we are effectively deprecating them, right? Gabor Kaszab wrote (on Thu, Mar 20, 2025, 00:07): > Thanks fo

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-19 Thread Gabor Kaszab
Thanks for the responses so far! Sure, keeping the default as false makes sense because this is a new feature, so let's be on the safe side. About exposing setting the flag in the Spark action/procedure and also via Flink: I believe currently there are a number of vendors that don't have a catalo

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-15 Thread Ryan Blue
I don't think it is necessary to either make cleanup the default or to expose the flag in Spark or other engines. Right now, catalogs are taking on a lot more responsibility for things like snapshot expiration, orphan file cleanup, and schema or partition spec removal. Ideally, those are tasks tha

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-15 Thread Péter Váry
I would be hesitant to turn on any new feature by default, especially for Spark compaction, which is widely used in production. +1 for providing a way for users to enable the feature manually. Gabor Kaszab wrote (on Fri, Mar 14, 2025, 12:19): > Hi Iceberg Community, > > There were r

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-14 Thread Jean-Baptiste Onofré
Hi Gabor, I think the question is "when". As it's a behavior change, I don't think we should do that in a "minor" release; otherwise users would be "surprised". I would propose to keep the current behavior on Iceberg Java 1.x and change the flag to true on Iceberg Java 2.x (after a vote). Regards JB O