Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-11 Thread Gábor Kaszab
Just an FYI, here is the PR for the Spark procedure. It only targets Spark 4.0 for now; I will backport it to the other Spark versions once it is finalized. Thanks again! Gabor Gábor Kaszab wrote (Tue, Jul 8, 2025, 15:57): > Thank you all for taking a lo

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-08 Thread Gábor Kaszab
Thank you all for taking a look and sharing your opinions! It seems we have consensus to extend the Spark procedure with a parameter to control this functionality. Let me prepare a PR for this and get back to you. I'll also take a look at Flink usage. Regards, Gabor Kaszab Jean-Baptiste Onof

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-08 Thread Jean-Baptiste Onofré
Hi, I think it makes sense to have a Spark procedure for that. My point was about the long-term, catalog-based solution. So short term, +1 for a Spark procedure. Long term, we should not forget the catalog (especially for engine interoperability). Thanks! Regards, JB On Mon, Jul 7, 2025 at 09:31, Gá

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Ryan Blue
I think it's reasonable to expose the options through the stored procedure. I just don't think we want to make it the default behavior. On Mon, Jul 7, 2025 at 8:37 AM Manu Zhang wrote: > I'm not seeing how the Spark procedure contradicts the catalog solution. > Catalogs can make de

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Manu Zhang
I'm not seeing how the Spark procedure contradicts the catalog solution. Catalogs can make decisions based on policies and pass parameters down to Spark procedures for execution. In addition, the procedure can be used by all catalogs and table maintenance systems. Regards, Manu Gábor Kaszab wrote on Mon, Jul 7, 2025 at 21:31:
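A minimal sketch of the flow Manu describes, in which a catalog or table maintenance system turns a stored policy into a procedure call. `MaintenancePolicy` and `build_expire_call` are hypothetical names for illustration only. `table`, `older_than` and `retain_last` are existing arguments of Iceberg's `expire_snapshots` procedure, but the `clean_expired_metadata` argument name is an assumption about the parameter proposed in this thread; check the procedure documentation for your Iceberg release. The sketch also assumes the Spark session time zone is UTC, since the `TIMESTAMP` literal is formatted in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MaintenancePolicy:
    """Hypothetical per-table maintenance policy stored by a catalog or TMS."""
    retain_last: int = 1
    max_snapshot_age_days: int = 5
    clean_expired_metadata: bool = False  # opt-in, matching the core default of false


def build_expire_call(catalog: str, table: str, policy: MaintenancePolicy, now: datetime) -> str:
    """Render a Spark SQL CALL to expire_snapshots from a policy (now must be tz-aware)."""
    # Assumes the Spark session time zone is UTC, so the cutoff is formatted in UTC.
    older_than = (now - timedelta(days=policy.max_snapshot_age_days)).astimezone(timezone.utc)
    cutoff = older_than.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # millisecond precision
    args = [
        f"table => '{table}'",
        f"older_than => TIMESTAMP '{cutoff}'",
        f"retain_last => {policy.retain_last}",
    ]
    if policy.clean_expired_metadata:
        # Assumed argument name; only sent when the policy explicitly opts in.
        args.append("clean_expired_metadata => true")
    return f"CALL {catalog}.system.expire_snapshots({', '.join(args)})"


now = datetime(2025, 7, 7, tzinfo=timezone.utc)
call_on = build_expire_call("prod", "db.events", MaintenancePolicy(clean_expired_metadata=True), now)
call_off = build_expire_call("prod", "db.events", MaintenancePolicy(), now)
print(call_on)
```

Because the argument is only emitted when a policy opts in, tables without an explicit policy keep today's behavior, which fits the consensus in this thread that the cleanup should not become the default.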

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Gábor Kaszab
Thanks for the response, JB! This could be a responsibility of the catalog and in turn of a TMS, I agree. However, that seems more like a mid/long-term solution, while the Spark expire_snapshots procedure already exists and the Java core implementation to clean expired specs and schemas is already there wi

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-03 Thread Jean-Baptiste Onofré
Hi Gabor, I would consider cleanExpiredMetadata a table maintenance procedure, so I agree that it should be managed by a catalog (as part of catalog policies and a TMS). I'm not against switching the cleanExpiredMetadata flag to true and letting the query engine and the catalog deal with that. Rega

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-02 Thread Gábor Kaszab
Hi Iceberg Community, It's been a while since the last activity on this thread, but let me bump this conversation because several people (Manu, Peter, Pucheng) showed interest in providing a way to switch on `cleanExpiredMetadata` through procedures. I understand the long-term goal is to dele

Re: cleanExpiredMetadata in RemoveSnapshots

2025-05-12 Thread Pucheng Yang
Thanks all for the discussion. I also agree that this behavior should be turned off by default. I would also love to see this flag added to the Spark/Flink procedures. Having this feature available on the client side seems more achievable in the short run, and designing a server

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-26 Thread Gabor Kaszab
Thanks for the responses! I share Manu's and Peter's concern: many stakeholders in this community don't have a catalog that is capable of executing table maintenance (e.g. HiveCatalog) and rely on the Spark procedures and actions for this purpose. I feel that we should still give them the new fun

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-26 Thread Péter Váry
I know of several companies that use either scheduled stored procedures or the existing actions to maintain production tables. I don't think we should deprecate them until there is a viable open solution for them. Manu Zhang wrote (Wed, Mar 19, 2025, 17:52): > I think a catal

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-19 Thread Manu Zhang
I think a catalog service can also use Spark/Flink procedures for table maintenance, to utilize existing systems and cluster resources. If we no longer add new functionality to Spark/Flink procedures, we are effectively deprecating them, right? Gabor Kaszab wrote on Thu, Mar 20, 2025 at 00:07: > Thanks fo

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-19 Thread Gabor Kaszab
Thanks for the responses so far! Sure, keeping the default as false makes sense because this is a new feature, so let's be on the safe side. Regarding exposing the flag in the Spark action/procedure and also via Flink: I believe there are currently a number of vendors that don't have a catalo

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-15 Thread Ryan Blue
I don't think it is necessary either to make cleanup the default or to expose the flag in Spark or other engines. Right now, catalogs are taking on a lot more responsibility for things like snapshot expiration, orphan file cleanup, and schema or partition spec removal. Ideally, those are tasks tha

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-15 Thread Péter Váry
I would be hesitant to turn on any new feature by default, especially for Spark compaction, which is widely used in production. +1 for providing a way for users to enable the feature manually. Gabor Kaszab wrote (Fri, Mar 14, 2025, 12:19): > Hi Iceberg Community, > > There were r

cleanExpiredMetadata in RemoveSnapshots

2025-03-14 Thread Gabor Kaszab
Hi Iceberg Community, Recent additions to RemoveSnapshots allow expiring unused partition specs and schemas. This is controlled by a flag called 'cleanExpiredMetadata', which defaults to 'false'. Additionally, Spark
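To make the flag's effect concrete, here is a minimal Python sketch over a simplified, hypothetical table-metadata model (not Iceberg's actual classes). It assumes each snapshot records the schema id it was written with and the partition spec ids its manifests reference; real Iceberg determines spec usage by reading the retained snapshots' manifests and also keeps snapshots held by branches, tags and retain-last settings, all of which this sketch ignores. With the flag at its default of false only snapshots are expired; with it set, schemas and specs that are neither current nor referenced by a retained snapshot are dropped too.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    schema_id: int
    # Simplification: spec ids referenced by this snapshot's manifests, stored inline.
    spec_ids: FrozenSet[int]


@dataclass(frozen=True)
class TableMetadata:
    snapshots: List[Snapshot]
    schemas: Dict[int, str]  # schema id -> human-readable description
    specs: Dict[int, str]    # partition spec id -> human-readable description
    current_schema_id: int
    default_spec_id: int
    current_snapshot_id: int


def expire_snapshots(meta: TableMetadata, older_than_ms: int,
                     clean_expired_metadata: bool = False) -> TableMetadata:
    """Hypothetical model: expire old snapshots and, if asked, unreachable schemas/specs."""
    # Keep recent snapshots and always keep the current one.
    kept = [s for s in meta.snapshots
            if s.timestamp_ms >= older_than_ms or s.snapshot_id == meta.current_snapshot_id]
    schemas, specs = dict(meta.schemas), dict(meta.specs)
    if clean_expired_metadata:
        # The current schema and default spec always survive, plus anything a kept snapshot uses.
        live_schemas = {meta.current_schema_id} | {s.schema_id for s in kept}
        live_specs = {meta.default_spec_id}.union(*(s.spec_ids for s in kept))
        schemas = {i: v for i, v in schemas.items() if i in live_schemas}
        specs = {i: v for i, v in specs.items() if i in live_specs}
    return TableMetadata(kept, schemas, specs, meta.current_schema_id,
                         meta.default_spec_id, meta.current_snapshot_id)


meta = TableMetadata(
    snapshots=[Snapshot(1, 1_000, schema_id=0, spec_ids=frozenset({0})),
               Snapshot(2, 2_000, schema_id=1, spec_ids=frozenset({1}))],
    schemas={0: "id", 1: "id, ts"},
    specs={0: "unpartitioned", 1: "day(ts)"},
    current_schema_id=1, default_spec_id=1, current_snapshot_id=2,
)
default = expire_snapshots(meta, older_than_ms=1_500)
cleaned = expire_snapshots(meta, older_than_ms=1_500, clean_expired_metadata=True)
print(sorted(default.schemas), sorted(default.specs))  # [0, 1] [0, 1]
print(sorted(cleaned.schemas), sorted(cleaned.specs))  # [1] [1]
```

In both calls snapshot 1 is expired; only the opted-in call also drops schema 0 and the unpartitioned spec, which is why turning the flag on changes what survives in table metadata and why the thread treats it as a behavior change.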

Re: cleanExpiredMetadata in RemoveSnapshots

2025-03-14 Thread Jean-Baptiste Onofré
Hi Gabor, I think the question is "when". As it's a behavior change, I don't think we should do it in a "minor" release, otherwise users would be "surprised". I propose keeping the current behavior in Iceberg Java 1.x and changing the flag to true in Iceberg Java 2.x (after a vote). Regards, JB O