I just think this is a bit more complicated than I want to take into the main library just because we have to make decisions about
1. Retries
2. Concurrency
3. Results/Error Reporting

But if we have a good proposal for how we will handle all of those, I think we could do it?

> On Dec 6, 2023, at 2:05 PM, Andrea Campolonghi <acampolon...@gmail.com> wrote:
>
> I think that if you call an expire snapshots function, this is exactly what
> you want.
>
> On Wed, Dec 6, 2023 at 18:47 Ryan Blue <b...@tabular.io> wrote:
>>
>> My concern with the per-catalog approach is that people might accidentally
>> run it. Do you think it's clear enough that these invocations will drop
>> older snapshots?
>>
>> On Wed, Dec 6, 2023 at 2:40 AM Andrea Campolonghi <acampolon...@gmail.com> wrote:
>>>
>>> I like this approach. +1
>>>
>>>> On 6 Dec 2023, at 11:37, naveen <nk1...@gmail.com> wrote:
>>>>
>>>> Hi Everyone,
>>>>
>>>> Currently, Spark procedures support expire_snapshots/remove_orphan_files
>>>> per table.
>>>>
>>>> Today, if someone has to run GC on an entire catalog, they have to
>>>> run these procedures manually for every table.
>>>>
>>>> Would it be a good idea to run them in bulk, per catalog or for multiple tables?
>>>>
>>>> Current syntax:
>>>> CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
>>>>
>>>> Proposed syntax, something similar to:
>>>>
>>>> Per Namespace/Database:
>>>> CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
>>>>
>>>> Per Catalog:
>>>> CALL hive_prod.system.expire_snapshots(<Options>)
>>>>
>>>> Multiple Tables:
>>>> CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1', 'db2.table2'), <Options>)
>>>>
>>>> PS: There could be exceptions for individual catalogs. For example, Nessie
>>>> doesn't support GC other than via the Nessie CLI, and Hadoop can't list all
>>>> the namespaces.
>>>>
>>>> Regards,
>>>> Naveen Kumar
>>
>> --
>> Ryan Blue
>> Tabular
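
For what it's worth, until a bulk form exists, callers can emulate the proposed multi-table variant by expanding a table list into per-table invocations of the existing procedure. A minimal sketch, assuming a Spark session and the `hive_prod` catalog from the examples above (the helper name and the `older_than` option value are illustrative, not part of Iceberg):

```python
# Expand a list of tables into per-table expire_snapshots CALL statements.
# The procedure and catalog names follow the thread's examples; this
# helper itself is hypothetical, not an Iceberg API.
def expire_snapshots_calls(catalog, tables, options=""):
    """Build one CALL statement per table for the per-table procedure."""
    opts = f", {options}" if options else ""
    return [
        f"CALL {catalog}.system.expire_snapshots(table => '{t}'{opts})"
        for t in tables
    ]

# Running each statement via spark.sql(stmt) in a loop also gives a
# natural place to address the retry/error-reporting concerns raised
# above, e.g. try/except per table with a collected result report.
for stmt in expire_snapshots_calls(
    "hive_prod",
    ["db1.table1", "db2.table2"],
    "older_than => TIMESTAMP '2023-12-01 00:00:00'",
):
    print(stmt)
```

Driving the loop from application code (rather than one giant catalog-wide procedure) keeps the per-table retry, concurrency, and reporting decisions in the caller's hands, which may sidestep some of the design questions raised above.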