I think that if you explicitly call an expire-snapshots procedure, this is exactly what you want.
On Wed, Dec 6, 2023 at 18:47 Ryan Blue <b...@tabular.io> wrote:

> My concern with the per-catalog approach is that people might accidentally
> run it. Do you think it's clear enough that these invocations will drop
> older snapshots?
>
> On Wed, Dec 6, 2023 at 2:40 AM Andrea Campolonghi <acampolon...@gmail.com>
> wrote:
>
>> I like this approach. +1
>>
>> On 6 Dec 2023, at 11:37, naveen <nk1...@gmail.com> wrote:
>>
>> Hi Everyone,
>>
>> Currently, Spark procedures support *expire_snapshots/remove_orphan_files*
>> per table.
>>
>> Today, if someone has to run GC on an entire catalog, they have to run
>> these procedures manually for every table.
>>
>> Would it be a good idea to support this in bulk, per catalog or for
>> multiple tables?
>>
>> Current syntax:
>>
>> CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
>>
>> Proposed syntax, something similar to:
>>
>> Per namespace/database:
>>
>> CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
>>
>> Per catalog:
>>
>> CALL hive_prod.system.expire_snapshots(<Options>)
>>
>> Multiple tables:
>>
>> CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
>> 'db2.table2'), <Options>)
>>
>> PS: There could be exceptions for individual catalogs. For example, Nessie
>> doesn't support GC outside of the Nessie CLI, and Hadoop can't list all
>> the namespaces.
>>
>> Regards,
>> Naveen Kumar

--
Ryan Blue
Tabular
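Until bulk syntax like the above exists, the per-table workaround mentioned in the proposal can be sketched as a loop that issues one CALL per table (each generated statement would then be run, e.g., via `spark.sql`). This is a minimal sketch under stated assumptions: the `expire_snapshots_calls` helper, the table list, and the `older_than` option value shown are illustrative, not part of the proposal or of any existing API.

```python
def expire_snapshots_calls(catalog, tables, options=""):
    """Build one per-table CALL statement, mirroring today's syntax.

    `catalog` and `tables` are hypothetical names used for illustration;
    `options` is an optional string of extra procedure arguments.
    """
    suffix = f", {options}" if options else ""
    return [
        f"CALL {catalog}.system.expire_snapshots(table => '{table}'{suffix})"
        for table in tables
    ]

# Generate the statements that would be submitted one by one today.
for sql in expire_snapshots_calls(
    "hive_prod",
    ["db1.table1", "db2.table2"],
    "older_than => TIMESTAMP '2023-12-01 00:00:00'",
):
    print(sql)
```

A per-namespace or per-catalog variant would only change how the table list is obtained (e.g., by listing tables in the catalog first), which is where catalog-specific exceptions like Nessie or Hadoop would surface.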