My concern with the per-catalog approach is that people might accidentally run it. Do you think it's clear enough that these invocations will drop older snapshots?
On Wed, Dec 6, 2023 at 2:40 AM Andrea Campolonghi <acampolon...@gmail.com> wrote:

> I like this approach. +1
>
> On 6 Dec 2023, at 11:37, naveen <nk1...@gmail.com> wrote:
>
> Hi Everyone,
>
> Currently, Spark procedures support *expire_snapshots/remove_orphan_files*
> per table.
>
> Today, if someone has to run GC on an entire catalog, they have to run
> these procedures manually for every table.
>
> Would it be a good idea to do this in bulk, per catalog or for multiple
> tables?
>
> Current syntax:
>
> CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
>
> Proposed syntax, something similar to:
>
> Per Namespace/Database
>
> CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
>
> Per Catalog
>
> CALL hive_prod.system.expire_snapshots(<Options>)
>
> Multiple Tables
>
> CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
> 'db2.table2'), <Options>)
>
> PS: There could be exceptions for individual catalogs. For example, Nessie
> doesn't support GC other than through the Nessie CLI, and Hadoop can't list
> all the namespaces.
>
> Regards,
> Naveen Kumar

--
Ryan Blue
Tabular
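For context, the per-table workaround described above can be sketched as a small loop that builds one CALL statement per table. This is only an illustration: the `expire_snapshots_calls` helper, the table list, and the `older_than` value are assumptions for the sketch, not an existing API; in practice each statement would be passed to `spark.sql(...)`.

```python
# Sketch (assumed helper, not a real API): build one expire_snapshots CALL
# per table, mirroring today's per-table invocation from the thread above.
def expire_snapshots_calls(catalog, tables, older_than):
    return [
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{t}', older_than => TIMESTAMP '{older_than}')"
        for t in tables
    ]

# Illustrative catalog/table names taken from the examples in the thread.
stmts = expire_snapshots_calls(
    "hive_prod", ["db1.table1", "db2.table2"], "2023-12-01 00:00:00")
for s in stmts:
    print(s)  # in a real job: spark.sql(s)
```

A bulk `database =>` or `tables =>` parameter would collapse this loop into a single CALL, which is the convenience the proposal targets.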