I like this approach. +1
> On 6 Dec 2023, at 11:37, naveen <nk1...@gmail.com> wrote:
>
> Hi Everyone,
>
> Currently, the Spark procedures support expire_snapshots/remove_orphan_files
> only per table.
>
> Today, anyone who needs to run GC on an entire catalog has to run these
> procedures manually for every table.
>
> Would it be a good idea to support bulk runs, per catalog or for multiple tables?
>
> Current syntax:
> CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
> Proposed syntax, something like:
>
> Per Namespace/Database
> CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
> Per Catalog
> CALL hive_prod.system.expire_snapshots(<Options>)
> Multiple Tables
> CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
> 'db2.table2'), <Options>)
> PS: There could be exceptions for individual catalogs. For example, Nessie
> doesn't support GC other than through the Nessie CLI, and Hadoop can't list
> all the namespaces.
>
>
> Regards,
> Naveen Kumar
>
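
For comparison, the manual per-table workaround described above could be scripted roughly as follows. This is only a sketch: build_gc_calls is a hypothetical helper, not part of Iceberg or Spark, the table list is assumed to be supplied by the caller, and retain_last is used purely as an example option. It only builds the existing per-table CALL statements; actually running them would need a live Spark session.

```python
def build_gc_calls(catalog, tables, procedure="expire_snapshots", options=None):
    """Return one per-table CALL statement for each table name.

    Hypothetical helper for illustration only; it generates the
    existing per-table syntax, not the proposed bulk syntax.
    """
    options = options or {}
    calls = []
    for table in tables:
        args = [f"table => '{table}'"]
        # Option values are inserted verbatim, so pass valid SQL literals.
        args += [f"{name} => {value}" for name, value in options.items()]
        calls.append(f"CALL {catalog}.system.{procedure}({', '.join(args)})")
    return calls


if __name__ == "__main__":
    statements = build_gc_calls(
        "hive_prod",
        ["db1.table1", "db2.table2"],
        options={"retain_last": 5},  # example option, chosen for illustration
    )
    for stmt in statements:
        # With a live session this would be spark.sql(stmt) instead.
        print(stmt)
```

A bulk form of the procedure would make this kind of client-side loop unnecessary, which is the main appeal of the proposal.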