I think that if you call an expire-snapshots function, this is exactly what
you want.

On Wed, Dec 6, 2023 at 18:47 Ryan Blue <b...@tabular.io> wrote:

> My concern with the per-catalog approach is that people might accidentally
> run it. Do you think it's clear enough that these invocations will drop
> older snapshots?
>
> On Wed, Dec 6, 2023 at 2:40 AM Andrea Campolonghi <acampolon...@gmail.com>
> wrote:
>
>> I like this approach. +1
>>
>> On 6 Dec 2023, at 11:37, naveen <nk1...@gmail.com> wrote:
>>
>> Hi Everyone,
>>
>> Currently, Spark procedures support *expire_snapshots*/*remove_orphan_files*
>> per table.
>>
>> Today, anyone who wants to run GC across an entire catalog has to invoke
>> these procedures manually for every table.
>>
>> Would it be a good idea to support running them in bulk, per catalog or
>> across multiple tables?
>>
>> Current syntax:
>>
>> CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
>>
>> Proposed syntax, something similar to:
>>
>> Per Namespace/Database
>>
>> CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
>>
>> Per Catalog
>>
>> CALL hive_prod.system.expire_snapshots(<Options>)
>>
>> Multiple Tables
>>
>> CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
>> 'db2.table2'), <Options>)
>>
>> PS: Individual catalogs may need exceptions. For example, Nessie supports
>> GC only through the Nessie CLI, and the Hadoop catalog cannot list all the
>> namespaces.
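>> To illustrate the per-table workflow being replaced, here is a minimal
>> sketch of a hypothetical helper (not part of Iceberg) that builds the
>> table-by-table CALL statements a user would have to run today; the catalog
>> name, table list, and older_than parameter are assumptions for the example:

```python
def expire_snapshots_calls(catalog, tables, older_than=None):
    """Build the per-table expire_snapshots CALL statements that a bulk
    procedure would replace. `tables` is a list of fully qualified
    'db.table' names; `older_than` is an optional timestamp string."""
    calls = []
    for table in tables:
        args = [f"table => '{table}'"]
        if older_than is not None:
            # expire_snapshots accepts an older_than timestamp option
            args.append(f"older_than => TIMESTAMP '{older_than}'")
        calls.append(
            f"CALL {catalog}.system.expire_snapshots({', '.join(args)})"
        )
    return calls


# Each statement would then be executed via spark.sql(...), one per table.
for stmt in expire_snapshots_calls("hive_prod", ["db1.table1", "db2.table2"]):
    print(stmt)
```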
>>
>>
>> Regards,
>> Naveen Kumar
>>
>>
>>
>
> --
> Ryan Blue
> Tabular
>