Also, Iceberg catalogs support nested namespaces, so maybe we need to consider a more general syntax rather than one limited to the database and table levels.
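To make the nested-namespace point a bit more concrete, here is a rough sketch of what a caller could do today, outside the main library. It walks namespaces of arbitrary depth, issues the existing per-table call for each table, retries a few times, and reports a per-table result. The helpers `list_namespaces`, `list_tables` and `run_sql` are hypothetical callables supplied by the caller (in a Spark session `run_sql` could be `spark.sql`); nothing here is an existing Iceberg API. It runs sequentially, so the concurrency question stays open, and it assumes names need no quoting.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A namespace is a tuple of name parts, e.g. ("db",) or ("db", "sub").
Namespace = Tuple[str, ...]


def walk_tables(
    root: Namespace,
    list_namespaces: Callable[[Namespace], List[Namespace]],
    list_tables: Callable[[Namespace], List[str]],
) -> Iterator[str]:
    """Yield dotted table names found under root, at any nesting depth."""
    stack = [root]
    while stack:
        ns = stack.pop()
        for name in list_tables(ns):
            yield ".".join(ns + (name,))
        # Child namespaces are full paths, so they can be pushed as-is.
        stack.extend(list_namespaces(ns))


def expire_all(
    catalog: str,
    root: Namespace,
    list_namespaces: Callable[[Namespace], List[Namespace]],
    list_tables: Callable[[Namespace], List[str]],
    run_sql: Callable[[str], object],
    max_attempts: int = 3,
) -> Dict[str, Optional[str]]:
    """Call expire_snapshots per table; map table -> None on success, else last error."""
    results: Dict[str, Optional[str]] = {}
    for table in walk_tables(root, list_namespaces, list_tables):
        stmt = f"CALL {catalog}.system.expire_snapshots(table => '{table}')"
        error: Optional[str] = None
        for _ in range(max_attempts):
            try:
                run_sql(stmt)
                error = None
                break
            except Exception as exc:  # keep going; report the failure per table
                error = str(exc)
        results[table] = error
    return results
```

Passing `root=()` would cover a whole catalog and `root=("db",)` a single database, which is roughly the range the proposed syntax is trying to express.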

On Thu, Dec 7, 2023 at 5:17 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I just think this is a bit more complicated than I want to take into the
> main library just because we have to make decisions about
>
> 1. Retries
> 2. Concurrency
> 3. Results/Error Reporting
>
> But if we have a good proposal for how we will handle all those I think we
> could do it?
>
> On Dec 6, 2023, at 2:05 PM, Andrea Campolonghi <acampolon...@gmail.com>
> wrote:
>
> I think that if you call an expire snapshots function this is exactly what
> you want
>
> On Wed, Dec 6, 2023 at 18:47 Ryan Blue <b...@tabular.io> wrote:
>
>> My concern with the per-catalog approach is that people might
>> accidentally run it. Do you think it's clear enough that these invocations
>> will drop older snapshots?
>>
>> On Wed, Dec 6, 2023 at 2:40 AM Andrea Campolonghi <acampolon...@gmail.com>
>> wrote:
>>
>>> I like this approach. +1
>>>
>>> On 6 Dec 2023, at 11:37, naveen <nk1...@gmail.com> wrote:
>>>
>>> Hi Everyone,
>>>
>>> Currently Spark-Procedures supports *expire_snapshots/remove_orphan_files*
>>> per table.
>>>
>>> Today, if someone has to run GCs on an entire catalog they will have to
>>> manually run these procedures for every table.
>>>
>>> Is it a good idea to do it in bulk as per catalog or with multiple
>>> tables?
>>>
>>> Current syntax:
>>>
>>> CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)
>>>
>>> Proposed Syntax something similar:
>>>
>>> Per Namespace/Database
>>>
>>> CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)
>>>
>>> Per Catalog
>>>
>>> CALL hive_prod.system.expire_snapshots(<Options>)
>>>
>>> Multiple Tables
>>>
>>> CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
>>> 'db2.table2'), <Options>)
>>>
>>> PS: There could be exceptions for individual catalogs. Like Nessie
>>> doesn't support GC other than Nessie CLI. Hadoop can't list all the
>>> Namespaces.
>>>
>>> Regards,
>>> Naveen Kumar
>>
>> --
>> Ryan Blue
>> Tabular