Hey Russell,

I agree, the Table API already has ExpireSnapshots and RewriteManifests.
In that case, the wrappers add two things on top:

1. Result reporting with actual delete counts across the different
file types. The current Table API doesn't return a result object.
2. A consistent API: ActionsProvider would aggregate all available local
actions in one place for consumers such as CLI tools and tests.
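
To illustrate point 1, here is a rough sketch of how a local wrapper
might collect delete counts. This is not existing Iceberg code:
FileKind and CountingDeleter are hypothetical names, and how a path
gets classified is left to a caller-supplied function, since I'm not
assuming any particular file naming scheme here.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical file categories for reporting; not Iceberg types.
enum FileKind { DATA, MANIFEST, MANIFEST_LIST, OTHER }

// Hypothetical delete callback: forwards each path to the real delete
// function and tallies successful deletes per file kind.
final class CountingDeleter implements Consumer<String> {
  private final Map<FileKind, AtomicLong> counts = new EnumMap<>(FileKind.class);
  private final Function<String, FileKind> classifier;
  private final Consumer<String> delegate;

  CountingDeleter(Function<String, FileKind> classifier, Consumer<String> delegate) {
    this.classifier = classifier;
    this.delegate = delegate;
    // Pre-populate every counter so accept() only reads the map,
    // which keeps it safe to call from a thread pool.
    for (FileKind kind : FileKind.values()) {
      counts.put(kind, new AtomicLong());
    }
  }

  @Override
  public void accept(String path) {
    delegate.accept(path); // if the delete throws, nothing is counted
    counts.get(classifier.apply(path)).incrementAndGet();
  }

  long deleted(FileKind kind) {
    return counts.get(kind).get();
  }
}
```

A wrapper could pass such a callback to
table.expireSnapshots().deleteWith(...) and build its result object
from the counters after commit().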

The more interesting actions are the ones without Table API
equivalents: DeleteOrphanFiles, RewriteTablePath, RewriteDataFiles.

I think it would be useful to be able to run all actions without Spark
dependencies. What do you think?

Cheers,
Max


On Wed, Feb 25, 2026 at 8:43 PM Russell Spitzer
<[email protected]> wrote:
>
> So for those first two they already exist in our Table.java API
>
> table.expireSnapshots()
>      .expireOlderThan(tsToExpire)
>      .commit();
>
> table.rewriteManifests()
>      .commit();
>
> Only RewriteTablePath doesn't have a local version yet but I think we could 
> possibly add that
>
> What were you thinking of adding to the existing apis?
>
> On Wed, Feb 25, 2026 at 2:17 AM Maximilian Michels <[email protected]> wrote:
>>
>> Hi Russell,
>>
>> Exactly, for many actions this is mostly plumbing to make the existing
>> functionality available.
>>
>> >Which ones would you like to add implementations for?
>>
>> We can start with some simple ones, e.g. ExpireSnapshots,
>> RewriteManifests, RewriteTablePath.
>>
>> -Max
>>
>>
>> On Tue, Feb 24, 2026 at 5:03 PM Russell Spitzer
>> <[email protected]> wrote:
>> >
>> > We already do have non-distributed versions for a bunch of the 
>> > functionality in core (that's what the actions were based on) so I don't 
>> > think this is a wild idea. Which ones would you like to add 
>> > implementations for?
>> >
>> > On Tue, Feb 24, 2026 at 9:23 AM Maximilian Michels <[email protected]> wrote:
>> >>
>> >> Hi everyone,
>> >>
>> >> I've been looking at the Iceberg Actions [1] and noticed many of them 
>> >> don't fundamentally require a distributed engine.
>> >>
>> >> Apart from RewriteDataFiles, most of the maintenance tasks are rather 
>> >> lightweight in the processing department. Some of them could probably run 
>> >> faster and with fewer resources locally, backed by a thread pool.
>> >>
>> >> I wonder whether Iceberg could benefit from a local implementation for 
>> >> ActionsProvider [2]. We have a lot of the building blocks for these 
>> >> already available in the core.
>> >>
>> >> Granted, there are scalability limitations for large tables. Also, it's 
>> >> often more convenient to use existing (distributed) compute 
>> >> infrastructure. Yet, there are use cases where distributed computing 
>> >> isn't strictly required. For example:
>> >>
>> >>   - CLI tooling
>> >>   - CI/CD pipelines and automation scripts
>> >>   - REST catalog backends which want to run maintenance internally
>> >>   - Small tables in general
>> >>   - Environments where Flink/Spark are not available
>> >>
>> >> I'm curious to hear your thoughts.
>> >>
>> >> Cheers,
>> >> Max
>> >>
>> >> [1] 
>> >> https://github.com/apache/iceberg/tree/501824f0c0032b3225b0fe52b904756f0fe5c589/api/src/main/java/org/apache/iceberg/actions
>> >> [2] 
>> >> https://github.com/apache/iceberg/blob/501824f0c0032b3225b0fe52b904756f0fe5c589/api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java#L24
