Hi Robert,

Thanks for sharing the proposal and the PR. Before diving deeper into the
API shape, I was hoping to better understand the intended use cases you
have in mind:

1. What concrete scenarios are you primarily targeting with these
long-running object store operations?
2. Are these mostly expected to be file/object-level maintenance tasks
(e.g. purge, cleanup), or do you envision broader categories of operations
leveraging the same abstraction?

Having a clearer picture of the motivating use cases would help evaluate
the right level of abstraction and where this should live architecturally.

Looking forward to the discussion.

Yufei


On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp wrote:

> Hi all,
>
> I'd like to propose an API and a corresponding implementation for
> long-running object store operations.
>
> It provides a CPU- and heap-friendly API and implementation for working
> against object stores. It is built to provide "pluggable"
> functionality. Here is what I mean (Java pseudo code):
> ---
> FileOperations fileOps =
>     fileOperationsFactory.createFileOperations(fileIoInstance);
> Stream<FileSpec> allIcebergTableFiles =
>     fileOps.identifyIcebergTableFiles(metadataLocation);
> PurgeStats purgedTable = fileOps.purge(allIcebergTableFiles);
> // or simpler:
> PurgeStats purgedTableSimple = fileOps.purgeIcebergTable(metadataLocation);
> // or similarly for Iceberg views:
> PurgeStats purgedView = fileOps.purgeIcebergView(metadataLocation);
> // or to purge all files underneath a prefix:
> PurgeStats purgedPrefix = fileOps.purge(fileOps.findFiles(prefix));
> ---
>
> Not shown in the pseudo code: the ability to rate-limit the number of
> purged files or batch deletions, and to configure the deletion batch
> size.
>
> The PR [1] already contains tests against an on-heap object store mock
> and integration tests against S3/GCS/Azure emulators.
>
> More details can be found in the README [2] included in the PR and, of
> course, in the code of the PR itself.
>
> Robert
>
> [1] https://github.com/apache/polaris/pull/3256
> [2] https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
>
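
For readers less familiar with the idea, here is a minimal sketch of a streaming purge with a configurable deletion batch size and a rate limit, as described in the quoted proposal. It is not the code from PR #3256: the class BatchedPurge, its purge(...) signature, the PurgeStats record shown here, and the deleteBatch callback (a stand-in for one bulk-delete request against an object store) are all hypothetical, and the limit here is on batches per second rather than files per second.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical sketch only: not the API or implementation from PR #3256.
// Purges a lazily produced stream of file locations in fixed-size batches,
// starting at most `maxBatchesPerSecond` batches per second.
final class BatchedPurge {

  // Hypothetical result type, loosely named after PurgeStats in the proposal.
  record PurgeStats(long files, long batches) {}

  private BatchedPurge() {}

  static PurgeStats purge(
      Stream<String> locations,
      int batchSize,
      double maxBatchesPerSecond,
      Consumer<List<String>> deleteBatch) { // stand-in for a bulk-delete call
    if (batchSize <= 0 || maxBatchesPerSecond <= 0) {
      throw new IllegalArgumentException("batchSize and maxBatchesPerSecond must be > 0");
    }
    long intervalNanos = (long) (1_000_000_000d / maxBatchesPerSecond);
    long nextStart = System.nanoTime();
    long files = 0;
    long batches = 0;

    // Only one batch is held in memory; the stream is consumed lazily.
    List<String> batch = new ArrayList<>(batchSize);
    Iterator<String> it = locations.iterator();
    while (it.hasNext()) {
      batch.add(it.next());
      // Flush when the batch is full or the stream is exhausted.
      if (batch.size() == batchSize || !it.hasNext()) {
        sleepUntil(nextStart); // rate limit: wait for the next allowed start
        nextStart = System.nanoTime() + intervalNanos;
        deleteBatch.accept(List.copyOf(batch)); // hand over an immutable copy
        files += batch.size();
        batches++;
        batch.clear();
      }
    }
    return new PurgeStats(files, batches);
  }

  // Sleeps until System.nanoTime() reaches deadlineNanos.
  private static void sleepUntil(long deadlineNanos) {
    long remaining;
    while ((remaining = deadlineNanos - System.nanoTime()) > 0) {
      try {
        Thread.sleep(remaining / 1_000_000, (int) (remaining % 1_000_000));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("purge interrupted", e);
      }
    }
  }
}
```

Holding only one batch at a time is what keeps a stream-based purge heap-friendly even when a table references a very large number of files. A files-per-second limit would work the same way, advancing the next allowed start time in proportion to the batch size instead of once per batch.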
