So, I wanted to bring all of the discussion into this thread, since we are jumping between the filed issue, this thread, and the GitHub PR.
In regards to Yufei's comment on the PR [1]: sorry if I wasn't clear when Yufei, Prashant, and I had our conversation. From what I remember, two points were brought up:

1. I wanted to check out Yufei's comment on the memory consumption analysis. I did NOT recommend pausing the work; I only wanted to investigate Yufei's comment on the GitHub issue on my side.
2. I did want to see what other implementation options there were, but those can be done as a follow-up. Prashant had mentioned leveraging AllEntriesTable [2].

Let me address these one by one.

On the memory pressure analysis:

1. I agree with Yufei that the encoding is just on the ManifestFile metadata object, not the manifest file content. That alone should not explain significant memory pressure.
2. However, this is NOT what I read the ticket to be about, since the ticket is titled "Purge table task implementation prone to OOMs" and, as Pierre pointed out, contains the line "Although some Java stream handling is being used, all manifest-files of the table to purge are materialized at once on the Java heap."
3. I agree with Pierre's analysis of the memory consumption patterns of TableCleanupTaskHandler, and I believe our current implementation can be improved.
4. I believe Robert's PR improves the current implementation and, because it is an incremental way to get us to a more stable product, we should get it in.
5. Improvements to the current codebase should not be hindered by future concerns like the asynchronous task or delegation service work, as long as they do not block those future decisions. This PR is orthogonal to those decisions: it improves existing functionality and can be used however we decide to move forward.

On the other options for implementation:

1. I agree with Prashant that it would be great to reuse existing functionality when it applies to our use cases.
I did not know about the AllEntriesTable functionality in Iceberg before, and it could be fruitful for someone to do that analysis in the future.
2. That being said, I do NOT believe that we need to wait for someone to do that analysis before moving forward. The beauty of having the separation between the File Operations APIs and the implementation modules is that we can change the implementation when we find a better one.

So, personally, I would like to get this work in and start alleviating the community of any potential issues with the current purge table operation.

[1] - https://github.com/apache/polaris/pull/3256#issuecomment-3725228243
[2] - https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/AllEntriesTable.java

Go community,

Adam

On Fri, Jan 9, 2026 at 4:33 AM Pierre Laporte <[email protected]> wrote:

> Oh I see where the misunderstanding is. At this time, what you are questioning is this sentence from the #2365 original bug report:
>
> > The base64 encoded full binary Iceberg manifest files are included in the task entities
>
> Now, consider the sentence that immediately follows:
>
> > Although some Java stream handling is being used, all manifest-files of the table to purge are materialized at once on the Java heap.
>
> This is indeed what my analysis is about, as it is what defines the memory consumption pattern of the current logic. Because it is the cause of the reported heap pressure, it applies regardless of the final answer to the base64 question.
>
> --
>
> Pierre
>
> On Fri, Jan 9, 2026 at 5:58 AM Yufei Gu <[email protected]> wrote:
>
> > > The getMetadataFileBatches method has a space complexity of `O(PM + S + ST + PST)`.
> > > Same thing for the getMetadataTaskStream method.
> > > The getManifestTaskStream method has a space complexity of `O(UM)`.
> > > The handleTask method has a space complexity of `O(UM + PM + S + ST + PST + T)`
> >
> > Pierre, thanks for the analysis.
> > I think this response is addressing a slightly different issue than the one originally raised. The initial concern was that embedding base64 encoded manifest files causes excessive heap usage. The new complexity breakdown instead argues that operations scale with table history and activity. If the concern is really about the cost of operating on large tables with long metadata history, that seems like a broader, pre-existing characteristic rather than a defect in this specific approach. In that case, it may be clearer to track this as a separate issue or PR. For reference, there were discussions of moving the heavy workloads like this out of the Polaris instance to a delegation service.
> >
> > Yufei
> >
> > On Thu, Jan 8, 2026 at 2:04 AM Pierre Laporte <[email protected]> wrote:
> >
> > > As far as I can tell, here is the space complexity for each method. The names used correspond to:
> > >
> > > * PM = number of previous metadata files
> > > * S = number of snapshots
> > > * ST = number of statistics files
> > > * PST = number of partition statistics files
> > > * UM = number of unique manifest files across all snapshots
> > > * T = total number of created TaskEntities
> > >
> > > The getMetadataFileBatches method has a space complexity of `O(PM + S + ST + PST)`.
> > > Same thing for the getMetadataTaskStream method.
> > > The getManifestTaskStream method has a space complexity of `O(UM)`.
> > > The handleTask method has a space complexity of `O(UM + PM + S + ST + PST + T)`
> > >
> > > Based on those elements, it is clear that the current implementation will run into heap pressure for tables with many snapshots and frequent updates, or tables with long metadata history.
> > >
> > > On Wed, Jan 7, 2026 at 9:55 PM Yufei Gu <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > After taking a closer look, I am not sure the issue as currently described is actually valid.
> > > >
> > > > The base64 encoded manifest objects [1] being discussed are not the manifest files themselves. They are objects representing manifest files, which can be reconstructed from the manifest entries stored in the ManifestList files. As a result, the in-memory footprint should be roughly equivalent to the size of a single manifest list file per snapshot, plus some additional base64 encoding overhead. That overhead does not seem significant enough on its own to explain large heap pressure.
> > > >
> > > > This pattern is also handled in practice today. For example, multiple Spark procedures/actions and Spark planning process these manifest representations within a single node, typically the driver, without materializing full manifest files in memory. One concrete example is here:
> > > >
> > > > https://github.com/apache/iceberg/blob/bf1074ff373c929614a3f92dd4e46780028ac1ca/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java#L290
> > > >
> > > > Given this, I am not convinced that embedding these manifest representations is inherently problematic from a memory perspective. If there are concrete scenarios where this still leads to excessive memory usage, it would be helpful to clarify where the amplification happens beyond what is already expected from manifest list processing.
> > > >
> > > > Happy to be corrected if I am missing something, but wanted to share this observation before we anchor further design decisions on this assumption.
> > > >
> > > > 1.
> > > > https://github.com/apache/polaris/blob/c9efc6c1af202686945efe2e19125e8f116a0206/runtime/service/src/main/java/org/apache/polaris/service/task/TableCleanupTaskHandler.java#L194
> > > >
> > > > Yufei
> > > >
> > > > On Tue, Jan 6, 2026 at 8:46 PM Yufei Gu <[email protected]> wrote:
> > > >
> > > > > Thanks Adam and Robert for the replies.
> > > > >
> > > > > Just to make sure I am understanding this correctly.
> > > > >
> > > > > The core issue the proposal is addressing is described in https://github.com/apache/polaris/issues/2365: today, the full binary Iceberg manifest files, base64 encoded, are embedded in the task entities. As a result, when a purge runs, all manifests for a table end up being materialized in memory at once. This behavior creates significant heap pressure and may lead to out-of-memory failures during purge operations.
> > > > >
> > > > > Please let me know if this matches your intent, or if I am missing anything.
> > > > >
> > > > > Yufei
> > > > >
> > > > > On Sat, Dec 20, 2025 at 4:53 AM Robert Stupp <[email protected]> wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> Thanks Adam! You're spot on!
> > > > >>
> > > > >> I wanted to separate the discussions about this functionality and the related async & reliable tasks proposal.
> > > > >>
> > > > >> The functionality of this one is generally intended for long(er) running operations against object stores, and already provides the necessary functionality to fix the existing OOM issue.
> > > > >>
> > > > >> Robert
> > > > >>
> > > > >> [1] https://lists.apache.org/thread/kqm0w38p7bnojq455yz7d2vdfp6ky1h7
> > > > >>
> > > > >> On Fri, Dec 19, 2025 at 3:43 PM Adam Christian <[email protected]> wrote:
> > > > >>
> > > > >> > Howdy Robert,
> > > > >> >
> > > > >> > I reviewed the PR and it is very clean. I really enjoy the simplicity of the FileOperations interface. I think that's going to be a great extension point.
> > > > >> >
> > > > >> > One of the things I was wondering about was what to do with the current purge implementation. I understand that it has a high likelihood of hitting an Out of Memory exception [1]. I'm wondering if your idea was to build this, then replace the current implementation. I'd love to ensure that we have a plan for one clean, stable implementation.
> > > > >> >
> > > > >> > [1] - https://github.com/apache/polaris/issues/2365
> > > > >> >
> > > > >> > Go community,
> > > > >> >
> > > > >> > Adam
> > > > >> >
> > > > >> > On Tue, Dec 16, 2025 at 10:25 AM Adam Christian <[email protected]> wrote:
> > > > >> >
> > > > >> > > Hi Yufei,
> > > > >> > >
> > > > >> > > Great questions!
> > > > >> > >
> > > > >> > > From what I can see in the PR, here are the answers to your questions:
> > > > >> > > 1. The first major scenario is improving the memory concerns with purge. That's important to stabilize a core use case in the service.
> > > > >> > > 2. These are related specifically to file operations. I cannot see a way that it would be broader than that.
> > > > >> > >
> > > > >> > > Go community,
> > > > >> > >
> > > > >> > > Adam
> > > > >> > >
> > > > >> > > On Mon, Dec 15, 2025, 3:20 PM Yufei Gu <[email protected]> wrote:
> > > > >> > >
> > > > >> > >> Hi Robert,
> > > > >> > >>
> > > > >> > >> Thanks for sharing the proposal and the PR. Before diving deeper into the API shape, I was hoping to better understand the intended use cases you have in mind:
> > > > >> > >>
> > > > >> > >> 1. What concrete scenarios are you primarily targeting with these long-running object store operations?
> > > > >> > >> 2. Are these mostly expected to be file/object-level maintenance tasks (e.g. purge, cleanup), or do you envision broader categories of operations leveraging the same abstraction?
> > > > >> > >>
> > > > >> > >> Having a clearer picture of the motivating use cases would help evaluate the right level of abstraction and where this should live architecturally.
> > > > >> > >>
> > > > >> > >> Looking forward to the discussion.
> > > > >> > >>
> > > > >> > >> Yufei
> > > > >> > >>
> > > > >> > >> On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp <[email protected]> wrote:
> > > > >> > >>
> > > > >> > >> > Hi all,
> > > > >> > >> >
> > > > >> > >> > I'd like to propose an API and corresponding implementation for (long running) object store operations.
> > > > >> > >> >
> > > > >> > >> > It provides a CPU and heap-friendly API and implementation to work against object stores. It is built in a way to provide "pluggable" functionality.
> > > > >> > >> > What I mean is this (Java pseudo code):
> > > > >> > >> > ---
> > > > >> > >> > FileOperations fileOps =
> > > > >> > >> >     fileOperationsFactory.createFileOperations(fileIoInstance);
> > > > >> > >> > Stream<FileSpec> allIcebergTableFiles =
> > > > >> > >> >     fileOps.identifyIcebergTableFiles(metadataLocation);
> > > > >> > >> > PurgeStats purged = fileOps.purge(allIcebergTableFiles);
> > > > >> > >> > // or simpler:
> > > > >> > >> > PurgeStats purged = fileOps.purgeIcebergTable(metadataLocation);
> > > > >> > >> > // or similarly for Iceberg views
> > > > >> > >> > PurgeStats purged = fileOps.purgeIcebergView(metadataLocation);
> > > > >> > >> > // or to purge all files underneath a prefix
> > > > >> > >> > PurgeStats purged = fileOps.purge(fileOps.findFiles(prefix));
> > > > >> > >> > ---
> > > > >> > >> >
> > > > >> > >> > Not mentioned in the pseudo code is the ability to rate-limit the number of purged files or batch-deletions and to configure the deletion batch-size.
> > > > >> > >> >
> > > > >> > >> > The PR already contains tests against an on-heap object store mock and integration tests against S3/GCS/Azure emulators.
> > > > >> > >> >
> > > > >> > >> > More details can be found in the README [2] included in the PR and of course in the code in the PR.
> > > > >> > >> >
> > > > >> > >> > Robert
> > > > >> > >> >
> > > > >> > >> > [1] https://github.com/apache/polaris/pull/3256
> > > > >> > >> > [2] https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
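[Editor's note: since the thread keeps coming back to heap-friendly streaming and deletion batch-sizes, here is a minimal, self-contained Java sketch of the general idea being discussed: draining a lazy stream of file paths in fixed-size batches so that only one batch is resident on the heap at a time. This is purely illustrative; the names (BatchedPurgeSketch, purgeInBatches, deleteBatch) are hypothetical and this is not the code from the PR, which should be consulted for the actual FileOperations API.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class BatchedPurgeSketch {

  /**
   * Drains a lazy stream of file paths in fixed-size batches, so only one
   * batch is ever resident on the heap at a time. The deleteBatch hook is a
   * stand-in for an object store bulk-delete call.
   */
  static long purgeInBatches(Stream<String> files, int batchSize, Consumer<List<String>> deleteBatch) {
    Iterator<String> it = files.iterator(); // pulls elements lazily, one at a time
    long deleted = 0;
    List<String> batch = new ArrayList<>(batchSize);
    while (it.hasNext()) {
      batch.add(it.next());
      if (batch.size() == batchSize) {
        deleteBatch.accept(List.copyOf(batch));
        deleted += batch.size();
        batch.clear(); // heap usage stays O(batchSize)
      }
    }
    if (!batch.isEmpty()) { // flush the final partial batch
      deleteBatch.accept(List.copyOf(batch));
      deleted += batch.size();
    }
    return deleted;
  }

  public static void main(String[] args) {
    // 10 lazily generated paths with a batch size of 4 -> 3 delete calls (4, 4, 2)
    Stream<String> files =
        IntStream.range(0, 10).mapToObj(i -> "s3://bucket/table/data/file-" + i + ".parquet");
    List<Integer> batchSizes = new ArrayList<>();
    long deleted = purgeInBatches(files, 4, batch -> batchSizes.add(batch.size()));
    System.out.println(deleted + " files deleted in " + batchSizes.size() + " batches: " + batchSizes);
  }
}
```

A real implementation would plug an object store bulk-delete (plus the rate limiting Robert mentions) into the deleteBatch hook; the point is that memory scales with the batch size, not with the total number of files in the table.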
