Hello community!

I have updated the proposal
<https://docs.google.com/document/d/1R1K6X7qYqvIFkPG3m1neV5Mvy8rwWJvhSFr8DgJgQ-E/edit?tab=t.0>
and
the PR #15280 <https://github.com/apache/iceberg/pull/15280> based on
the feedback during the last catalog community sync. This ensures that all
the requirements surfaced so far are being handled in the new proposal.
Please take some time to provide feedback.

To summarize for the thread, the key requirements the proposal must
satisfy are the following:

   - *Generalizability*: The storage refresh mechanism should NOT be a
   staging-only concept; instead, it should integrate with the existing
   StorageCredential mechanism and be reusable across all credential vending
   scenarios (staged tables, committed tables, scan planning, etc.).
   - *Works without loadCredentials API support*: Not all catalog
   implementations support the loadCredentials endpoint. The storage refresh
   mechanism must work with the existing loadTable endpoint.
   - *No server-side state requirement*: The spec should not mandate
   maintaining server-side state.
   - *Per-credential refresh granularity*: Each storage credential should
   be independently refreshable. StorageCredentials allows specifying a set of
   locations, and each of them should be refreshable independently.
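
As an illustration only (this is not the design in the proposal or the PR),
the stateless and per-credential requirements above could be met by having
the server sign everything it needs to re-vend a credential into the token
itself. The sketch below assumes a hypothetical `storage-refresh-token`
config key, a hypothetical HMAC-signed token format, and placeholder
credential values; none of these names come from the spec.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key; a real catalog would load this from secure config.
SERVER_KEY = b"demo-signing-key"


def issue_refresh_token(prefix: str, principal: str, ttl_s: int,
                        now: Optional[float] = None) -> str:
    """Sign the claims needed to re-vend ONE credential; nothing is stored."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"prefix": prefix, "principal": principal, "exp": issued + ttl_s},
        sort_keys=True,
    ).encode()
    sig = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_refresh_token(token: str, now: Optional[float] = None) -> dict:
    """Check signature and expiry using only the token and the server key."""
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["exp"] < current:
        raise ValueError("token expired")
    return claims


def vend_credential(prefix: str, principal: str, ttl_s: int = 3600,
                    now: Optional[float] = None) -> dict:
    """Build one credential entry carrying its own refresh token."""
    return {
        "prefix": prefix,
        "config": {
            # Placeholder for the real short-lived storage secret.
            "example.secret": "<short-lived storage secret>",
            # Hypothetical key name for the per-credential token.
            "storage-refresh-token": issue_refresh_token(
                prefix, principal, ttl_s, now),
        },
    }


def refresh_credential(token: str, now: Optional[float] = None) -> dict:
    """Re-vend only the credential that the presented token refers to."""
    claims = verify_refresh_token(token, now)
    return vend_credential(claims["prefix"], claims["principal"], now=now)
```

In this sketch a client would send a credential's token back on an existing
call such as loadTable, and the server would re-vend only that credential,
so each location refreshes independently and no session has to be stored.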


Thanks,
Maninder

On Wed, Feb 25, 2026 at 5:33 PM Maninder Parmar <
[email protected]> wrote:

> Hi community,
>
> Thanks for the input during the catalog sync! I want to summarize the
> decisions and direction that were agreed on during the sync.
>
> *Direction*
> - We'll introduce a storage-refresh-token concept that integrates with the
> existing StorageCredential mechanism rather than being a staging-specific
> construct. This keeps the design reusable across different APIs going
> forward.
> - We agreed not to model this after the planId-based credential vending
> used in scan planning. The community is open to refactoring planId
> credential refresh to use the storage credential refresh token pattern in
> the future.
>
> *Discarded approaches*
> 1. table-uuid as the identifier - overloads a spec-level identifier for a
> purpose it wasn't designed for
> 2. Server-side state / sessions - adds operational complexity and some
> existing catalog implementations assume stateless staged table creation
> 3. Overloading OAuth scopes - conflates storage credential refresh with
> the OAuth layer
>
> I will share an updated design doc and spec PR reflecting this direction.
>
> On Tue, Feb 10, 2026 at 11:14 AM Maninder Parmar <
> [email protected]> wrote:
>
>> Thanks for reviewing the proposal Huaxin!
>>
>> *"Since stagingSession is in the URL and may show up in logs, should it
>> be treated as a secret token (hard to guess, short expiry)?"*
>> No, stagingSession is not a secret; it is just an identifier for the
>> session. It is up to the catalog server implementation to decide whether
>> only the user who was issued the stagingSession, or any user who has the
>> stagingSession, may call commit on the table. It can use existing
>> authentication mechanisms to enforce those constraints.
>>
>> *"If it leaks, can someone else use it, or is it restricted to the same
>> user/job that created the staged table?"*
>> Since it's not a secret but merely an identifier (just like planId), a
>> leak by itself should not pose a risk. It's up to the catalog server
>> implementation whether to restrict its use to the same user/job.
>>
>> *"What happens if a CTAS job crashes or is cancelled after staging? Does
>> the stagingSession expire automatically, and is there a way to clean
>> up/abort the staged create?"*
>> The lifecycle implementation of stagingSession is up to the catalog
>> servers. Multiple strategies could be used here, such as automatically
>> expiring the session after a few hours if no updateTable call was made
>> for that session, or expiring the other active sessions when one of them
>> is committed.
>> No additional API surface area would be exposed to clients to manage
>> the session lifecycle; it is the responsibility of the catalog server.
>>
>> Let me know if you have follow up questions.
>>
>>
>> On Mon, Feb 9, 2026 at 7:07 PM huaxin gao <[email protected]> wrote:
>>
>>> Hi Maninder,
>>>
>>> Thanks for the proposal! It sounds like a good direction to me.
>>> Returning a stagingSession from stage-create and then reusing it for
>>> loadCredentials/loadTable feels consistent with the existing planId
>>> pattern, and it fixes a real CTAS problem.
>>>
>>> A few questions:
>>>
>>> Since stagingSession is in the URL and may show up in logs, should it be
>>> treated as a secret token (hard to guess, short expiry)?
>>>
>>> If it leaks, can someone else use it, or is it restricted to the same
>>> user/job that created the staged table?
>>>
>>> What happens if a CTAS job crashes or is cancelled after staging? Does
>>> the stagingSession expire automatically, and is there a way to clean
>>> up/abort the staged create?
>>>
>>> Would love to hear your thoughts on these.
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>
>>> On Mon, Feb 9, 2026 at 4:30 PM Maninder Parmar <
>>> [email protected]> wrote:
>>>
>>>> Hello iceberg community!
>>>>
>>>> I wanted to discuss the proposal for refreshing storage credentials for
>>>> staged table creation. Iceberg tables can be created either via a
>>>> single-step creation flow or via a two-step staged creation flow, which is
>>>> used for implementing CTAS (CREATE TABLE AS SELECT) statements. Currently,
>>>> it's not possible to refresh the credentials for staged tables since they
>>>> are not committed to the catalog and hence are not visible to the loadTable
>>>> or credentials endpoints.
>>>> There has been prior discussion
>>>> <https://lists.apache.org/thread/q5n355d89nxbhywtlv3qhq7dchbyb67d> where
>>>> the community members have expressed the need for supporting this scenario.
>>>>
>>>> I have started a proposal
>>>> <https://docs.google.com/document/d/1R1K6X7qYqvIFkPG3m1neV5Mvy8rwWJvhSFr8DgJgQ-E/edit?tab=t.0>
>>>> to flesh out the details of supporting this scenario, building on the
>>>> precedent of credential vending support for scan planning.
>>>> The OpenAPI changes can be seen in PR #15280
>>>> <https://github.com/apache/iceberg/pull/15280>
>>>>
>>>> Looking forward to your feedback.
>>>>
>>>> Thanks,
>>>> Maninder
>>>>
>>>>  Proposal: Credential Refresh for Staged Table Creation
>>>> <https://drive.google.com/open?id=1R1K6X7qYqvIFkPG3m1neV5Mvy8rwWJvhSFr8DgJgQ-E>
>>>>
>>>
