Re: [DISCUSS] Refreshing storage credentials for staged table creation

Maninder Parmar Wed, 18 Mar 2026 11:36:10 -0700

Thanks for the feedback during the community sync!

To summarize the key discussions and decisions:


   - Omit the requirement to support credential refresh on the loadTable
   API. The storage credentials should be refreshable only on the
   loadCredentials endpoint.
   - The spec will now move towards the prototype phase to ensure
   downstream implementation risks are minimized. In particular:
      - Understanding the implications of using the config map within
      StorageCredentials or creating a separate typed field
      - Prototyping will be limited to S3, which is the most complex
      storage provider implementation and has the required
      VendedCredentialProvider implementation that should be extended
   - There was also a discussion about sending fresh storage
   credentials as part of the commit table API. That is out of scope for
   this effort, and Daniel Weeks will send a PR for it.
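
To make the "config map vs. separate typed field" question concrete, here
is a rough Python sketch of the two shapes. It is only an illustration
under stated assumptions, not the proposed spec: the class names, the
helper functions, the `prefix`/`config` field names, and the
"storage-refresh-token" key are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# Hypothetical key name; the thread does not fix one.
REFRESH_TOKEN_KEY = "storage-refresh-token"


@dataclass
class ConfigMapCredential:
    """Option A (assumed shape): the token lives inside the config map."""
    prefix: str
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class TypedFieldCredential:
    """Option B (assumed shape): the token is a separate typed field."""
    prefix: str
    config: Dict[str, str] = field(default_factory=dict)
    refresh_token: Optional[str] = None


Credential = Union[ConfigMapCredential, TypedFieldCredential]


def refresh_token(cred: Credential) -> Optional[str]:
    """Return the refresh token for one credential, or None if absent."""
    if isinstance(cred, TypedFieldCredential):
        return cred.refresh_token
    return cred.config.get(REFRESH_TOKEN_KEY)


def storage_properties(cred: Credential) -> Dict[str, str]:
    """Return the config entries meant for the storage client.

    With option A the token shares the map with storage properties, so it
    is filtered out here; with option B the filter is a no-op.
    """
    return {k: v for k, v in cred.config.items() if k != REFRESH_TOKEN_KEY}
```

In this sketch, option A needs no new field but leaves the client to
separate the token from the storage properties, while option B keeps the
two apart by construction. Whether that difference matters in practice is
one of the things the prototype would need to show.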


On Tue, Mar 17, 2026 at 5:41 PM Maninder Parmar <
[email protected]> wrote:

> Hello community!
>
> I have updated the proposal
> <https://docs.google.com/document/d/1R1K6X7qYqvIFkPG3m1neV5Mvy8rwWJvhSFr8DgJgQ-E/edit?tab=t.0>
>  and
> the PR #15280 <https://github.com/apache/iceberg/pull/15280> based on
> the feedback during the last catalog community sync. This ensures that all
> the requirements surfaced so far are being handled in the new proposal.
> Please take some time to provide feedback.
>
> To summarize in the thread, the key requirements for the proposal to
> satisfy are the following:
>
>    - *Generalizability*: The storage refresh mechanism should NOT be a
>    staging-only concept but instead should integrate with the existing
>    StorageCredential mechanism and be reusable across any credential vending
>    scenarios (staged tables, committed tables, scan planning etc.).
>    - *Works without loadCredentials API support*: Not all catalog
>    implementations support the loadCredentials endpoint. The storage refresh
>    mechanism must work with the existing loadTable endpoint.
>    - *No server side state requirement*: The spec should not mandate
>    maintaining server side state.
>    - *Per credential refresh granularity*: Each storage credential should
>    be independently refreshable. StorageCredentials allows specifying a set of
>    locations, and each of them should be refreshable independently.
>
>
> Thanks,
> Maninder
>
> On Wed, Feb 25, 2026 at 5:33 PM Maninder Parmar <
> [email protected]> wrote:
>
>> Hi community,
>>
>> Thanks for the inputs during the catalog sync! I want to summarize the
>> decisions and direction that were agreed on during the sync.
>>
>> *Direction*
>> - We'll introduce a storage-refresh-token concept that integrates with
>> the existing StorageCredential mechanism rather than being a
>> staging-specific construct. This keeps the design reusable across different
>> APIs going forward.
>> - We agreed not to model this after the planId-based credential vending
>> used in scan planning. The community is open to refactoring planId
>> credential refresh to use the storage credential refresh token pattern in
>> the future.
>>
>> *Discarded approaches*
>> 1. table-uuid as the identifier - overloads a spec-level identifier for a
>> purpose it wasn't designed for
>> 2. Server-side state / sessions - adds operational complexity and some
>> existing catalog implementations assume stateless staged table creation
>> 3. Overloading OAuth scopes - conflates storage credential refresh with
>> the OAuth layer
>>
>> I will share an updated design doc and spec PR reflecting this direction.
>>
>> On Tue, Feb 10, 2026 at 11:14 AM Maninder Parmar <
>> [email protected]> wrote:
>>
>>> Thanks for reviewing the proposal Huaxin!
>>>
>>> *"Since stagingSession is in the URL and may show up in logs, should it
>>> be treated as a secret token (hard to guess, short expiry)?"*
>>> No, stagingSession is not a secret; it is just an identifier for the
>>> session. It is up to the catalog server implementation whether only the
>>> user who was issued the stagingSession, or any user with the
>>> stagingSession, may call commit on the table. It can use existing
>>> authentication mechanisms to enforce those constraints.
>>>
>>> *"If it leaks, can someone else use it, or is it restricted to the same
>>> user/job that created the staged table?"*
>>> Since it's not a secret but merely an identifier (just like planId),
>>> a leak should not pose a risk. It's up to the catalog server
>>> implementation whether to restrict it to the same user/job.
>>>
>>>
>>> *"What happens if a CTAS job crashes or is cancelled after staging? Does
>>> the stagingSession expire automatically, and is there a way to clean
>>> up/abort the staged create?"*
>>> The lifecycle implementation of stagingSession is up to the catalog
>>> servers. Multiple strategies could be used here, such as automatically
>>> expiring the session after a few hours if no updateTable call was made
>>> for that session, or expiring active sessions when one of them is
>>> committed.
>>> There would not be any additional API surface area exposed to clients to
>>> manage the session lifecycle; it is the responsibility of the catalog
>>> server.
>>>
>>> Let me know if you have follow up questions.
>>>
>>>
>>> On Mon, Feb 9, 2026 at 7:07 PM huaxin gao <[email protected]>
>>> wrote:
>>>
>>>> Hi Maninder,
>>>>
>>>> Thanks for the proposal! It sounds like a good direction to me.
>>>> Returning a stagingSession from stage-create and then reusing it for
>>>> loadCredentials/loadTable feels consistent with the existing planId
>>>> pattern, and it fixes a real CTAS problem.
>>>>
>>>> A few questions:
>>>>
>>>> Since stagingSession is in the URL and may show up in logs, should it
>>>> be treated as a secret token (hard to guess, short expiry)?
>>>>
>>>> If it leaks, can someone else use it, or is it restricted to the same
>>>> user/job that created the staged table?
>>>>
>>>> What happens if a CTAS job crashes or is cancelled after staging? Does
>>>> the stagingSession expire automatically, and is there a way to clean
>>>> up/abort the staged create?
>>>>
>>>> Would love to hear your thoughts on these.
>>>>
>>>> Thanks,
>>>>
>>>> Huaxin
>>>>
>>>> On Mon, Feb 9, 2026 at 4:30 PM Maninder Parmar <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello Iceberg community!
>>>>>
>>>>> I wanted to discuss the proposal for refreshing storage credentials
>>>>> for staged table creation. Iceberg tables can be created either via a
>>>>> single-step creation flow or a two-step staged creation flow, which is
>>>>> used for implementing CTAS (CREATE TABLE AS SELECT) statements.
>>>>> Currently, it's not possible to refresh the credentials for staged
>>>>> tables since they are not committed on the catalog and hence not
>>>>> visible to the loadTable or credential endpoints.
>>>>> There has been prior discussion
>>>>> <https://lists.apache.org/thread/q5n355d89nxbhywtlv3qhq7dchbyb67d> where
>>>>> community members expressed the need to support this scenario.
>>>>>
>>>>> I have started a proposal
>>>>> <https://docs.google.com/document/d/1R1K6X7qYqvIFkPG3m1neV5Mvy8rwWJvhSFr8DgJgQ-E/edit?tab=t.0>
>>>>>  to
>>>>> flesh out the details to support this scenario, building on the
>>>>> precedent of credential vending support for scan planning.
>>>>> The OpenAPI changes can be seen in PR #15280
>>>>> <https://github.com/apache/iceberg/pull/15280>
>>>>>
>>>>> Looking forward to your feedback.
>>>>>
>>>>> Thanks,
>>>>> Maninder
>>>>>
>>>>>  Proposal: Credential Refresh for Staged Table Creation
>>>>> <https://drive.google.com/open?id=1R1K6X7qYqvIFkPG3m1neV5Mvy8rwWJvhSFr8DgJgQ-E>
>>>>>
>>>>
