Hi Srinivas,

Thanks for the discussion recap! It's very useful to keep the dev thread
and meetings aligned.

> Option 1:
> Credential Rotation: Highly efficient. Because the configuration is
> referenced by ID, rotating a cloud IAM role or secret requires updating
> only the single StorageConfiguration entity. [...]


This seems to imply that credentials are stored as part of the Storage
Configuration Entity. If so, I do not think this approach is ideal. I
believe the secret data should ideally be accessed via the Secrets Manager
[1]. While that discussion is still in progress, I believe it interconnects
with this proposal.

> [...] All thousands of downstream
> tables referencing it would immediately use the new credentials without
> metadata updates.


Immediacy here is probably meant from the end user's perspective.
Internally, different Polaris processes may switch to the updated config at
different moments in time. I do not think that is a problem in this case; I
just wanted to highlight it to make sure the distributed-system aspects are
not left out :)
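
To illustrate what I mean, here is a rough sketch (Python for brevity, not
Polaris code). It assumes each process keeps a cached copy of the config and
reloads it after a TTL; the class names, the TTL and the timestamps are all
made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigStore:
    """Hypothetical stand-in for the shared persistence layer."""
    value: str = "role-v1"


@dataclass
class CachingProcess:
    """One server process with a TTL-based cached copy (assumed behaviour)."""
    store: ConfigStore
    ttl: float
    loaded_at: float = float("-inf")
    cached: Optional[str] = None

    def get(self, now: float) -> str:
        # Reload only when there is no cached copy or it is older than the TTL.
        if self.cached is None or now - self.loaded_at >= self.ttl:
            self.cached = self.store.value
            self.loaded_at = now
        return self.cached


store = ConfigStore()
a = CachingProcess(store, ttl=30)
b = CachingProcess(store, ttl=30)

a.get(now=0)             # process A caches the config at t=0
b.get(now=20)            # process B caches the config at t=20
store.value = "role-v2"  # the single entity is updated at t=25

seen_at_35 = (a.get(now=35), b.get(now=35))  # A has reloaded, B has not yet
seen_at_50 = (a.get(now=50), b.get(now=50))  # both have reloaded
```

Between t=35 and t=50 the two processes disagree, which is the window I am
referring to. Whether Polaris would cache like this is an implementation
detail to be decided.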

> Option 2:
> Credential Rotation: Credential rotation is difficult [...]


Again, I believe actual credentials should be accessed via the Secrets
Manager [1] so some indirection will be present.

Config updates will need to happen individually in each case, but actual
secrets could be shared and updated centrally via the Secrets Manager.
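
A rough sketch of that indirection (Python for brevity, not Polaris code).
The Secrets Manager API is still under discussion in [1], so the
SecretsManager class, the secret_ref field and the example ARN below are
purely hypothetical.

```python
from dataclasses import dataclass


class SecretsManager:
    """Hypothetical stand-in for the Secrets Manager discussed in [1]."""

    def __init__(self):
        self._secrets = {}

    def put(self, ref, value):
        self._secrets[ref] = value

    def resolve(self, ref):
        return self._secrets[ref]


@dataclass(frozen=True)
class EmbeddedStorageConfig:
    """Option 2 style config stored on each entity; holds no secret material."""
    role_arn: str
    secret_ref: str  # name of the secret, not the secret itself


secrets = SecretsManager()
secrets.put("finance-s3", "key-v1")

# Every table embeds its own copy of the config (made-up values).
tables = {
    f"t{i}": EmbeddedStorageConfig(
        role_arn="arn:aws:iam::123456789012:role/finance",
        secret_ref="finance-s3",
    )
    for i in range(3)
}


def credentials_for(table_name):
    # The secret is looked up at use time through the reference.
    return secrets.resolve(tables[table_name].secret_ref)


before = [credentials_for(t) for t in tables]
secrets.put("finance-s3", "key-v2")  # one central rotation, no table updates
after = [credentials_for(t) for t in tables]
```

Changing the role ARN would still mean touching every table, but rotating
the secret itself would not.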

ATM, given the complexity points about option 1 that were brought up in the
community sync, I tend to favour this option for implementing this
proposal. However, this is not a strong requirement by any means, just my
personal opinion. Other opinions are welcome.

Depending on how secret references are handled in code (needs a POC, I
guess), there could be some synergy with Tornike's approach from [3699].

> Option 3: Named Catalog-Level Configurations (Hybrid) [...]


I would like to clarify the UX story in this case. Do we expect end users
to manage Storage Configuration in this case or the Polaris owner?

In the latter case, it seems similar to Tornike's proposal in [3699] but
generalized to all storage types. The Polaris Admin / Owner could use a
non-public API to work with this configuration (e.g. plain Quarkus
configuration or possibly Admin CLI).
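
As a rough sketch of how I picture the owner-managed variant (Python for
brevity, not Polaris code): the registry contents, the entity paths and the
function names are made up; only the "finance-secure-role" name comes from
the recap.

```python
from typing import Optional

# Registry managed by the Polaris owner, e.g. loaded from server configuration.
NAMED_CONFIGS = {
    "default": {"region": "us-east-1"},
    "finance-secure-role": {"region": "eu-west-1"},
}

# Entity path -> name of the config selected for that entity, if any.
ASSIGNED = {
    ("main_catalog",): "default",
    ("main_catalog", "finance"): "finance-secure-role",
}


def effective_config_name(path: tuple) -> Optional[str]:
    # Walk from the most specific entity (table) up to the catalog.
    for end in range(len(path), 0, -1):
        name = ASSIGNED.get(path[:end])
        if name is not None:
            return name
    return None


def effective_config(path: tuple) -> dict:
    name = effective_config_name(path)
    if name is None:
        raise LookupError(f"no storage configuration for {path}")
    return NAMED_CONFIGS[name]


finance_table = ("main_catalog", "finance", "invoices")
hr_table = ("main_catalog", "hr", "people")
```

Here end users would only ever pick a name, while the definitions behind the
names stay under the owner's control.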

> Option 4: Leverage Existing Policy Framework [...]


I tend to agree with the "semantic confusion" point.

It should be fine to reuse policy-related code in the implementation (if
possible), but I believe Storage Configuration and related credential
management form a distinct use case / feature and deserve dedicated
handling in Polaris, both internally and at the API / UX level.

[1] https://lists.apache.org/thread/68r3gcx70f0qhbtz3w4zhb8f9s4vvw1f

[3699] https://github.com/apache/polaris/pull/3699

Thanks,
Dmitri.

On Tue, Feb 10, 2026 at 10:19 PM Srinivas Rishindra <[email protected]>
wrote:

> Hi Everyone,
>
> We had an opportunity to discuss this feature and my recent proposal at
> the last community sync meeting. I would like to summarize our discussion
> and enumerate the various options we considered to help us reach a
> consensus.
>
> To recap, storage configuration is currently restricted to the catalog
> level. This limits flexibility for users who need to organize tables across
> different storage configurations or cloud providers within a single
> catalog. There appears to be general agreement on the utility of this
> feature; however, we still need to align on the specific implementation
> approach.
>
> Here are the various options that were considered.
> *Option 0: Make Credentials available as part of table properties.* (This
> was my original proposal, but I abandoned it after becoming aware of the
> security implications.)
>
> *Option 1: First-Class Storage Configuration Entity*
>
> This approach proposes elevating StorageConfiguration to a standalone,
> top-level resource in the Polaris backend (similar to a Principal,
> Namespace or Table), independent of the Catalog or Table. This is the
> approach in my most recent proposal doc.
> - Data Model: A new StorageConfiguration entity is created with its own
> unique identifier and lifecycle. Tables and Namespaces would store a
> reference ID pointing to this entity rather than embedding the credentials
> directly.
> - Security: This model offers the cleanest security boundary. We can
> introduce a specific USAGE privilege on the configuration entity. A user
> would need both CREATE_TABLE on the Namespace *and* USAGE on the specific
> StorageConfiguration to link them.
> - Credential Rotation: Highly efficient. Because the configuration is
> referenced by ID, rotating a cloud IAM role or secret requires updating
> only the single StorageConfiguration entity. All thousands of downstream
> tables referencing it would immediately use the new credentials without
> metadata updates.
> - Inheritance: The reference could be set at the Catalog, Namespace, or Table
> level. If a Table does not specify a reference, it would inherit the
> reference from its parent Namespace (and so on), preserving the current
> hierarchical behavior while adding granularity.
>
> • Pros: Maximum flexibility and reusability (Many-to-Many). Updating one
> config object propagates to all associated tables.
>
> • Cons: Highest engineering cost. Requires new CRUD APIs, DB schema changes
> (mapping tables), and complex authorization logic (two-stage auth checks).
> Risk of accumulating "orphaned" configs.
>
> Option 2: The "Embedded Field" Model
>
> This approach extends the existing Table and Namespace entities to include
> a storageConfig field. The field can default to null, in which case the
> parent's storageConfig is used at runtime.
>
> *Data Model:* No new top-level entity is created. The storage details
> (e.g., roleArn) are stored directly into a new, dedicated column or
> structure within the existing Table/Namespace entity.
>
> Complexity: This could reduce the engineering overhead significantly. There
> are no new CRUD endpoints for configuration objects and no referential
> integrity checks (e.g., preventing the deletion of a config used by active
> tables).
>
> Credential Rotation: Credential rotation is difficult. If an IAM role
> changes, an administrator must identify and issue UPDATE operations for
> every individual table or namespace that uses that specific configuration,
> potentially affecting thousands of objects.
>
> • Pros: Lowest engineering cost. No new entities or complex mappings are
> required. Easy to reason about authorization (auth is tied strictly to the
> entity).
>
> • Cons: No reusability. Configs must be duplicated across tables; rotating
> credentials for 1,000 tables could require 1,000 update calls.
>
> Option 3: Named Catalog-Level Configurations (Hybrid)
>
> This is a combination of Option 1 and Option 2. An admin can define a
> registry of "Named Storage Configurations" stored within the Catalog.
> Sub-entities (Namespaces/Tables) reference these configs by name (e.g.,
> storage-config: "finance-secure-role").
>
> *Data Model:* No separate top level entity is created. The Catalog Entity
> potentially needs to be modified to accommodate named storage
> configurations.
>
> Credential Rotation: Credential Rotation can be done at the catalog level
> for each named Storage Configuration.
>
> Inheritance: Works much the same way as proposed in Option 1 and Option 2.
>
> Security: Not as secure as Option 1 but still useful. A principal with
> proper access can attach any named storage configuration defined at the
> catalog level to any arbitrary entity within the catalog.
>
> • Pros: Good balance of reusability and simplicity. Allows updating a
> config in one place (the Catalog definition) without needing a full-blown
> global entity system.
>
> • Cons: Scope is limited to the Catalog (cannot share configs across
> catalogs).
>
> Option 4: Leverage Existing Policy Framework
>
> This approach leverages the existing Apache Polaris Policy Framework
> (currently used for features like snapshot expiry) to manage storage
> settings.
>
> Data Model: Storage configurations are defined as "Policies" at the Catalog
> level. These Policies contain the credential details and can be attached to
> Namespaces or Tables using the existing policy attachment APIs.
>
> Inheritance: This aligns naturally with Polaris's existing architecture,
> where policies cascade from Catalog → Namespace → Table. The vending logic
> would simply resolve the "effective" storage policy for a table at query
> time.
>
> Security: This utilizes the existing Polaris Privileges and attachment
> privileges. Administrators can define authorized storage policies
> centrally, and users can only select from these pre-approved policies,
> preventing them from inputting arbitrary or insecure role ARNs.
>
> • Pros:
>   . Zero New Infrastructure: Reuses the existing "Policy" entity,
> persistence layer, and inheritance logic, significantly reducing
> engineering effort
>   . Proven Inheritance: The logic for resolving policies from child to
> parent is already implemented and tested
>
> • Cons:
>   . Semantic Confusion: Policies are typically used for "governance rules"
> (e.g., snapshot expiry, compaction) rather than "connectivity
> configuration." Using them for credentials might be unintuitive
>   . Authorization Complexity: The authorizer would need to load and
> evaluate policies to determine how to access data, potentially coupling
> governance logic with data access paths
>
> We could start with one of these options and, as the feature and user
> needs develop, migrate to another. Please let me know your thoughts on
> the options above, or on anything I might have missed, so that we can
> work towards a consensus on how to implement this feature.
>
>
> On Thu, Feb 5, 2026 at 8:08 AM Tornike Gurgenidze <[email protected]>
> wrote:
>
> > Hi,
> >
> > To follow up on Dmitri's point about credentials, there's already a PR
> > <https://github.com/apache/polaris/pull/3409> up that is going to allow
> > predefining named storage credentials in polaris config like the
> > following:
> >
> >    - polaris.storage.aws.<storage-name>.access-key
> >    - polaris.storage.aws.<storage-name>.secret-key
> >
> > then storage configuration will simply refer to it by name and
> > inherit credentials.
> >
> > I think that can go hand in hand with table-level overrides. Overriding
> > each and every aws property for every table doesn't sound ideal.
> > Defining a storage configuration upfront and referring to it by name
> > should be a simpler solution. I can extend the scope of the PR above to
> > allow predefining other aws properties as well, like endpoint-url and
> > region.
> >
> > Another point that came up in the discussion surrounding extra
> > credentials is how to make sure that no one can simply hijack
> > preconfigured credentials. The simplest solution I see there is to ship
> > off properties to OPA during catalog (and table) creation and allow users
> > to write policies based on them. If we want to enable internal RBAC to
> > have a similar capability, we can go further and move from config-based
> > storage definition to a separate `/storage-config` REST resource in the
> > management API that will come with the necessary grants and permissions.
> >
> > On Thu, Feb 5, 2026 at 5:43 AM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Srinivas,
> > >
> > > Thanks for the proposal. It looks good to me overall, a very timely
> > > feature to add to Polaris.
> > >
> > > I added some comments in the doc and I see this topic on the Community
> > > Sync agenda for Feb 5. Looking forward to discussing it online.
> > >
> > > I have three points to highlight:
> > >
> > > * Dealing with passwords probably connects to the Secrets Manager
> > > discussion [1]
> > >
> > > * Persistence needs to consider non-RDBMS backends. OSS code has both
> > > PostgreSQL and MongoDB, but private Persistence implementations are
> > > possible too. I believe we need a proper SPI for this, not just a
> > > relational schema example.
> > >
> > > * Associating entities (tables, namespaces) to Storage Configuration is
> > > likely a plugin point that downstream projects may want to customize.
> > > I'd propose making another SPI for this. This SPI is probably different
> > > from the new Persistence SPI mentioned above since the concern here is
> > > not persistence per se, but the logic of finding the right storage config.
> > >
> > > [1] https://lists.apache.org/thread/68r3gcx70f0qhbtz3w4zhb8f9s4vvw1f
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Mon, Feb 2, 2026 at 4:18 PM Srinivas Rishindra
> > > <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We had an opportunity to discuss the community sprint last week.
> > > > Based on that discussion, I have created a new design doc which I am
> > > > attaching here. Instead of passing credentials via table properties,
> > > > this design introduces Inheritable Storage Configurations as a
> > > > first-class feature. Please let me know your thoughts on the document.
> > > >
> > > > https://docs.google.com/document/d/1hbDkE-w84Pn_112iW2vCnlDKPDtyg8flaYcFGjvD120/edit?usp=sharing
> > > >
> > > > On Mon, Jan 26, 2026 at 10:42 PM Yufei Gu <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Srinivas,
> > > > >
> > > > > Thanks for sharing this proposal. Persisting long lived credentials
> > > > > such as an S3 secret access key directly in table properties raises
> > > > > significant security concerns. Here is an alternative approach
> > > > > previously discussed, which enables storage configuration at the
> > > > > table or namespace level, and it is probably a more secure and
> > > > > promising direction overall.
> > > > >
> > > > > Yufei
> > > > >
> > > > >
> > > > > On Mon, Jan 26, 2026 at 8:18 PM Srinivas Rishindra
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > I have developed a design proposal for Table-Level Storage
> > > > > > Credential Overrides in Apache Polaris.
> > > > > >
> > > > > > The core objective is to allow specific storage properties to be
> > > > > > defined at the table level rather than the catalog level, enabling a
> > > > > > single logical catalog to support tables across disparate storage
> > > > > > systems. Crucially, the implementation ensures these overrides
> > > > > > participate in the credential vending process to maintain secure,
> > > > > > scoped access.
> > > > > >
> > > > > > I have also implemented a Proof of Concept (POC) pull request to
> > > > > > demonstrate the idea. While the current MVP focuses on S3, I intend
> > > > > > to expand scope to include Azure and GCS pending community feedback.
> > > > > >
> > > > > > I look forward to your thoughts and suggestions on this proposal.
> > > > > >
> > > > > > Links:
> > > > > >
> > > > > > - Design Doc: Table-Level Storage Credential Overrides
> > > > > >   (https://docs.google.com/document/d/1tf4N8GKeyAAYNoP0FQ1zT1Ba3P1nVGgdw3nmnhSm-u0/edit?usp=sharing)
> > > > > > - POC PR: https://github.com/apache/polaris/pull/3563
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Srinivas Rishindra Pothireddi
> > > > > >
> > > > >
> > > >
> > >
> >
>
