Re: Context-Aware Functions for Apache Polaris

Robert Stupp Tue, 20 May 2025 09:43:49 -0700

As I mentioned in my previous reply:

"FGAC is a very complex topic. The right way would be to have a holistic design and agreed-on approach. That does take time."

The proposed approach changes the observed behavior. It makes it impossible to change the view later on. Plus the other concerns I mentioned.

I really want to get FGAC into Polaris - but well thought through and interoperable - considering all use cases and requirements.

Let's work together on a proper design, but please let's not start with partial implementations.


On 20.05.25 18:27, Prashant Singh wrote:

Hey Robert,
I believe you are quoting the Iceberg view spec:
https://iceberg.apache.org/view-spec/#versions
  1. All representations for a version should express the same underlying
definition. (This holds true.)
  2. Immutable view versions: if the concern is that we are reusing the same
view version, we can always create a Polaris-generated view version and
include these representations. This is an implementation detail.
Note: the spec doesn't say who can generate the view representation; you
can very well do it with the Java API. So IMHO we are not in violation of
the spec if we create a new view version.

"the approach is blindly changing some string without any knowledge about
the actual meaning."

I think I clearly called out in my PR that it's a *POC*, if that is what is
being quoted as the end solution. I am happy to work out the rough edges,
though I think that if you strictly define the return type as boolean, you
can hold the view definer accountable if this string match leads to a
broken user experience. All I really want to request is an objective
evaluation of the idea of resolving identities on the Polaris side, as it's
a very unorganized world when it comes to unifying Spark's current_user()
with is_principal() in Polaris.

I hope this answers your concern. I am totally open to any recommendations
and to working out the rough edges. Let's solve this problem together, as a
community!

Best,
Prashant Singh

On Tue, May 20, 2025 at 9:06 AM Robert Stupp <[email protected]> wrote:

This proposal _does_ change the view definition - it returns a
_different_ representation than the one that has been stored before.
This is a change that breaks the contract of the specification and it
changes the observed behavior.

FGAC is a very complex topic. The right way would be to have a holistic
design and agreed-on approach. That does take time.

On 20.05.25 16:47, Eric Maynard wrote:

I wouldn’t say that Polaris is changing a view definition, but per my
understanding Polaris is actually generating a view based on a Policy.

We will need a way for Polaris to embed some information into these views.

I don’t think this is a P0 to make FGAC work, and I don’t think this
necessarily needs to take the form of a SQL function. For example, it could
be through a policy like:

{
    "allow_columns": ["a", "b"],
    "transform_columns": [
      {
        "col": "c",
        "predicate": "some_func(x)"
      },
      {
        "col": "d",
        "predicate": "${current_user} == admin"
      }
    ]
}

Perhaps we can try to get the first iteration of FGAC (with only field
“allow_columns”) out first. Then, we can implement the engine-driven
predicates (like that on column “c”). Finally, we can examine options for
catalog-driven predicates like the one on column “d”.
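
As a rough illustration of that first iteration, a Python sketch could look like the following. The function name `project_allowed`, the sample column list, and the idea of emitting a plain SELECT are assumptions made for this example; none of it is an existing Polaris API.

```python
import json

# Policy restricted to the first-iteration field only.
policy_json = '{"allow_columns": ["a", "b"]}'


def project_allowed(table, table_columns, policy):
    """Build a SELECT that exposes only the columns the policy allows.

    Hypothetical helper: identifiers are used as-is, so real code would
    still need proper identifier quoting and validation.
    """
    allowed = set(policy.get("allow_columns", []))
    # Keep the table's own column order, dropping anything not allowed.
    kept = [col for col in table_columns if col in allowed]
    if not kept:
        raise ValueError("policy allows none of the table's columns")
    return "SELECT " + ", ".join(kept) + " FROM " + table


policy = json.loads(policy_json)
generated = project_allowed("ns1.layer1_table", ["a", "b", "c", "d"], policy)
print(generated)  # SELECT a, b FROM ns1.layer1_table
```

Nothing in this step depends on who the caller is, which is why it can ship before any predicate handling.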

—EM

On Tue, May 20, 2025 at 9:44 AM Robert Stupp <[email protected]> wrote:

I don't think that Polaris should change any view definition in any way,
but this is what the proposed approach does.

The approach breaks the contract (behavior defined by the specification)
and, in turn, the observed behavior. Both are absolute no-gos.

Practically speaking, the approach is blindly changing some string
without any knowledge about the actual meaning. But it has to know
exactly what it's doing - and to do that it has to know all the SQL
dialects.

As a side note: every query engine already has information about the
user. I'm definitely not supporting exposing any authZ related information.

FGAC as a feature is a great thing to have. But the proposed approach is
not the right way.


On 19.05.25 20:33, Prashant Singh wrote:

Hey Robert,

Thank you for your honest feedback. Let me try to answer your concerns:

"There are tons of SQL dialects out there, each requires its own fully
implemented lexer/parser/interpreter"

That's true, and we are not interpreting the SQL either. We are just
replacing the SQL text, wherever there is `is_principal('<principal_name>')`,
with the value TRUE or FALSE on the server side. We are not re-interpreting
or parsing the tree. I am assuming this is what the Analyzer already does in
constant folding and boolean simplification, though yes, that happens post
parsing. IMHO it's not an impossible thing to achieve. If it helps, I am even
fine with wrapping this as `{{is_principal('<principal_name>')}}` to make it
very specific, so that only Polaris acts on it. IMHO we can work out the
rough edges of the replacement.

"for view containing view"

This should not be a problem, as we just replace the text of the current
view definition. When it comes to resolving the nested view, it will issue
the same LOAD VIEW call, but with the nested view's identifier, and that is
when we will do the replacement for the nested view. We don't open the
nested view in the definition during the loadView of the parent, if that's
the concern here. The nested view is treated like any other identifier that
is opened/interpreted at a later stage of execution.

"Exposing authZ information via any kind of publicly accessible API to
every user sounds like an interesting source of information - especially
for the "not so good and nice guys"."

Yes, that's true, and that's my intention; it's just a question of how we
deliver the info. I expose it through the view definition itself (or through
any other entity stored in Polaris), whereas exposing it as an API would
also require engine-side integration, which we as a catalog have very little
control over.

"What's the benefit over having the ACLs on the table/view defined in the
intended way?"

It's more from a feature-parity perspective, and about giving more control
on the view than ACLs alone (which are conjunctions). For example, we can
extend the view definition with more predicates, such as a disjunction:

select * from ns1.layer1_table where (condition1) OR
(is_principal_role('ANALYST'))

I would love to get your further feedback, considering the above.


Best,
Prashant Singh




On Mon, May 19, 2025 at 11:04 AM Robert Stupp <[email protected]> wrote:

I'm brutally honest here:

I think we should really stay away from interpreting SQL or any other
kind of (view) definition in Polaris. There are tons of SQL dialects out
there, each requires its own fully implemented lexer/parser/interpreter
- plus views-in-views-in-views-in-views... constructs requiring
resolution of nested views. It eventually ends in implementing
yet-another-query-engine. I doubt that this is doable with a
"java.lang.String.replace(from, to)" approach.
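
To make the concern concrete, here is a small Python sketch of the kind of blind substitution I mean. It is not taken from the POC; `naive_rewrite` and both sample queries are made up for illustration, and Python's `str.replace` stands in for the Java call.

```python
def naive_rewrite(view_sql: str, call: str, value: str) -> str:
    """Blind text substitution, comparable to String.replace(from, to)."""
    return view_sql.replace(call, value)


call = "is_principal_role('ANALYST')"

# Case 1: the call text sits inside a double-quoted string literal (some
# dialects accept those as strings). The literal is rewritten as well,
# which silently changes what the query compares against.
in_literal = "SELECT * FROM t WHERE note = \"is_principal_role('ANALYST')\""
rewritten_literal = naive_rewrite(in_literal, call, "TRUE")

# Case 2: the same call with extra whitespace inside the parentheses.
# The exact-match replace does not fire, so the unresolved function
# reaches the engine unchanged.
spaced = "SELECT * FROM t WHERE is_principal_role( 'ANALYST' )"
rewritten_spaced = naive_rewrite(spaced, call, "TRUE")

print(rewritten_literal)
print(rewritten_spaced)
```

Telling these cases apart requires knowing where literals, comments, and identifiers begin and end, and that is dialect-specific.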

Exposing authZ information via any kind of publicly accessible API to
every user sounds like an interesting source of information - especially
for the "not so good and nice guys".

Regarding the examples: what's the benefit over having the ACLs on
the table/view defined in the intended way?

On 19.05.25 19:26, Prashant Singh wrote:

Hi everyone,

I’d like to propose adding *context-aware functions* to Apache Polaris so
that view definitions can resolve security context on the Polaris side (aka
the catalog end), without depending on engines.

*Proposed functions*

       1. *is_principal('<principal_name>')* – returns TRUE if the
       authenticated principal matches <principal_name>, otherwise FALSE.
       2. *is_principal_role('<principal_role_name>')* – returns TRUE when
       <principal_role_name> appears in the principal’s role set.
       3. *is_catalog_role('<catalog_role_name>')* – analogous check at the
       catalog-role level.

*Why it matters*

These predicates make views dynamic. Example:

CREATE VIEW dynamic_vw AS
SELECT *
FROM ns1.layer1_table
WHERE is_principal_role('ANALYST');

When a user whose principal roles include *ANALYST* calls LOAD VIEW,
Polaris rewrites the view to

       SELECT * FROM ns1.layer1_table WHERE TRUE;

For everyone else the view becomes

       SELECT * FROM ns1.layer1_table WHERE FALSE;


The result is better and more consistent control of identity resolution,
without relying on engine-side changes, and it gives Polaris more authority
in enforcing things like FGAC (WIP by me).
Note that the same can be extrapolated to any Polaris stored entity.
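
The rewrite described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in the POC branch: the regular expression, the function name `resolve_principal_roles`, and the way the caller's roles are passed in are all assumptions made for the example.

```python
import re

# Hypothetical pattern: matches is_principal_role('<name>') with a
# single-quoted literal argument and no extra whitespace.
_ROLE_CALL = re.compile(r"is_principal_role\('([^']*)'\)")


def resolve_principal_roles(view_sql: str, caller_roles: set) -> str:
    """Replace each is_principal_role('<name>') call with TRUE or FALSE,
    depending on whether <name> is among the caller's principal roles."""
    def substitute(match):
        return "TRUE" if match.group(1) in caller_roles else "FALSE"

    return _ROLE_CALL.sub(substitute, view_sql)


stored = "SELECT * FROM ns1.layer1_table WHERE is_principal_role('ANALYST');"

# A caller holding the ANALYST role, and one who does not.
analyst_view = resolve_principal_roles(stored, {"ANALYST", "ENGINEER"})
other_view = resolve_principal_roles(stored, {"ENGINEER"})

print(analyst_view)  # SELECT * FROM ns1.layer1_table WHERE TRUE;
print(other_view)    # SELECT * FROM ns1.layer1_table WHERE FALSE;
```

The stored definition is left untouched here; only the text handed back for a given caller differs.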

*Proof of concept*

I’ve put together a quick POC branch:

https://github.com/apache/polaris/compare/main...singhpk234:polaris:dyanmic/view

*Prior art*

Snowflake context functions:
     https://docs.snowflake.com/en/sql-reference/functions-context
Databricks Unity Catalog offers a similar mechanism called *dynamic views*:
     https://docs.databricks.com/aws/en/views/dynamic

*Next steps*

If the community is interested, we can discuss API surface, engine
implications, and a roadmap for merging.

Eager to hear your feedback!

Best,
Prashant Singh

--
Robert Stupp
@snazy

