Re: Context-Aware Functions for Apache Polaris

Robert Stupp Tue, 20 May 2025 09:07:35 -0700

This proposal _does_ change the view definition - it returns a_different_ representation than the one that has been stored before.This is a change that breaks the contract of the specification and itchanges the observed behavior.

FGAC is a very complex topic. The right way would be to have a holisticdesign and agreed-on approach. That does take time.


On 20.05.25 16:47, Eric Maynard wrote:

I wouldn’t say that Polaris is changing a view definition, but per my
understanding Polaris is actually generating a view based on a Policy.

We will need a way for Polaris to embed some information into these views.
I don’t think this is a P0 to make FGAC work, and I don’t think this
necessarily needs to take the form of a SQL function. For example, it could
be through a policy like:

{
   “allow_columns”: [“a”, “b”],
   “transform_columns”: {
     {
       “col”: “c”,
       “predicate”: “some_func(x)”},
     {
       “col”: “d”,
       “predicate: “${current_user} == admin”
   }
}

Perhaps we can try to get the first iteration of FGAC (with only field
“allow_columns”) out first. Then, we can implement the engine-driven
predicates (like that on column “c”). Finally, we can examine options for
catalog-driven predicates like the one on column “d”.

—EM

On Tue, May 20, 2025 at 9:44 AM Robert Stupp <[email protected]> wrote:

I don't think that Polaris should change any view definition in any way,
but this is what the proposed approach does.

The approach breaks the contract (behavior defined by the specification)
and in turn the observed behavior, absolute no go's.

Practically speaking, the approach is blindly changing some string
without any knowledge about the actual meaning. But it has to know
exactly what it's doing - and to do that it has to know all the SQL
dialects.

As a side note: every query engine already has information about the
user. I'm definitely not supporting exposing any authZ related information.

FGAC as a feature is a great thing to have. But the proposed approach is
not the right way.


On 19.05.25 20:33, Prashant Singh wrote:

Hey Robert,

Thank you for your honest feedbacks, please let me try answering your
concerns :

There are tons of SQL dialects out there, each requires its own fully

implemented lexer/parser/interpreter
That's true and we are not interpreting it either, we are just replacing
the sql text wherever there is `is_principal('<principal_name>')` with

the

value of TRUE and FALSE from the server end
we are not re-interpreting or parsing the tree, i am assuming this is

what

Analyzer already does in constant folding and boolean simplification, but
yes post parsing, but IMHO i don't think it's an impossible thing to
achieve. If it helps i am even fine is wrapping this as
`{{is_principal('<principal_name>')}}` to make this very specific and let
only Polaris work, IMHO we can work out the rough edges with the
replacement.

for view containing view

This should not be a problem, as we just replace the text of the current
view definition when it comes to resolve the nested view it will issue

the

same call of LOAD view but with the nested view identifier, when it will

be

the call of nested view and that's when i we will do the replace
we don't open the nested view in the definition during the loadView of

the

parent, if that's the concern here, the nested view is treated equivalent
to any other identifier which is opened / interpreted at later state of
execution.

Exposing authZ information via any kind of publicly accessible API to

every user sounds like an interesting source of information - especially
for the "not so good and nice guys".

Yes that's true and that's my intention it's just how we are delivering

the

info, i.e i expose it by view definition itself (or by any other entity
stored in Polaris) , but exposing this as an API would require engine

side

integration too, which we as catalog have a very less control over as a
catalog.

What's the benefit over having the ACLs on the table/view defined in the

intended way?

It's more from feature parity perspective and giving more control on view
rather than just ACL (which are conjunctions) for ex if we just

complicate

the view def with more predicated for ex disjunction

select * from ns1.layer1_table where (condition1) OR
(is_principal_role('ANALYST'))

I would love to get your further feedback, considering the above.


Best,
Prashant Singh




On Mon, May 19, 2025 at 11:04 AM Robert Stupp <[email protected]> wrote:

I'm brutally honest here:

I think we should really stay away from interpreting SQL or any other
kind of (view) definition in Polaris. There are tons of SQL dialects out
there, each requires its own fully implemented lexer/parser/interpreter
- plus views-in-views-in-views-in-views... constructs requiring
resolution of nested views. It eventually ends in implementing
yet-another-query-engine. I doubt that this is doable with a
"java.lang.String.replace(from, to)" approach.

Exposing authZ information via any kind of publicly accessible API to
every user sounds like an interesting source of information - especially
for the "not so good and nice guys".

Regarding the examples: what's the benefit over having the the ACLs on
the table/view defined in the intended way?

On 19.05.25 19:26, Prashant Singh wrote:

Hi everyone,

I’d like to propose adding *context-aware functions* to Apache Polaris

so

that view definitions can resolve security context on the Polaris side

(aka

catalog end without depending on engines).

*Proposed functions*

      1.

      *is_principal('<principal_name>')* – returns TRUE if the

authenticated

      principal matches <principal_name>, otherwise FALSE.
      2.

      *is_principal_role('<principal_role_name>')* – returns TRUE when
      <principal_role_name> appears in the principal’s role set.
      3.

      *is_catalog_role('<catalog_role_name>')* – analogous check at the
      catalog-role level.

*Why it matters*

These predicates make views dynamic. Example:

CREATE VIEW dynamic_vw ASSELECT *FROM ns1.layer1_tableWHERE
is_principal_role('ANALYST');

When a user whose one of principal roles include *ANALYST* calls LOAD
VIEW, Polaris rewrites the view to


      -

      SELECT * FROM ns1.layer1_table WHERE TRUE;


For everyone else the view becomes

      -

      SELECT * FROM ns1.layer1_table WHERE FALSE;


The result is better and consistent control of the identity resolution
without relying on the engine side changes and giving polaris more
authority in enforcing things like FGAC (WIP by me).
Note the same can be extrapolated to any Polaris stored entity.

*Proof of concept*

I’ve put together a quick POC branch:

https://github.com/apache/polaris/compare/main...singhpk234:polaris:dyanmic/view

*Prior art*

Snowflake context functions :
    https://docs.snowflake.com/en/sql-reference/functions-context
<https://docs.snowflake.com/en/sql-reference/functions-context>
Databricks Unity Catalog offers a similar mechanism called *dynamic

views*:

https://docs.databricks.com/aws/en/views/dynamic

*Next steps*

If the community is interested, we can discuss API surface, engine
implications, and a roadmap for merging.

Eager to hear your feedback!

Best,
Prashant Singh

--
Robert Stupp
@snazy

--
Robert Stupp
@snazy

--
Robert Stupp
@snazy

Re: Context-Aware Functions for Apache Polaris

Reply via email to